I have always been curious why the problem shows up only when there is not enough disk space.
I did some tests and may have found some answers. My tests are below:
----------------------------
postgres=# CREATE UNLOGGED TABLE filler(a int, b text STORAGE plain);
CREATE TABLE
postgres=# INSERT INTO filler SELECT g, repeat('x', 1000) FROM generate_series(1,50000) g;
INSERT 0 50000
postgres=# CREATE TEMP TABLE tbl(a int);
CREATE TABLE
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
ERROR: could not extend file "base/5/t3_16389": No space left on device
HINT: Check free disk space.
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
ERROR: could not extend file "base/5/t3_16389": No space left on device
HINT: Check free disk space.
postgres=# truncate tbl ;
TRUNCATE TABLE
postgres=# drop table filler ;
DROP TABLE
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT 0 200000
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT 0 200000
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT 0 200000
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT 0 200000
------------------------
After I truncated the temp table (and dropped filler to free space), the INSERTs no longer reported an error.
I found that the buf_state of the buffer in the local buffer hash table is not cleaned up when there is no space left on the device.
If I truncate the temp table, DropRelationLocalBuffers() is called and the buf_state is cleared, so the assertion failure is no longer hit.
While debugging ExtendBufferedRelLocal(), I also found a redundant second assignment to existing_hdr.
The attached v3 patch fixes this small issue in addition to the changes from the previous v2 patch.
Hello tender wang,
26.12.2023 19:55, tender wang wrote:
I tried to analyze the issue, and I found that it might be caused by this commit:
commit dad50f677c42de207168a3f08982ba23c9fc6720
bufmgr: Acquire and clean victim buffer separately
Thanks for looking into it!
...
With debug logging added in this code within ExtendBufferedRelLocal():
    if (found)
    {
        BufferDesc *existing_hdr =
            GetLocalBufferDescriptor(hresult->id);
        uint32      buf_state;

        UnpinLocalBuffer(BufferDescriptorGetBuffer(victim_buf_hdr));

        existing_hdr = GetLocalBufferDescriptor(hresult->id);
        PinLocalBuffer(existing_hdr, false);
        buffers[i] = BufferDescriptorGetBuffer(existing_hdr);

        buf_state = pg_atomic_read_u32(&existing_hdr->state);
        Assert(buf_state & BM_TAG_VALID);
        Assert(!(buf_state & BM_DIRTY));
        buf_state &= BM_VALID;
        pg_atomic_unlocked_write_u32(&existing_hdr->state, buf_state);
...
I see that this code is reached during the second INSERT (which then fails
with ENOSPC) with existing_hdr->state == 0x2040000, but during the third
INSERT I observe state == 0x0.
I wonder if "buf_state &= BM_VALID" is a typo here; maybe it is supposed to be
"buf_state &= ~BM_VALID", as in ExtendBufferedRelShared()...
Yeah, that's true. I analyzed this issue again, and I think the root cause is "buf_state &= BM_VALID".
In the case I reported, buf_state & BM_VALID is true but buf_state & BM_TAG_VALID is false. This combination should be impossible:
the data in a local buffer cannot be valid while LocalBufHash has no entry for it.
I modified the v1 patch; the attached v2 patch should fix the above issues.
Best regards,
Alexander