Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition

From: tender wang <tndrwang(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition
Date: 2023-12-28 03:40:31
Message-ID: CAHewXN=chu4kBxj=vtCOJJoOCAvipfJzKRuH26BMiyHSDhBk7g@mail.gmail.com
Lists: pgsql-bugs

I was curious why the error is reported only when there is not enough
disk space.
I ran some tests and may have found an answer. My test session is below:

----------------------------
postgres=# CREATE UNLOGGED TABLE filler(a int, b text STORAGE plain);
CREATE TABLE
postgres=# INSERT INTO filler SELECT g, repeat('x', 1000) FROM
generate_series(1,50000) g;
INSERT 0 50000
postgres=# CREATE TEMP TABLE tbl(a int);
CREATE TABLE
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
ERROR: could not extend file "base/5/t3_16389": No space left on device
HINT: Check free disk space.
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
ERROR: could not extend file "base/5/t3_16389": No space left on device
HINT: Check free disk space.
postgres=# truncate tbl ;
TRUNCATE TABLE
postgres=# drop table filler ;
DROP TABLE
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT 0 200000
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT 0 200000
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT 0 200000
postgres=# INSERT INTO tbl SELECT g FROM generate_series(1, 200000) g;
INSERT 0 200000
------------------------
No error was reported once I had truncated the temp table.
I found that the buf_state of the buffer left in the local hash table is
not cleaned up when there is no space left on the device.
If I truncate the temp table, DropRelationLocalBuffers() is called and
buf_state is cleared, so the assertion failure is no longer reported.
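
To make the sequence concrete, here is a small toy model in C. It is only a
sketch, not the real bufmgr code: ToyLocalBuf and the toy_* functions are
hypothetical names, and the flag bit positions are what I remember from
buf_internals.h, so treat them as assumptions. Under those assumptions the
logged state 0x2040000 decodes to BM_TAG_VALID plus a usage count of one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the buffer state word (not taken from this thread). */
#define BUF_USAGECOUNT_ONE (1U << 18)
#define BM_VALID           (1U << 24)
#define BM_TAG_VALID       (1U << 25)

/* Toy stand-in for one local buffer header plus its LocalBufHash entry. */
typedef struct ToyLocalBuf
{
    bool     in_hash;   /* does the hash table still have an entry? */
    uint32_t state;     /* stand-in for the buffer descriptor's state word */
} ToyLocalBuf;

/*
 * Model an extension that enters the block into the hash table and then
 * fails with ENOSPC. Nothing is cleaned up on the error path.
 */
static void
toy_extend_fails(ToyLocalBuf *buf)
{
    if (!buf->in_hash)
    {
        buf->in_hash = true;
        buf->state = BM_TAG_VALID | BUF_USAGECOUNT_ONE;   /* 0x2040000 */
    }
}

/*
 * Model the "found" branch of the next extension attempt. Returns false
 * where Assert(buf_state & BM_TAG_VALID) would fail. use_fixed_mask picks
 * between "&= BM_VALID" (current code) and "&= ~BM_VALID" (proposed fix).
 */
static bool
toy_reuse_found(ToyLocalBuf *buf, bool use_fixed_mask)
{
    uint32_t buf_state = buf->state;

    if (!(buf_state & BM_TAG_VALID))
        return false;

    if (use_fixed_mask)
        buf_state &= ~BM_VALID;   /* clears only the valid bit */
    else
        buf_state &= BM_VALID;    /* clears everything except the valid bit */

    buf->state = buf_state;
    return true;
}

/* Model the effect of DropRelationLocalBuffers(): the entry is removed. */
static void
toy_drop(ToyLocalBuf *buf)
{
    buf->in_hash = false;
    buf->state = 0;
}
```

In this model, calling toy_extend_fails() and then toy_reuse_found() with
use_fixed_mask = false leaves state == 0x0, and the following
toy_reuse_found() call returns false, which corresponds to the failed
assertion. Calling toy_drop() in between removes the stale entry, so the
"found" branch is never taken, which is what TRUNCATE did in my test.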

tender wang <tndrwang(at)gmail(dot)com> wrote on Wed, Dec 27, 2023 at 17:22:

> When I debugged the ExtendBufferedRelLocal(), I found a
> repeated assignment to existing_hdr.
> So I fixed this small issue with the previous v2 patch together with the
> attached v3 patch.
>
> tender wang <tndrwang(at)gmail(dot)com> wrote on Wed, Dec 27, 2023 at 17:08:
>
>>
>>
>> Alexander Lakhin <exclusion(at)gmail(dot)com> wrote on Wed, Dec 27, 2023 at 15:00:
>>
>>> Hello tender wang,
>>>
>>> 26.12.2023 19:55, tender wang wrote:
>>>
>>> I tried to analyze the issue, and I found that it might be caused by
>>> this commit:
>>> commit dad50f677c42de207168a3f08982ba23c9fc6720
>>> bufmgr: Acquire and clean victim buffer separately
>>>
>>>
>>> Thanks for looking into it!
>>>
>>> ...
>>>>
>>>> With debug logging added in this code within ExtendBufferedRelLocal():
>>>>> if (found)
>>>>> {
>>>>> BufferDesc *existing_hdr =
>>>>> GetLocalBufferDescriptor(hresult->id);
>>>>> uint32 buf_state;
>>>>>
>>>>>
>>>>> UnpinLocalBuffer(BufferDescriptorGetBuffer(victim_buf_hdr));
>>>>>
>>>>> existing_hdr = GetLocalBufferDescriptor(hresult->id);
>>>>> PinLocalBuffer(existing_hdr, false);
>>>>> buffers[i] = BufferDescriptorGetBuffer(existing_hdr);
>>>>>
>>>>> buf_state = pg_atomic_read_u32(&existing_hdr->state);
>>>>> Assert(buf_state & BM_TAG_VALID);
>>>>> Assert(!(buf_state & BM_DIRTY));
>>>>> buf_state &= BM_VALID;
>>>>> pg_atomic_unlocked_write_u32(&existing_hdr->state,
>>>>> buf_state);
>>>>> ...
>>>>> I see that it reached for the second INSERT (and NOSPC error) with
>>>>> existing_hdr->state == 0x2040000, but for the third INSERT I observe
>>>>> state == 0x0.
>>>>>
>>>>>
>>> I wonder if "buf_state &= BM_VALID" is a typo here; maybe it was supposed
>>> to be
>>> "buf_state &= ~BM_VALID" as in ExtendBufferedRelShared()...
>>>
>>
>> Yeah, that's true. I analyzed this issue again, and I think the root
>> cause is the "buf_state &= BM_VALID".
>> In the issue I reported, buf_state & BM_VALID is true, but buf_state &
>> BM_TAG_VALID is false. This situation should be impossible.
>> It can't happen that the data in the local buffer pool is valid, but
>> LocalBufHash has no entry.
>>
>> I modified the v1 patch; the attached v2 patch should fix the above issues.
>>
>> Best regards,
>>> Alexander
>>>
>>
