Re: BUG #17619: AllocSizeIsValid violation in parallel hash join

From: Dmitry Astapov <dastapov(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17619: AllocSizeIsValid violation in parallel hash join
Date: 2022-09-22 16:38:00
Message-ID: CAFQUnFhEFTaMafweOG2P7FNToLeiDFrsYRfSBk7TrrkTv+3fMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

And I think that if on line 322 should be ">" and not ">=" as well.

On Thu, 22 Sep 2022, 17:11 Dmitry Astapov, <dastapov(at)gmail(dot)com> wrote:

> Thomas, I think I know what is going on, and it is happening in
> sts_puttuple when it needs to write 32760 bytes exacly. All references
> below a taken in the tag REL_13_8.
>
> Lets say that we enter sts_puttuple and on line 321 we compute a size
> <https://github.com/postgres/postgres/blob/4bc493d14409857090928ea51c02a20aba8db364/src/backend/utils/sort/sharedtuplestore.c#L321>
> of 32760 bytes. Lets say that buffer is currently partially filled so a
> write of this size would not fit.
>
> We proceed to flush chunk on line 338, which resets the write pointer to 8
> bytes past the start of SharedTuplestoreChunk, leaving exactly 32760 bytes
> free.
>
> We take the if on line 342
> <https://github.com/postgres/postgres/blob/4bc493d14409857090928ea51c02a20aba8db364/src/backend/utils/sort/sharedtuplestore.c#L342> (the
> one with the comment "It still might not be enough space", and then proceed
> to write whole 32760 bytes and reach the line 373 with size == 0.
>
> It looks like this codepath is written with assumption that size >0 here
> and sts_flush_chunk would be called immediately after inside the loop on
> line 375. But because size == 0, loop will not execute, sts_flush_chunk
> will not be called, and since we haven't updated write pointer so far, it
> will stay at the beginning of the chunk. At this point we exit
> sts_puttuple, and next call of sts_putttuple would overwrite the tuple we
> just memcopied into the chank. If this new tuple is shorter, then remaining
> part of the chunk would be garbage from the overwritten tuple.
>
> I think >= on line 342 should be just > . I tried this change locally, and
> it fixed the issue for me.
>
> Do you agree with my analysis?
>
> On Thu, Sep 22, 2022 at 1:51 PM Dmitry Astapov <dastapov(at)gmail(dot)com> wrote:
>
>>
>>
>> On Thu, Sep 22, 2022 at 9:44 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
>> wrote:
>>
>>> On Thu, Sep 22, 2022 at 7:46 PM PG Bug reporting form
>>> <noreply(at)postgresql(dot)org> wrote:
>>> > (gdb) p size
>>> > $2 = 1702125924
>>>
>>> Thanks for the detailed report. Hmm. That size, on a little-endian
>>> system, is equivalent to the byte sequence "date\0\0\0\0", which looks
>>> pretty suspiciously like the inside of a tuple, and not its size. We
>>> must have got out of sync somehow.
>>>
>>
>> Great catch! I added some debugging output to the BufFileRead and
>> sts_read_tuple and it looks to me that the sequence of events immediately
>> before the problem happens is like this:
>>
>>
>> 1. sts_read_tuple is called multiple times to read from a particular
>> file, and a bunch of tuples are read this way
>> 2. sts_read_tuple is called once more. In my experiments, I always
>> had read_ntuples_available - 2 == read_ntuples at this point, but maybe
>> this is just a coincidence.
>> 1. inside this sts_read_tuple_call, BufFileRead is called to read
>> metadata, and succeeds (and this chunk of the file looks -- to my naive eye
>> -- exactly like any other tuple start)
>> 2. Second BufFileRead call happens to get the size, and reads the
>> correct size (I see what's in the file, I could find corresponding values
>> in the database tables, and it all lines up).
>> 3. Third BufFileRead happens to read the actual tuple. So far so
>> good, and sts_read_tuple finishes.
>> 3. At this point, the position in the file (buf_file -> currOffset +
>> post) points right after the last tuple read, but what's there does not
>> look like the start of the next tuple to me. It seems like there was
>> another, longer tuple written there before (with quite long rhs.payload
>> value) and somehow the tuple that was just read was written over it, so we
>> have "<tuple read on step 2 above><current read position here><what looks
>> like leftovers from another tuple><a bunch of zeroes><new tuple start>"
>> 4. So another sts_read_tuple call happens, and it reads 4 bytes of
>> this garbage as metadata, and the next 4 bytes as length, and this is when
>> everything derails...
>>
>> I also double-checked at the logs of the (test) database where this issue
>> was first spotted (and where a lot of queries similar to the one I am using
>> to trigger this issue are happening) and found out that along with "invalid
>> memory alloc request size" we also got a couple of occasions of "Unexpected
>> chunk in shared tuplestore temporary file. Expected overflow chunk", which
>> looks like another error that could be potentially explained by
>> BufFileRead's inside sts_read_tuple reading "garbage".
>>
>> I'm happy to build a patched version of 13.8 (or another revision if you
>> want me to) to collect more debug information for you if this could help.
>>
>> --
>> D. Astapov
>>
>
>
> --
> D. Astapov
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2022-09-22 17:31:02 Re: Regression in pipeline mode in libpq 14.5
Previous Message PG Bug reporting form 2022-09-22 16:34:28 BUG #17621: PostgreSQL 15 Windows installer still downloads Beta 3