Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Funny hang on PostgreSQL 10 during parallel index scan on slave
Date: 2018-09-06 11:44:00
Message-ID: CAN-RpxD07n2ktWL=B-RvYfWG2Q5+fXUeqJkNWf5p0yG2m1p9Nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 6, 2018 at 1:31 PM Chris Travers <chris(dot)travers(at)adjust(dot)com>
wrote:

>
>
> On Thu, Sep 6, 2018 at 11:08 AM Chris Travers <chris(dot)travers(at)adjust(dot)com>
> wrote:
>
>> Ok, so here's my current patch (work in progress). I have not run tests
>> yet, and finding a way to add a test case is virtually impossible though I
>> expect we will find ways of using gdb to confirm local fix later.
>>
>> After careful code review, I settled on the following approach which
>> seemed to be the least intrusive.
>>
>> 1. Change the retry logic so that it does not retry of
>> QueryCancelPending or ProcDiePending are set. In these cases EINTR is
>> passed back to the caller
>> 2. Change the error handling logic of the caller so that EINTR is
>> handled by the next CHECK_FOR_INTERRUPTS() after cleanup.
>>
>> This means that the file descriptor is now already closed and that we
>> handle this the same way we would with a ENOSPC. The reason for this is
>> that there are many places in the code where it is not clear what degree of
>> safety is present for throwing errors using ereport, and the patch needs to
>> be as unobtrusive as possible to facilitate back porting.
>>
>> At this point the patch is for discussion. I have not even compiled it
>> or tested it but maybe there is something wrong with my approach so figured
>> I would send it out early.
>>
>> The major assumptions are:
>> 1. close() will never take longer than the interrupt interval that
>> caused the loop to break.
>> 2. close() does not get interrupted in a way that will not cause cleanup
>> problems later.
>> 3. We will reach the next interrupt check at a reasonable interval and
>> safe spot.
>>
>> Any initial arguments against?
>>
>
> The previous patch had two issues found on internal code review here. I
> am sending a new patch.
>
> This patch compiles. make check passes
> make check-world fails but the errors are the same as found on master and
> involve ecpg.
>
> We will be doing further internal review and verification and then will
> run through pg_indent before final submission.
>
> Changes in this patch:
> 1. Previous patch had backwards check for EINTR in calling function.
> This was fixed.
> 2. In cases where ERROR elevel was passed in, function would return
> instead of error out on case of error.
>
> So in this case we check if we expect to error on error and if so, check
> for interrupts. If not, we go through the normal error handling/logging
> path which might result in an warning that shared memory segment could not
> be allocated followed by an actual meaningful error. I could put that in
> an else if, if that is seen as a good idea, but figured I would raise it as
> a discussion point.
>

Attaching correct patch....

>
>
>>
>> --
>> Best Regards,
>> Chris Travers
>> Head of Database
>>
>> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
>> Saarbrücker Straße 37a, 10405 Berlin
>>
>>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

Attachment Content-Type Size
racecondition.patch application/octet-stream 2.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksandr Parfenov 2018-09-06 11:51:09 Re: Optimze usage of immutable functions as relation
Previous Message Alexander Korotkov 2018-09-06 11:40:41 Re: [HACKERS] Bug in to_timestamp().