Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?
Date: 2018-03-04 15:05:59
Message-ID: 9a0db23a-1650-8ba8-02bd-80c4d41f9e7e@2ndquadrant.com
Lists: pgsql-hackers

On 03/04/2018 10:27 AM, Thomas Munro wrote:
> On Sun, Mar 4, 2018 at 5:40 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Could shm_mq_detach_internal() need a pg_write_barrier() before it
>> writes mq_detached = true, to make sure that anyone who observes that
>> can also see the most recent increase of mq_bytes_written?
>
> I can reproduce both failure modes (missing tuples and "lost contact")
> in the regression database with the attached Python script on my Mac.
> It takes a few minutes and seems to happen sooner when my machine
> is also doing other stuff (playing debugging music...).
>
> I can reproduce it at 34db06ef9a1d7f36391c64293bf1e0ce44a33915
> "shm_mq: Reduce spinlock usage." but (at least so far) not at the
> preceding commit.
>
> I can fix it with the following patch, which writes XXX out to the log
> where it would otherwise miss a final message sent just before
> detaching with sufficiently bad timing/memory ordering. This patch
> isn't my proposed fix, it's just a demonstration of what's busted.
> There could be a better way to structure things than this.
>

I can confirm this resolves the issue for me. Before the patch, I saw
112 failures in ~11500 runs (roughly 1%). With the patch I saw 0
failures, but about 100 XXX messages in the log.

So my conclusion is that your analysis is likely correct.
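
For anyone following along, here is a minimal sketch of the ordering
requirement discussed above, written with C11 atomics rather than the
PostgreSQL barrier primitives. It is not the patch from this thread and
not the real shm_mq code: the demo_mq struct and the demo_* functions
are hypothetical names made up for illustration. The assumption is that
the sender publishes "detached" with release semantics (playing the role
of a write barrier before the flag is set), and that the receiver reads
the flag first and the byte counter second, so that a receiver which
observes the detach also observes the last increase of the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared queue header (not the real shm_mq). */
typedef struct demo_mq
{
    _Atomic(uint64_t) bytes_written;  /* advanced only by the sender */
    atomic_bool       detached;       /* set once by the sender on detach */
} demo_mq;

static void
demo_mq_init(demo_mq *mq)
{
    atomic_init(&mq->bytes_written, 0);
    atomic_init(&mq->detached, false);
}

/* Sender: account for n more bytes placed in the queue. */
static void
demo_send(demo_mq *mq, uint64_t n)
{
    uint64_t cur;

    /* Sending after detach would be a caller bug in this sketch. */
    assert(!atomic_load_explicit(&mq->detached, memory_order_relaxed));

    cur = atomic_load_explicit(&mq->bytes_written, memory_order_relaxed);
    atomic_store_explicit(&mq->bytes_written, cur + n, memory_order_relaxed);
}

/*
 * Sender: detach.  The release store orders every earlier write,
 * including the final bytes_written update, before the flag becomes
 * visible to a reader that uses an acquire load.
 */
static void
demo_detach(demo_mq *mq)
{
    atomic_store_explicit(&mq->detached, true, memory_order_release);
}

/*
 * Receiver: report how many unread bytes exist and whether the sender
 * has detached.  The flag is read BEFORE the counter, so if the result
 * is true the counter value already includes the last message.
 */
static bool
demo_poll(demo_mq *mq, uint64_t consumed, uint64_t *available)
{
    bool     gone;
    uint64_t written;

    gone = atomic_load_explicit(&mq->detached, memory_order_acquire);
    written = atomic_load_explicit(&mq->bytes_written, memory_order_relaxed);

    *available = written - consumed;
    return gone;
}
```

A receiver built on this sketch would only treat the queue as finished
when demo_poll() returns true and *available is 0. Checking the counter
first and the flag afterwards is the pattern that can drop a final
message sent just before the detach.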

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
