Re: Assertion failing in master, predicate.c

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Dilger <hornschnorter(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Martijn van Oosterhout <kleptog(at)gmail(dot)com>
Subject: Re: Assertion failing in master, predicate.c
Date: 2019-11-22 23:25:25
Message-ID: 26412.1574465125@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Mark Dilger <hornschnorter(at)gmail(dot)com> writes:
>> `git bisect` shows the problem occurs earlier than that, and by
>> chance the first bad commit was one of yours. I'm not surprised
>> that your commit was regarding LISTEN/NOTIFY, as the error is
>> always triggered with a LISTEN statement. (I've now hit this
>> many times in many tests of multiple SQL statements, and the
>> last statement before the error is always a LISTEN.)

> Oh my, that's interesting! I had wondered a bit about the LISTEN
> changes, but it's hard to see how those could have any connection
> to serializable mode. This will be an entertaining debugging
> exercise ...

It looks to me like this is an ancient bug that just happened to be
made more probable by 51004c717. That Assert in predicate.c is
basically firing because MySerializableXact got created *after*
PreCommit_CheckForSerializationFailure, which is what should have
marked it as prepared. And that will happen, if we're in serializable
mode and this is the first LISTEN of the session, because
CommitTransaction() calls PreCommit_Notify after it calls
PreCommit_CheckForSerializationFailure, and PreCommit_Notify calls
asyncQueueReadAllNotifications which wants to get a snapshot, and
the transaction had no snapshot before.

The only reason it's showing up now is that actually the logic is

if (!QUEUE_POS_EQUAL(max, head))
asyncQueueReadAllNotifications();

that is, we'll skip the problematic call if the notify queue is
visibly empty. But 51004c717 changed how aggressively we move
the queue tail forward, so that in this simple example we will
now see the queue as possibly not empty, where we would have
decided it was empty before.

Of course, the bug exists anyway, because concurrent NOTIFY traffic
could certainly cause the queue to be nonempty at this point.
I venture that the only reason we've not seen field reports of
this issue is that people don't run with asserts on in production
(and, I guess, the problem is actually harmless except for the
Assert). Or maybe people don't use serializable mode in apps
that use LISTEN/NOTIFY?

Anyway, it seems like the simplest fix is to swap the order of
the PreCommit_CheckForSerializationFailure and PreCommit_Notify
steps in CommitTransaction. There's also PrepareTransaction
to think about, but there again we could just move AtPrepare_Notify
up; it's only going to throw an error anyway, so we might as well
do that sooner.

An alternative idea is to use some other way of getting a snapshot
in asyncQueueReadAllNotifications, one that always gets a current
snapshot and doesn't enter predicate.c. But that might have semantic
consequences on the timing of notifications. I'm not really sure
that anybody's ever thought hard about how async.c ought to act
in serializable mode, so this might or might not be a good change.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-11-22 23:25:33 Re: [PATCH] Tiny optmization or a bug?
Previous Message Tomas Vondra 2019-11-22 23:12:48 Re: [BUG] (firsttupleslot)==NULL is redundant or is possible null dereference?