Re: An isolation test for SERIALIZABLE READ ONLY DEFERRABLE

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: An isolation test for SERIALIZABLE READ ONLY DEFERRABLE
Date: 2017-01-16 00:40:21
Message-ID: CAEepm=1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN=0LY3pJ49Ksg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 12, 2017 at 2:21 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Jan 12, 2017 at 4:09 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Jan 10, 2017 at 11:41 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> Do you think that expanding the wait query by default could be
>>> intrusive for the other tests? I am wondering about such a white list
>>> to generate false positives for the existing tests, including
>>> out-of-core extensions, as all the tests now rely only on
>>> pg_blocking_pids().
>>
>> It won't affect anything unless running at transaction isolation level
>> serializable with the "read only deferrable" option.
>
> Indeed as monitoring.sgml says. And that's now very likely close to
> zero. It would be nice to add a comment in the patch to just mention
> that. In short, I withdraw my concerns about this patch, the addition
> of a test for repeatable read outweights the tweaks done in the
> isolation tester. I am marking this as ready for committer, I have not
> spotted issues with it.

Thanks! Aside from exercising the code, which is surely always a good
thing, I think these three tests are quite educational on their own
for those of us trying to learn about transaction isolation.

I also have longer term plans to show the first and third of them
running with the read-only transaction moved to a standby server.
Kevin Grittner gave me the idea of multi-server isolation tests when I
mentioned my WIP SERIALIZABLE DEFERRABLE-on-standbys patch off-list,
which prompted me to realise that our existing SSI tests aren't very
interesting for that because they all rely on the behaviour of
SERIALIZABLE, not SERIALIZABLE DEFERRABLE. So I figured we'd better
have some tests that show show that working on a single node system
first.

Concerning the question of beauty, I thought of several ways for the
isolation tester to detect this type of waiting:

1. Create a new function pg_waiting_for_safe_snapshot(pid) that would
acquire SerializableXactHashLock, do a linear search of active
SERIALIZABLEXACT structs (using FirstPredXact and NextPredXact)
looking for sact->pid == pid and then test sact->flags &
SXACT_FLAG_DEFERRABLE_WAITING. Then change the isolation tester's
"waiting" query to add " OR pg_waiting_for_safe_snapshot($1)".

2. Modify the existing pg_blocking_pids() function to detect this
type of waiting and figure out which other pids are responsible,
adding them to its current results. That could work by first doing a
linear search to find the SERIALIZABLEXACT struct as above, and if its
SXACT_FLAG_DEFERRABLE_WAITING flag is set, then walking the list of
possibleUnsafeConflicts to find the pids responsible. Then the
isolation tester wouldn't need to change at all.

3. Create a new function pg_waiting_for_safe_snapshot_pids(),
separate from pg_blocking_pids(), to return the list of pids as above,
and modify the isolation tester to call this new function too.

4. The wait_event based approach as proposed, looking at pg_stat_activity.

I couldn't think of any practical uses for the new facilities 1-3
outside this isolation test, which is why I didn't look into those
more intrusive approaches. I admit that 4 is a bit clunky, but it's a
simple non-intrusive change and I can't think of any reason it would
give incorrect results. Even though wait_event is updated and read
without memory synchronisation, as far as I know we can assume that
select(2) and WaitLatch contain full memory barriers so the isolation
tester will certainly see the SafeSnapshot value when it repeatedly
polls that query, and the value can't change again until the isolation
tester sees it, recognises it as a kind of waiting it knows about, and
moves on to running the step in the test.

Upon reflection, either 2 or 3 might be considered more beautiful than
4, depending on how others feel about extending the remit of the
existing pg_blocking_pid function. I'd be happy to post a new patch
using one of those approaches if others feel it'd be better that way.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2017-01-16 02:38:26 Re: postgres_fdw bug in 9.6
Previous Message Petr Jelinek 2017-01-16 00:19:59 Re: Logical Replication WIP