From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync Rep and shutdown Re: Sync Rep v19
Date: 2011-03-20 04:44:25
Message-ID: AANLkTimGEzdw3RMqGqqrqcEg+BTF+gnMcxXAKrrUkfCT@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 19, 2011 at 10:32 AM, Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote:
> Testing 'methodology' sounds a bit heavy. I tested a number of patch
> versions over time, with 30-second, hourly, and nightly pgbench runs. The
> nightly runs were more for durability/memory-leak testing than tps numbers,
> since I gradually got the impression that pgbench tests on syncrep setups
> somehow suffer less from big differences between tests.
>
> The postgresql.conf and recovery.conf I used to test v17 are listed here:
> http://archives.postgresql.org/pgsql-hackers/2011-02/msg02364.php
>
> After the tests on v17 I played a bit with small memory changes in
> postgresql.conf to see if the tps would go up. It went up a little, but not
> enough to mention on the lists. All tests after v17 were done with the
> postgresql.conf that I've copy-pasted below.

Hmm, I'm not going to be able to reproduce this here, and my test
setup didn't show a clear regression. I can try beating on it some
more, but... Any chance you could rerun your test with the latest
master-branch code, and perhaps also with the patch I proposed
upthread to remove a branch from the section protected by
SyncRepLock? I can't really tell from reading the emails you linked
what was responsible for the slowdowns and speedups, and it is unclear
to me how much impact my recent changes actually had. I would have
expected the dominant cost to be waiting for the slave to complete its
fsync, with network overhead as runner-up. And indeed this appears to
be the case on my ext4-based system. I would not have expected
contention on SyncRepLock to be much of a factor.

It strikes me that if contention on SyncRepLock IS the dominating
factor, the whole approach to queue management is pretty well busted.
*Every* walsender takes SyncRepLock in exclusive mode on receipt of
*every* standby reply message. That seems rather inefficient. To
make matters worse, every time a walsender grabs SyncRepLock, it
redoes the whole computation of who the synchronous standby is from
scratch. It strikes me that it ought to be possible to rejigger
things so that when a walsender exits, it signals any other walsenders
that exist to recheck whether they need to take over the role of
synchronous standby; then, each walsender needs to take the lock and
recheck only when it first connects, and each time it's signalled
thereafter. When a walsender does decide that a change in the
synchronous standby is needed, it should store the ID of the new
walsender in shared memory, in a variable protected by SyncRepLock, so
that the current synchronous standby can notice the change with a
simple integer comparison.
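
To make that concrete, here's a minimal sketch. WalSndCtl, MyWalSnd,
SyncRepLock, and SyncRepReleaseWaiters are the existing names;
syncStandby, slotno, SyncRepChooseStandby, and SyncRepSignalWalSenders
are all hypothetical:

    /* Hypothetical new field in walsender shared memory, protected by
     * SyncRepLock for writes. */
    int         syncStandby;    /* slot of current sync standby, or -1 */

    /* Fast path, on receipt of every standby reply: no lock taken, just
     * an integer comparison against our own slot. */
    if (WalSndCtl->syncStandby == MyWalSnd->slotno)
        SyncRepReleaseWaiters();

    /* Slow path, run only at connect time and when signalled: recompute
     * the sync standby from scratch and publish the result. */
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    WalSndCtl->syncStandby = SyncRepChooseStandby();
    LWLockRelease(SyncRepLock);

    /* On walsender exit: give up the role and tell the survivors to
     * recheck whether one of them should take over. */
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    if (WalSndCtl->syncStandby == MyWalSnd->slotno)
        WalSndCtl->syncStandby = -1;
    LWLockRelease(SyncRepLock);
    SyncRepSignalWalSenders();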

It also strikes me that we ought to be able to rejigger things so that
the backends being woken up never need to touch shared memory at all -
i.e. eliminate syncRepState - thus reducing cache line contention. If
WaitLatch() returns true, then the latch was set, presumably by
walsender. My recent patch added a couple of places where the latch
can be set by the waiting process itself in response to an interrupt,
but that case can be handled by adding a backend-local flag variable
indicating whether we set the latch ourselves. If we determine that
the latch is set and the did-it-myself flag is clear, we can conclude
that we were awakened by a walsender and call it good. If the latch
is set and the did-it-myself flag is also set, then we were
interrupted, and we MAY also have been awakened by walsender at around
the same time. We grab SyncRepLock to remove ourselves from the
queue, and if we find we're already removed, then we know we were
interrupted just as walsender awakened us; otherwise, it's a pure
interrupt.
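
In code form, roughly; MyProc, SyncRepLock, syncRepLinks,
SHMQueueIsDetached, and SHMQueueDelete are the existing primitives,
while set_latch_myself is the hypothetical backend-local flag:

    static bool set_latch_myself = false;   /* did-it-myself flag */

    /* ...after WaitLatch() returns and we observe the latch set... */
    if (!set_latch_myself)
        return true;            /* a walsender woke us: all done */

    /* We set the latch ourselves in response to an interrupt, but a
     * walsender may have awakened us at about the same time; whether
     * we are still on the queue disambiguates the two cases. */
    set_latch_myself = false;
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    if (SHMQueueIsDetached(&(MyProc->syncRepLinks)))
    {
        /* walsender already dequeued us, so it awakened us too */
        LWLockRelease(SyncRepLock);
        return true;
    }
    SHMQueueDelete(&(MyProc->syncRepLinks));    /* pure interrupt */
    LWLockRelease(SyncRepLock);
    return false;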

It'd be interesting to see the results of some testing with
LWLOCK_STATS defined (it's a compile-time symbol; building with
CPPFLAGS=-DLWLOCK_STATS should turn it on), to see whether SyncRepLock
actually is contended and if so to what degree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
