Re: BUG #13660: serializable snapshotting hangs

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "cpacejo(at)clearskydata(dot)com" <cpacejo(at)clearskydata(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13660: serializable snapshotting hangs
Date: 2015-10-06 21:38:15
Message-ID: 1379192640.1267648.1444167495623.JavaMail.yahoo@mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thursday, October 1, 2015 9:19 PM, "cpacejo(at)clearskydata(dot)com" <cpacejo(at)clearskydata(dot)com> wrote:

> PostgreSQL version: 9.4.4

> After running fine for weeks, we now find that serializable snapshots hang:
>
> our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE;
> START TRANSACTION
> our_db=> SELECT pg_export_snapshot();
> (...hangs indefinitely...)

> Killing all backends (i.e. including those accessing other databases)
> unblocked serializable snapshotting.
>
> Is this expected behavior?

What should be happening is that when the serializable read only
deferrable transaction executes the first statement that needs a
snapshot, it grabs a snapshot and looks for concurrent read write
transactions.  If there are none, the snapshot is safe and it goes
ahead with execution; otherwise it goes into a loop where it delays
(to avoid sucking 100% CPU for a core) by calling
ProcWaitForSignal().  Each time that function returns it checks
whether any of the concurrent read write transactions developed a
read-write conflict out (i.e., it read data which was modified by a
transaction concurrent to *it*) to a transaction which committed
before this read only transaction got its snapshot.  If so, this
snapshot is unsafe and we need to discard it, get a new snapshot,
and go back into the loop.  If all read write transactions
concurrent to the read only transaction complete without developing
such a conflict our snapshot is safe and we can proceed to run
without taking predicate locks or risking a serialization failure.

Now, that's pretty technical, and you definitely don't need to
follow all of the above to safely use deferrable transactions.  The
short version is that your initial statement that needs a snapshot
cannot begin to execute *at least* until any concurrent
transactions which are not READ ONLY complete.  One connection
stuck "idle in transaction" can hold it up indefinitely.  A
prepared transaction also counts as a transaction which has not yet
completed.

So if you had any long-running READ WRITE transactions (including
one stuck "idle in transaction" or prepared but not committed), it
is expected behavior.  If not, we should try to gather more
information to figure out what's going on.

One other possibility is that your workload has changed such that
there are now so many read write transactions reading data modified
by other transactions that it is just not finding a safe snapshot
very quickly.  Even when running benchmarks at saturation levels on
a 16 core machine we rarely saw a delay more than six seconds, but
there is no upper bound on how long it might take.  In such a case
you would have a few options, but let's not get ahead of ourselves
-- the first thing is to check for "idle in transaction" or
prepared transaction issues.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Chris Pacejo 2015-10-06 21:47:08 Re: BUG #13660: serializable snapshotting hangs
Previous Message slava 2015-10-06 18:56:02 BUG #13664: pg_restore fails with -j parameter when restoring 9.1 db onto 9.5 alpha db