Re: BUG #13660: serializable snapshotting hangs

From: Chris Pacejo <cpacejo(at)clearskydata(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13660: serializable snapshotting hangs
Date: 2015-10-06 21:47:08
Message-ID: CAC8iE5isTnJ_8KHv6HyWAqW=2q37wpHTiTA3EBLhWzgLVTBJGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I get that, but I had no other open connections to that database.
There were many connections to *other* databases on that same
*server*, but that shouldn't affect taking a snapshot on an otherwise
unused database, correct?

Also note that once this happened to one database, it happened to
*all* databases on the same server simultaneously. It's as if the
read-write transaction check is performed server-wide, not
database-wide as I would expect.

On Tue, Oct 6, 2015 at 5:38 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> On Thursday, October 1, 2015 9:19 PM, "cpacejo(at)clearskydata(dot)com"
> <cpacejo(at)clearskydata(dot)com> wrote:
>
>> PostgreSQL version: 9.4.4
>
>> After running fine for weeks, we now find that serializable snapshots
>> hang:
>>
>> our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY,
>> DEFERRABLE;
>> START TRANSACTION
>> our_db=> SELECT pg_export_snapshot();
>> (...hangs indefinitely...)
>
>> Killing all backends (i.e. including those accessing other databases)
>> unblocked serializable snapshotting.
>>
>> Is this expected behavior?
>
> What should be happening is that when the serializable read only
> deferrable transaction executes the first statement that needs a
> snapshot, it grabs a snapshot and looks for concurrent read write
> transactions. If there are none, the snapshot is safe and it goes
> ahead with execution; otherwise it goes into a loop where it delays
> (to avoid sucking 100% CPU for a core) by calling
> ProcWaitForSignal(). Each time that function returns it checks
> whether any of the concurrent read write transactions developed a
> read-write conflict out (i.e., it read data which was modified by a
> transaction concurrent to *it*) to a transaction which committed
> before this read only transaction got its snapshot. If so, this
> snapshot is unsafe and we need to discard it, get a new snapshot,
> and go back into the loop. If all read write transactions
> concurrent to the read only transaction complete without developing
> such a conflict our snapshot is safe and we can proceed to run
> without taking predicate locks or risking a serialization failure.
>
> Now, that's pretty technical, and you definitely don't need to
> follow all of the above to safely use deferrable transactions. The
> short version is that your initial statement that needs a snapshot
> cannot begin to execute *at least* until any concurrent
> transactions which are not READ ONLY complete. One connection
> stuck "idle in transaction" can hold it up indefinitely. A
> prepared transaction also counts as a transaction which has not yet
> completed.
>
> So if you had any long-running READ WRITE transactions (including
> one stuck "idle in transaction" or prepared but not committed), it
> is expected behavior. If not, we should try to gather more
> information to figure out what's going on.
>
> One other possibility is that your workload has changed such that
> there are now so many read write transactions reading data modified
> by other transactions that it is just not finding a safe snapshot
> very quickly. Even when running benchmarks at saturation levels on
> a 16 core machine we rarely saw a delay more than six seconds, but
> there is no upper bound on how long it might take. In such a case
> you would have a few options, but let's not get ahead of ourselves
> -- the first thing is to check for "idle in transaction" or
> prepared transaction issues.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message irina.guberman 2015-10-06 22:42:12 BUG #13665: Foreign Key constraint doesn't work
Previous Message Kevin Grittner 2015-10-06 21:38:15 Re: BUG #13660: serializable snapshotting hangs