snapbuild woes

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: snapbuild woes
Date: 2016-12-10 22:10:19
Message-ID: f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I recently found couple of issues with the way initial logical decoding
snapshot is made.

First one is outright bug, which has to do with how we track running
transactions. What snapbuild basically does while doing initial snapshot
is read the xl_running_xacts record, store the list of running txes and
then wait until they all finish. The problem with this is that
xl_running_xacts does not ensure that it only logs transactions that are
actually still running (to avoid locking PGPROC) so there might be xids
in xl_running_xacts that already committed before it was logged. This in
turn means that the snapbuild will never make snapshot if the situation
occurs. The reason why this didn't bite us much yet is that the
snapbuild will consider all transactions finished if the empty
xl_running_xacts comes which usually happens after a while (but might
not on a consistently busy server).

The fix I came up with this is to extend the mechanism to also consider
all transactions finished when xl_running_xacts which has xmin bigger
then that previous xmax comes. This seems to work pretty well on busy
servers. This fix is attached as
0001-Mark-snapshot-consistent-when-all-running-txes-have.patch. I
believe it should be backpatched all the way to 9.4.

The other issue is performance problem again on busy servers with
initial snapshot. We track transactions for catalog modifications so
that we can do proper catalog time travel for decoding of changes. But
for transactions that were running while we started trying to get
initial consistent snapshot, there is no good way to tell if they did
catalog changes or not so we consider them all as catalog changing. We
make separate historical snapshot for every such transaction. This by
itself is fine, the problem is that current implementation also
considers all the transactions that started after we started watching
for changes but before we reached consistent state to also do catalog
changes even though there we actually do know if they did any catalog
change or not. In practice it means we make snapshots that are not
really necessary and if there was long running transaction for which the
snapshot builder has to wait for then we can create thousands of unused
snapshots which affects performance in bad ways (I've seen the initial
snapshot taking hour because of this).

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't make snapshots for transactions that we seen
wholly and know that they didn't make catalog changes even if we didn't
reach consistency yet.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Mark-snapshot-consistent-when-all-running-txes-have.patch application/x-patch 3.0 KB
0002-Skip-unnecessary-snapshot-builds.patch application/x-patch 3.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2016-12-10 22:49:40 Re: snapbuild woes
Previous Message Artur Zakirov 2016-12-10 21:07:25 Re: [sqlsmith] Crash in tsquery_rewrite/QTNBinary