Re: snapbuild woes

From: Andres Freund <andres(at)anarazel(dot)de>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: snapbuild woes
Date: 2017-05-01 00:59:21
Message-ID: 20170501005921.qcqd7w2njpv3av6s@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2017-04-15 05:18:49 +0200, Petr Jelinek wrote:
> + /*
> + * c) we already seen the xl_running_xacts and tried to do the above.
> + * However because of race condition in LogStandbySnapshot() there might
> + * have been transaction reported as running but in reality has already
> + * written commit record before the xl_running_xacts so decoding has
> + * missed it. We now see xl_running_xacts that suggests all transactions
> + * from the original one were closed but the consistent state wasn't
> + * reached which means the race condition has indeed happened.
> + *
> + * Start tracking again as if this was the first xl_running_xacts we've
> + * seen, with the advantage that because decoding was already running,
> + * any transactions committed before the xl_running_xacts record will be
> + * known to us so we won't hit with the same issue again.
> + */

Unfortunately I don't think that's true, as coded. You're using
information about committed transactions:

> + else if (TransactionIdFollows(running->oldestRunningXid,
> + builder->running.xmax))
> + {
> + int off;
> +
> + SnapBuildStartXactTracking(builder, running);
> +
> /*
> + * Nark any transactions that are known to have committed before the
> + * xl_running_xacts as finished to avoid the race condition in
> + * LogStandbySnapshot().
> *
> + * We can use SnapBuildEndTxn directly as it only does the
> + * transaction running check and handling without any additional
> + * side effects.
> */
> + for (off = 0; off < builder->committed.xcnt; off++)
> + SnapBuildEndTxn(builder, lsn, builder->committed.xip[off]);

but a transaction might just have *aborted* before the new snapshot, no?
Since we don't keep track of those, I don't think this guarantees anything?

ISTM, we need a xip_status array in SnapBuild->running. Then, whenever
a xl_running_xacts is encountered while builder->running.xcnt > 0, loop
for i in 0 .. xcnt_space, and end SnapBuildEndTxn() if xip_status[i] but
the xid is not xl_running_xacts? Does that make sense?

- Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-05-01 01:24:44 Re: pg_dump emits ALTER TABLE ONLY partitioned_table
Previous Message Noah Misch 2017-05-01 00:52:11 Re: Re: logical replication and PANIC during shutdown checkpoint in publisher