Re: snapbuild woes

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: snapbuild woes
Date: 2017-05-01 09:09:44
Message-ID: e6c1fdfe-50bc-62e4-3ad7-854f6ac5c35e@2ndquadrant.com
Lists: pgsql-hackers

On 01/05/17 10:03, Andres Freund wrote:
> On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:
>> I agree with adding running, I think that's good thing even for the per
>> transaction tracking and snapshot exports - we could use the newly added
>> field to get rid of the issue we have with 'snapshot too large' when
>> there were many aborted transactions while we waited for running ones to
>> finish.
>
> I'm not sure of that - what I was proposing would only track this for
> the ->running substructure. How'd that help?
>

Well, not as is, but it's a building block for it.

>
>> But, I still think we need to restart the tracking after new
>> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
>> that we assigned to transactions at the end of SnapBuildCommitTxn might
>> be corrupted otherwise as they were built before we knew one of the
>> supposedly running txes was actually already committed and that
>> transaction might have done catalog changes.
>
> I'm afraid you're right. But I think this is even more complicated: The
> argument in your version that this can only happen once, seems to also
> be holey: Just imagine a pg_usleep(3000 * 1000000) right before
> ProcArrayEndTransaction() and enjoy the picture.
>

Well yes, a transaction can in theory have written its commit/abort xlog
record and stayed in the proc array for longer than a single
xl_running_xacts write. But then the condition we test, that the new
xl_running_xacts has a bigger xmin than the previously tracked one's
xmax, would not be satisfied, and we would not enter the relevant code
path yet. So I think we should not be able to get any xids we didn't see.
But we do have to restart tracking from the beginning (after first
checking whether we have already seen anything that the xl_running_xacts
considers as running); that's what my code did.
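
To make the condition concrete, here is a minimal standalone sketch of the
check described above. It is not the patch itself: RunningTracker,
RunningXactsRecord, tracker_start() and tracker_maybe_restart() are
hypothetical names made up for illustration, the strict "follows"
comparison is my reading of "bigger xmin than the tracked xmax", and the
extra check for xids we have already seen is not modeled. Only the
wraparound-aware comparison is written in the style of
TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Wraparound-aware "a is newer than b", in the style of
 * TransactionIdFollows(). Only meaningful for normal xids.
 */
static bool
xid_follows(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) > 0;
}

/* Hypothetical, reduced view of one xl_running_xacts record. */
typedef struct RunningXactsRecord
{
    TransactionId oldest_running_xid;   /* xmin of the record */
    TransactionId next_xid;             /* upper bound used as xmax here */
} RunningXactsRecord;

/* Hypothetical tracking state; restarts only counts restarts for the demo. */
typedef struct RunningTracker
{
    TransactionId xmin;
    TransactionId xmax;
    int           restarts;
} RunningTracker;

/* (Re)initialize tracking from a record; leaves the restart counter alone. */
static void
tracker_start(RunningTracker *t, const RunningXactsRecord *rec)
{
    assert(t != NULL && rec != NULL);
    t->xmin = rec->oldest_running_xid;
    t->xmax = rec->next_xid;
}

/*
 * Restart tracking only if the new record's xmin is past the xmax we were
 * tracking. A transaction that wrote its commit record but is still in the
 * proc array keeps the new xmin low, so this returns false and nothing
 * changes until a later record.
 */
static bool
tracker_maybe_restart(RunningTracker *t, const RunningXactsRecord *rec)
{
    assert(t != NULL && rec != NULL);
    if (!xid_follows(rec->oldest_running_xid, t->xmax))
        return false;

    tracker_start(t, rec);
    t->restarts++;
    return true;
}
```

With a tracked range of 100..110, a record whose oldest running xid is 105
leaves the state untouched, while one with oldest running xid 111 restarts
tracking from that record.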

> Wonder if we should just (re-)add a stage between SNAPBUILD_START and
> SNAPBUILD_FULL_SNAPSHOT. Enter SNAPBUILD_BUILD_INITIAL_SNAPSHOT at the
> first xl_running_xacts, wait for all transactions to end with my
> approach, while populating SnapBuild->committed, only then start
> collecting changes for transactions (i.e. return true from
> SnapBuildProcessChange()), return true once all xacts have finished
> again. That'd presumably be a bit easier to understand, more robust -
> and slower.
>

That would also work, but per above, I don't understand why it's needed.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
