Quick Links

Re: snapbuild woes

From:	Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: snapbuild woes
Date:	2017-03-03 00:30:11
Message-ID:	f41d64c7-1ae2-fac7-bf7f-f24dfe2d33bd@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 26/02/17 01:43, Petr Jelinek wrote:
> On 24/02/17 22:56, Petr Jelinek wrote:
>> On 22/02/17 03:05, Petr Jelinek wrote:
>>> On 13/12/16 00:38, Petr Jelinek wrote:
>>>> On 12/12/16 23:33, Andres Freund wrote:
>>>>> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
>>>>>> On 12/12/16 22:42, Andres Freund wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
>>>>>>>> Hi,
>>>>>>>> First one is outright bug, which has to do with how we track running
>>>>>>>> transactions. What snapbuild basically does while doing initial snapshot
>>>>>>>> is read the xl_running_xacts record, store the list of running txes and
>>>>>>>> then wait until they all finish. The problem with this is that
>>>>>>>> xl_running_xacts does not ensure that it only logs transactions that are
>>>>>>>> actually still running (to avoid locking PGPROC) so there might be xids
>>>>>>>> in xl_running_xacts that already committed before it was logged.
>>>>>>>
>>>>>>> I don't think that's actually true? Notice how LogStandbySnapshot()
>>>>>>> only releases the lock *after* the LogCurrentRunningXacts() iff
>>>>>>> wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
>>>>>>> observed must actually be a bit more complex :(
>>>>>>>
>>>>>>
>>>>>> Hmm, interesting, I did see the transaction commit in the WAL before the
>>>>>> xl_running_xacts that contained the xid as running. I only seen it on
>>>>>> production system though, didn't really manage to easily reproduce it
>>>>>> locally.
>>>>>
>>>>> I suspect the reason for that is that RecordTransactionCommit() doesn't
>>>>> conflict with ProcArrayLock in the first place - only
>>>>> ProcArrayEndTransaction() does. So they're still running in the PGPROC
>>>>> sense, just not the crash-recovery sense...
>>>>>
>>>>
>>>> That looks like reasonable explanation. BTW I realized my patch needs
>>>> bit more work, currently it will break the actual snapshot as it behaves
>>>> same as if the xl_running_xacts was empty which is not correct AFAICS.
>>>>
>>>
>>> Hi,
>>>
>>> I got to work on this again. Unfortunately I haven't found solution that
>>> I would be very happy with. What I did is in case we read
>>> xl_running_xacts which has all transactions we track finished, we start
>>> tracking from that new xl_running_xacts again with the difference that
>>> we clean up the running transactions based on previously seen committed
>>> ones. That means that on busy server we may wait for multiple
>>> xl_running_xacts rather than just one, but at least we have chance to
>>> finish unlike with current coding which basically waits for empty
>>> xl_running_xacts. I also removed the additional locking for logical
>>> wal_level in LogStandbySnapshot() since it does not work.
>>
>> Not hearing any opposition to this idea so I decided to polish this and
>> also optimize it a bit.
>>
>> That being said, thanks to testing from Erik Rijkers I've identified one
>> more bug in how we do the initial snapshot. Apparently we don't reserve
>> the global xmin when we start building the initial exported snapshot for
>> a slot (we only reserver catalog_xmin which is fine for logical decoding
>> but not for the exported snapshot) so the VACUUM and heap pruning will
>> happily delete old versions of rows that are still needed by anybody
>> trying to use that exported snapshot.
>>
>
> Aaand I found one more bug in snapbuild. Apparently we don't protect the
> snapshot builder xmin from going backwards which can yet again result in
> corrupted exported snapshot.
>
> Summary of attached patches:
> 0001 - Fixes the above mentioned global xmin tracking issues. Needs to
> be backported all the way to 9.4
>
> 0002 - Removes use of the logical decoding saved snapshots for initial
> exported snapshot since those snapshots only work for catalogs and not
> user data. Also needs to be backported all the way to 9.4.

I've been bit overzealous about this one (removed the use of the saved
snapshots completely). So here is a fix for that (and rebase on top of
current HEAD).

>
> 0003 - Makes sure snapshot builder xmin is not moved backwards by
> xl_running_xacts (which can otherwise happen during initial snapshot
> building). Also should be backported to 9.4.
>
> 0004 - Changes handling of the xl_running_xacts in initial snapshot
> build to what I wrote above and removes the extra locking from
> LogStandbySnapshot introduced by logical decoding.
>
> 0005 - Improves performance of initial snapshot building by skipping
> catalog snapshot build for transactions that don't do catalog changes.
>

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch	text/x-patch	2.5 KB
snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch	text/x-patch	2.7 KB
snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch	text/x-patch	1.0 KB
snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch	text/x-patch	7.7 KB
snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch	text/x-patch	6.2 KB

In response to

Re: snapbuild woes at 2017-02-26 00:43:19 from Petr Jelinek

Responses

Re: snapbuild woes at 2017-03-03 00:53:54 from Erik Rijkers
Re: snapbuild woes at 2017-04-12 15:32:02 from Simon Riggs
Re: snapbuild woes at 2017-04-13 00:29:59 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Erik Rijkers	2017-03-03 00:53:54	Re: snapbuild woes
Previous Message	Thomas Munro	2017-03-02 23:58:00	Re: brin autosummarization -- autovacuum "work items"