Re: Assertion failure when streaming logical changes

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Assertion failure when streaming logical changes
Date: 2015-04-07 09:22:12
Message-ID: CAMsr+YH_R-x3y8=843wdsnYGS9O-D3QxDbGbEvvfr2_t8E=-ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11 February 2015 at 08:51, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
wrote:
>
>
> The new xmin tracking code assumes that if a snapshots's regd_count > 0,
> it has already been pushed to the RegisteredSnapshots heap. That assumption
> doesn't hold because the logical decoding stuff creates its own snapshots
> and sets regd_count=1 to prevent snapmgr.c from freeing them, even though
> they haven't been registered with RegisterSnapshot.
>
> The second paragraph in this comment in snapmgr.c needs fixing:
>
> * Likewise, any snapshots that have been exported by pg_export_snapshot
>> * have regd_count = 1 and are counted in RegisteredSnapshots, but are not
>> * tracked by any resource owner.
>> *
>> * The same is true for historic snapshots used during logical decoding,
>> * their lifetime is managed separately (as they life longer as one xact.c
>> * transaction).
>>
>
> Besides the spelling, that's incorrect: historic snapshots are *not*
> counted in RegisteredSnapshots. That was harmless before commit 94028691,
> but it was always wrong.
>
>
> I think setting regd_count=1 outside snapmgr.c is a pretty ugly hack.
> snapbuild.c also abuses active_count as a reference counter, which is
> similarly ugly. I think regd_count and active_count should both be treated
> as private to snapmgr.c, and initialized to zero elsewhere.
>
> As a minimal fix, we could change the logical decoding code to not use
> regd_count to prevent snapmgr.c from freeing its snapshots, but use
> active_count for that too. Setting regd_count to 1 in
> SnapBuildBuildSnapshot() seems unnecessary in the first place, as the
> callers also call SnapBuildSnapIncRefcount(). Patch attached.
>

It might be a good idea to apply this if nothing better is forthcoming.
Logical decoding in WALsenders is broken at the moment.

Otherwise it needs to go on the 9.5 blockers list.

But could we get rid of those active_count manipulations too? Could you
> replace the SnapBuildSnap[Inc|Dec]Refcount calls with
> [Un]RegisterSnapshot()?
>

It would be interesting to know why it was done that way in the first
place, rather than using the snapshot management infrastructure. I presume
it needed to do something not directly offered by the snapshot manager but
I haven't managed to grasp what, exactly.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sawada Masahiko 2015-04-07 10:22:57 Re: Proposal : REINDEX xxx VERBOSE
Previous Message Shigeru Hanada 2015-04-07 07:42:32 Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)