Re: Assertion failure when streaming logical changes

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assertion failure when streaming logical changes
Date: 2015-02-11 00:51:52
Message-ID: 54DAA7A8.5000705@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02/10/2015 02:46 PM, Andres Freund wrote:
> On 2015-02-10 21:20:37 +0900, Michael Paquier wrote:
>> Using test_decoding on HEAD (cc761b1) I am seeing the following assertion
>> failure:
>> TRAP: FailedAssertion("!(!((&RegisteredSnapshots)->ph_root ==
>> ((void*)0)))", File: "snapmgr.c", Line: 677)
>
> Ick. I guess a revert of 94028691609f8e148bd4ce72c46163f018832a5b fixes
> it?
> ...
>
> Yea, it really looks like the above commit is to "blame". The new xmin
> tracking infrastructure doesn't know about the historical snapshot...

The new xmin tracking code assumes that if a snapshots's regd_count > 0,
it has already been pushed to the RegisteredSnapshots heap. That
assumption doesn't hold because the logical decoding stuff creates its
own snapshots and sets regd_count=1 to prevent snapmgr.c from freeing
them, even though they haven't been registered with RegisterSnapshot.

The second paragraph in this comment in snapmgr.c needs fixing:

> * Likewise, any snapshots that have been exported by pg_export_snapshot
> * have regd_count = 1 and are counted in RegisteredSnapshots, but are not
> * tracked by any resource owner.
> *
> * The same is true for historic snapshots used during logical decoding,
> * their lifetime is managed separately (as they life longer as one xact.c
> * transaction).

Besides the spelling, that's incorrect: historic snapshots are *not*
counted in RegisteredSnapshots. That was harmless before commit
94028691, but it was always wrong.

I think setting regd_count=1 outside snapmgr.c is a pretty ugly hack.
snapbuild.c also abuses active_count as a reference counter, which is
similarly ugly. I think regd_count and active_count should both be
treated as private to snapmgr.c, and initialized to zero elsewhere.

As a minimal fix, we could change the logical decoding code to not use
regd_count to prevent snapmgr.c from freeing its snapshots, but use
active_count for that too. Setting regd_count to 1 in
SnapBuildBuildSnapshot() seems unnecessary in the first place, as the
callers also call SnapBuildSnapIncRefcount(). Patch attached.

But could we get rid of those active_count manipulations too? Could you
replace the SnapBuildSnap[Inc|Dec]Refcount calls with
[Un]RegisterSnapshot()?

- Heikki

Attachment Content-Type Size
minimal-fix-for-historic-snapshots-1.patch application/x-patch 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arthur Silva 2015-02-11 01:14:56 Re: reducing our reliance on MD5
Previous Message Peter Geoghegan 2015-02-11 00:32:20 Re: reducing our reliance on MD5