From: | Dave Cramer <davecramer(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | mansour(at)oxplot(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15808: ERROR: subtransaction logged without previous top-level txn record (SQLSTATE XX000) |
Date: | 2019-09-06 20:39:15 |
Message-ID: | CADK3HHL97Z3ZsDp0WUPWjjZzFZsyP3Po1LJ4xcjC=JjgtUiZOQ@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, 16 May 2019 at 13:04, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2019-05-16 04:56:15 +0000, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 15808
> > Logged by: Mansour Behabadi
> > Email address: mansour(at)oxplot(dot)com
> > PostgreSQL version: 10.6
> > Operating system: Amazon RDS
> > Description:
> >
> > We have some custom logical replication client that makes
> > pg_logical_slot_get_changes() calls in SQL. E.g.:
>
> Unrelated to the bug: You really should use the streaming
> interface. It's much, much, much more efficient.
>
> https://www.postgresql.org/docs/current/logicaldecoding-walsender.html
>
>
> > Once every few thousand calls, we get the following error:
> >
> > ERROR: subtransaction logged without previous top-level txn record (SQLSTATE XX000)
> >
> > which will persist on all subsequent calls, essentially forcing us to drop
> > the slot and create a new one.
>
> That obviously shouldn't happen.
>
>
> > We had little success looking for solutions online and the only lead is that
> > of a recent commit
> > (https://github.com/postgres/postgres/commit/f49a80c481f74fa81407dce8e51dea6956cb64f8)
> > whose commit message seems to correlate to the error we're getting. Below is
> > the relevant excerpt:
> >
> > The second issue concerns SnapBuilder snapshots and subtransactions.
> > SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
> > transaction that is known to be a subtxn, which is good in the common
> > case that the top-level transaction already has one (no point in doing
> > so), but a bug otherwise. To fix, arrange to transfer the snapshot from
> > the subtxn to its top-level txn as soon as the kinship gets known.
> > test_decoding's snapshot_transfer verifies this.
>
> That seems unrelated to the error message you're getting.
>
>
> > We're not sure if this is a fix to our problem and whether upgrading to
> > Postgres 11 (which has this change in it) will solve the issue.
>
> Note that this change isn't just in 11:
>
> Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> Branch: master Release: REL_11_BR [f49a80c48] 2018-06-26 16:48:10 -0400
> Branch: REL_10_STABLE Release: REL_10_5 [b767b3f2e] 2018-06-26 16:38:34 -0400
> Branch: REL9_6_STABLE Release: REL9_6_10 [da10d6a8a] 2018-06-26 16:38:34 -0400
> Branch: REL9_5_STABLE Release: REL9_5_14 [4cb6f7837] 2018-06-26 16:38:34 -0400
> Branch: REL9_4_STABLE Release: REL9_4_19 [962313558] 2018-06-26 16:38:34 -0400
>
>
> > Please let me know if any more info is needed.
>
> The easiest way to progress here would be a recipe to reproduce the
> problem. As long as the problem is on RDS, we unfortunately can't really
> debug this - neither can we modify the source to emit more debugging
> information, nor can we inspect the WAL files ourselves (I think).
>
> It's possible that trying to reproduce this on RDS with the debug level
> set to very high (debug5) would allow for a bit more insight. But I'm
> somewhat doubtful.
>
>
Andres,
It's possible that I have someone who would be able to run this in a
non-RDS environment.
It's unlikely we have a reproducible test case, but we can likely
modify the code on their boxes for debugging and/or get WAL files for
inspection.
They are running 9.6.14, so the fix above should already be in it.
I'm willing to facilitate if you can provide some direction.
Dave
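
For anyone following along, here is a rough sketch of the version check behind "the fix should be in it". It relies only on the release tags quoted above (REL_10_5, REL9_6_10, REL9_5_14, REL9_4_19) and assumes every 11.x release includes the commit, since it landed on master before 11 was released. The names FIXED_IN and has_snapshot_transfer_fix are made up for illustration; nothing here is part of PostgreSQL.

```python
# Minimum minor release per back branch that contains the snapshot-transfer
# commit, taken from the release tags quoted earlier in this thread.
FIXED_IN = {
    (10,): 5,     # REL_10_5
    (9, 6): 10,   # REL9_6_10
    (9, 5): 14,   # REL9_5_14
    (9, 4): 19,   # REL9_4_19
}


def parse_version(text):
    """Turn a string such as '9.6.14' or '10.6' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_snapshot_transfer_fix(version_text):
    """Return True if the given server version should include the commit.

    Assumption: 11 and later always include it; branches that are not
    listed in FIXED_IN (older than 9.4) are reported as not fixed.
    """
    version = parse_version(version_text)
    major = version[0]
    if major >= 11:
        return True
    if major == 10:
        # Two-part numbering from version 10 onward: major.minor
        branch = (10,)
        minor = version[1] if len(version) > 1 else 0
    else:
        # Three-part numbering before version 10: major.major.minor
        branch = version[:2]
        minor = version[2] if len(version) > 2 else 0
    required = FIXED_IN.get(branch)
    if required is None:
        return False
    return minor >= required


if __name__ == "__main__":
    for candidate in ("9.6.14", "10.6"):
        print(candidate, has_snapshot_transfer_fix(candidate))
```

Both versions mentioned in this thread (10.6 from the original report and 9.6.14 here) pass the check, which is consistent with the commit being unrelated to the error.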