Re: logical changeset generation v3

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: logical changeset generation v3
Date: 2012-11-21 05:57:08
Message-ID: CAB7nPqQgbFVuTb5GgHLAzxcgkNE6=tENdzoTVQSED81VBi+m9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 20, 2012 at 8:22 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> Those aren't unexpected. Perhaps I should not make it a warning then...
>
> A short explanation:
>
> We can only decode tuples we see in the WAL when we already have a
> timetravel catalog snapshot before that transaction started. To build
> such a snapshot we need to collect information about committed which
> changed the catalog. Unfortunately we can't diagnose whether a txn
> changed the catalog without a snapshot so we just assume all committed
> ones do - it just costs a bit of memory. Thats the background of the
> "forced to assume catalog changes for ..." message.
>
> The reason for the ABORTs is related but different. We start out in the
> "SNAPBUILD_START" state when we try to build a snapshot. When we find
> initial information about running transactions (i.e. xl_running_xacts)
> we switch to the "SNAPBUILD_FULL_SNAPSHOT" state which means we can
> decode all changes in transactions that start *after* the current
> lsn. Earlier transactions might have tuples on a catalog state we can't
> query.
> Only when all transactions we observed as running before the
> FULL_SNAPSHOT state have finished we switch to SNAPBUILD_CONSISTENT.
> As we want a consistent/reproducible set of transactions to produce
> output via the logstream we only pass transactions to the output plugin
> if they commit *after* CONSISTENT (they can start earlier though!). This
> allows us to produce a pg_dump compatible snapshot in the moment we get
> consistent that contains exactly the changes we won't stream out.
>
> Makes sense?
>
> > 3) Assertion failure while running pgbench, I was just curious to see
> how
> > it reacted when logical replication was put under a little bit of load.
> > TRAP: FailedAssertion("!(((xid) >= ((TransactionId) 3)) &&
> > ((snapstate->xmin_running) >= ((TransactionId) 3)))", File:
> "snapbuild.c",
> > Line: 877)
> > => pgbench -i postgres; pgbench -T 500 -c 8 postgres
>
> Can you reproduce this one? I would be interested in log output. Because
> I did run pgbench for quite some time and I haven't seen that one after
> fixing some issues last week.
>

> It implies that snapstate->nrrunning has lost touch with reality...
>
Yes, I can reproduce in 10-20 seconds in one of my linux boxes. I haven't
outputted anything in the logs, but here is the backtrace of the core file
produced.
#2 0x0000000000865145 in ExceptionalCondition (conditionName=0xa15100
"!(((xid) >= ((TransactionId) 3)) && ((snapstate->xmin_running) >=
((TransactionId) 3)))", errorType=0xa14f3b "FailedAssertion",
fileName=0xa14ed0 "snapbuild.c", lineNumber=877) at assert.c:54
#3 0x000000000070c409 in SnapBuildTxnIsRunning (snapstate=0x19e4f10,
xid=0) at snapbuild.c:877
#4 0x000000000070b8e4 in SnapBuildProcessChange (reorder=0x19e4e80,
snapstate=0x19e4f10, xid=0, buf=0x1a0a368, relfilenode=0x1a0a450) at
snapbuild.c:388
#5 0x000000000070c088 in SnapBuildDecodeCallback (reorder=0x19e4e80,
snapstate=0x19e4f10, buf=0x1a0a368) at snapbuild.c:732
#6 0x00000000007080b9 in DecodeRecordIntoReorderBuffer (reader=0x1a08300,
state=0x19e4e20, buf=0x1a0a368) at decode.c:84
#7 0x0000000000708cad in replay_finished_record (state=0x1a08300,
buf=0x1a0a368) at logicalfuncs.c:54
#8 0x00000000004d8033 in XLogReaderRead (state=0x1a08300) at
xlogreader.c:965
#9 0x000000000070f7c3 in XLogSendLogical (caughtup=0x7fffb22c35fb "") at
walsender.c:1721
#10 0x000000000070ea05 in WalSndLoop (send_data=0x70f6e2 <XLogSendLogical>)
at walsender.c:1184
#11 0x000000000070e0eb in StartLogicalReplication (cmd=0x190eaa8) at
walsender.c:726
#12 0x000000000070e3ac in exec_replication_command (cmd_string=0x19a65c8
"START_LOGICAL_REPLICATION 'id-0' 0/7E1855C") at walsender.c:853
#13 0x0000000000753ee0 in PostgresMain (argc=2, argv=0x18f63d8,
username=0x18f62a8 "michael") at postgres.c:3974
#14 0x00000000006f13ea in BackendRun (port=0x1912600) at postmaster.c:3668
#15 0x00000000006f0b76 in BackendStartup (port=0x1912600) at
postmaster.c:3352
#16 0x00000000006ed900 in ServerLoop () at postmaster.c:1431
#17 0x00000000006ed208 in PostmasterMain (argc=13, argv=0x18f40a0) at
postmaster.c:1180
#18 0x0000000000657517 in main (argc=13, argv=0x18f40a0) at main.c:197
I'm keeping this core and the binary btw.
--
Michael Paquier
http://michael.otacoo.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2012-11-21 06:10:17 Re: StrategyGetBuffer questions
Previous Message Josh Kupershmidt 2012-11-21 05:53:23 Re: Suggestion for --truncate-tables to pg_restore