Re: Possible bug in logical replication.

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Possible bug in logical replication.
Date: 2018-05-17 09:31:09
Message-ID: c5c485e1-9249-e127-692d-59eab95bcbf9@postgrespro.ru
Lists: pgsql-hackers

On 17.05.2018 10:45, Konstantin Knizhnik wrote:
> We got the following assertion failure at our buildfarm of master
> branch of Postgres in contrib/test_decoding regression tests:
>
> 2018-05-07 19:50:07.241 MSK [5af083bf.54ae:49] DETAIL: Streaming transactions committing after 0/2A00000, reading WAL from 0/29FFF1C.
> 2018-05-07 19:50:07.241 MSK [5af083bf.54ae:50] STATEMENT: SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174')
> TRAP: FailedAssertion("!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)))))", File: "xlogreader.c", Line: 241)
>
>
> Stack trace is the following:
>
> $ gdb -x ./gdbcmd --batch pgsql.build/tmp_install/home/buildfarm/build-farm/CORE-353-stable-func/inst/bin/postgres pgsql.build/contrib/test_decoding/tmp_check/data/core
> [New LWP 21678]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
> Core was generated by `postgres: buildfarm regression [local] SELECT '.
> Program terminated with signal SIGABRT, Aborted.
> #0 0xf7722c89 in __kernel_vsyscall ()
> #0 0xf7722c89 in __kernel_vsyscall ()
> #1 0xf6b5ddd0 in __libc_signal_restore_set (set=0xffaf2240) at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
> #2 __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
> #3 0xf6b5f297 in __GI_abort () at abort.c:89
> #4 0x56b3931a in ExceptionalCondition (conditionName=0x56bd0c38 "!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)))))", errorType=0x56b8bf61 "FailedAssertion", fileName=0x56bd0df0 "xlogreader.c", lineNumber=241) at assert.c:54
> #5 0x5678c573 in XLogReadRecord (state=0x57628c84, RecPtr=44040192, errormsg=0xffaf2560) at xlogreader.c:241
> #6 0x569c3191 in pg_logical_replication_slot_advance (startlsn=<optimized out>, moveto=44040564) at slotfuncs.c:370
> #7 0x569c3c8e in pg_replication_slot_advance (fcinfo=0xffaf2708) at slotfuncs.c:487
> #8 0x568a69c0 in ExecMakeTableFunctionResult (setexpr=0x57626e30, econtext=0x57626d88, argContext=0x57620b48, expectedDesc=0x57627e44, randomAccess=false) at execSRF.c:231
> #9 0x568b41d3 in FunctionNext (node=0x57626cfc) at nodeFunctionscan.c:94
> #10 0x568a5ce2 in ExecScanFetch (recheckMtd=0x568b3ec0 <FunctionRecheck>, accessMtd=0x568b3ed0 <FunctionNext>, node=0x57626cfc) at execScan.c:95
> #11 ExecScan (node=0x57626cfc, accessMtd=0x568b3ed0 <FunctionNext>, recheckMtd=0x568b3ec0 <FunctionRecheck>) at execScan.c:162
> #12 0x568b4243 in ExecFunctionScan (pstate=0x57626cfc) at nodeFunctionscan.c:270
> #13 0x5689caba in ExecProcNode (node=0x57626cfc) at ../../../src/include/executor/executor.h:238
> #14 ExecutePlan (execute_once=<optimized out>, dest=0x0, direction=NoMovementScanDirection, numberTuples=<optimized out>, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x57626cfc, estate=0x57626bf0) at execMain.c:1731
> #15 standard_ExecutorRun (queryDesc=0x5760f248, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:368
> #16 0x56a132cd in PortalRunSelect (portal=portal(at)entry=0x575c4ea8, forward=forward(at)entry=true, count=0, count(at)entry=2147483647, dest=0x576232d4) at pquery.c:932
> #17 0x56a14b00 in PortalRun (portal=0x575c4ea8, count=2147483647, isTopLevel=true, run_once=true, dest=0x576232d4, altdest=0x576232d4, completionTag=0xffaf2c40 "") at pquery.c:773
> #18 0x56a0fbb7 in exec_simple_query (query_string=query_string(at)entry=0x57579070 "SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174') ") at postgres.c:1122
> #19 0x56a11a8e in PostgresMain (argc=1, argv=0x575a0b8c, dbname=<optimized out>, username=0x575a09f0 "buildfarm") at postgres.c:4153
> #20 0x566cd9cb in BackendRun (port=0x5759a358) at postmaster.c:4361
> #21 BackendStartup (port=0x5759a358) at postmaster.c:4033
> #22 ServerLoop () at postmaster.c:1706
> #23 0x5698a608 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1379
> #24 0x566cf642 in main (argc=<optimized out>, argv=<optimized out>) at main.c:228
>
>
>
> As you can see, the assertion failure happens because the specified startlsn (0x2a00000) is not considered valid.
> This LSN is taken from slot's confirmed flush LSN in pg_replication_slot_advance:
> startlsn = MyReplicationSlot->data.confirmed_flush;
>
>
> Unfortunately I was not able to reproduce the problem even after repeating this regression test 1000 times, so it seems to be a non-deterministic bug that is very difficult to reproduce.
> I wonder if there is a guarantee that confirmed_flush always points to the start of a record.
> Inspecting the xlogreader.c code shows that confirmed_flush is, for example, assigned EndRecPtr in DecodingContextFindStartpoint.
> EndRecPtr is updated in XLogReadRecord, and if the record doesn't cross a page boundary, the following formula is used:
>
> state->EndRecPtr = RecPtr + MAXALIGN(total_len);
>
> And if a record ends at a page boundary, then it looks like EndRecPtr can point to the page boundary.
> This is confirmed by the following comment in the XLogReadRecord function:
>
> /*
> * RecPtr is pointing to end+1 of the previous WAL record. If we're
> * at a page boundary, no more records can fit on the current page. We
> * must skip over the page header, but we can't do that until we've
> * read in the page, since the header size is variable.
> */
>
> But this means it is possible that confirmed_flush can also point to a page boundary, which will cause this assertion failure in XLogReadRecord:
>
> /*
> * Caller supplied a position to start at.
> *
> * In this case, the passed-in record pointer should already be
> * pointing to a valid record starting position.
> */
> Assert(XRecOffIsValid(RecPtr));
>
>
> Maybe this bug is so difficult to reproduce because it occurs only if the WAL record preceding the specified start point ends exactly at a page boundary. The probability of that seems quite small...
>
> I am not so familiar with the WAL iteration code and logical decoding, so I would be pleased to receive confirmation or refutation of my analysis of the problem.
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

It looks like this is not the first time this problem has been encountered:

https://www.postgresql.org/message-id/flat/34d66f63-40a9-4c3e-c9a1-248d1e393d29%40enterprisedb(dot)com#34d66f63-40a9-4c3e-c9a1-248d1e393d29(at)enterprisedb(dot)com

I wonder who is now maintaining the logical replication code and whether
this bug is going to be fixed.
I think that using restart_lsn instead of confirmed_flush is not the
right approach.
If restart_lsn is not available and confirmed_flush points to a page
boundary, then we should still handle this case and adjust startlsn to
point to a valid record position (by just adding the page header size?).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
