The End of the WAL

From: Christoph Berg <christoph(dot)berg(at)credativ(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: The End of the WAL
Date: 2021-09-10 10:06:33
Message-ID: YTsuKcwWwMOii+WB@msg.credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Re: Heikki Linnakangas
> On 09/09/2021 17:25, Tom Lane wrote:
> > Having done that, the check in md.c could be reduced to an Assert,
> > although there's something to be said for leaving it as-is as an
> > extra layer of defense.
>
> Some operations call smgrextend() directly, like B-tree index creation. We
> don't want those operations to hit an assertion either.

Thanks, I can now see a proper error on 15devel with cassert enabled:

# insert into huge select generate_series(1,1000);
FEHLER: 54000: cannot extend relation base/13550/16384 beyond 4294967295 blocks
ORT: ReadBuffer_common, bufmgr.c:831

Some months ago I had already tried what happens on running into
another limit, namely the end of WAL. Back then I was attributing the
result to "won't happen anyway", but since we are talking asserts,
there is the result:

/usr/lib/postgresql/15/bin/pg_resetwal -l 00000001FFFFFFFF000000FF -D $PWD
select pg_current_wal_lsn();
pg_current_wal_lsn
────────────────────
FFFFFFFF/FF000150
create table foo (id bigint);
repeat a few times: insert into foo select generate_series(1,200000);

15devel, cassert:

TRAP: FailedAssertion("XLogRecPtrToBytePos(*EndPos) == endbytepos", File: "./build/../src/backend/access/transam/xlog.c", Line: 1324, PID: 1651661)
postgres: 15/regress: cbe postgres ::1(45564) INSERT(ExceptionalCondition+0x9a)[0x564ad15461ba]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x223022)[0x564ad115f022]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(XLogInsert+0x653)[0x564ad116adf3]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(heap_insert+0x3ae)[0x564ad10f0a2e]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x1bf8e9)[0x564ad10fb8e9]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x35e30c)[0x564ad129a30c]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x35eedc)[0x564ad129aedc]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(standard_ExecutorRun+0x115)[0x564ad12695b5]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x4da312)[0x564ad1416312]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x4db0ee)[0x564ad14170ee]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(PortalRun+0x2ec)[0x564ad14176bc]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x4d72b6)[0x564ad14132b6]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(PostgresMain+0x181c)[0x564ad1414edc]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x43fd80)[0x564ad137bd80]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(PostmasterMain+0xca0)[0x564ad137cd10]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(main+0x221)[0x564ad10973d1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7f63bcbe1d0a]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(_start+0x2a)[0x564ad10979ca]
2021-09-10 11:44:40.997 CEST [1651617] LOG: request to flush past end of generated WAL; request FFFFFFFF/FFFFE000, current position 0/50
2021-09-10 11:44:42.019 CEST [1651613] LOG: Serverprozess (PID 1651661) wurde von Signal 6 beendet: Abgebrochen
2021-09-10 11:44:42.019 CEST [1651613] DETAIL: Der fehlgeschlagene Prozess führte aus: insert into foo select generate_series(1,200000);

The system properly (?) recovers, resuming at some FFFFFFFF/FFE614B8
position (i.e. discarding the part that was overflowing). However, if
I push it by moving closer to the end by doing smaller inserts, I can
get it into an infinite recovery loop:

2021-09-10 11:48:41.050 CEST [1652403] LOG: Datenbanksystem wurde beim Herunterfahren unterbrochen; letzte bekannte Aktion am 2021-09-10 11:48:40 CEST
2021-09-10 11:48:41.051 CEST [1652403] LOG: Datenbanksystem wurde nicht richtig heruntergefahren; automatische Wiederherstellung läuft
2021-09-10 11:48:41.051 CEST [1652403] LOG: Redo beginnt bei FFFFFFFF/FFFFDF78
2021-09-10 11:48:41.051 CEST [1652403] LOG: ungültige Datensatzlänge bei FFFFFFFF/FFFFDFB0: 24 erwartet, 0 erhalten
2021-09-10 11:48:41.051 CEST [1652403] LOG: redo done at FFFFFFFF/FFFFDF78 system usage: CPU: Benutzer: 0,00 s, System: 0,00 s, verstrichen: 0,00 s
TRAP: FailedAssertion("((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC", File: "./build/../src/backend/access/transam/xlog.c", Line: 1982, PID: 1652404)
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(ExceptionalCondition+0x9a)[0x564ad15461ba]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(+0x2221e8)[0x564ad115e1e8]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(XLogInsertRecord+0x587)[0x564ad115ea27]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(XLogInsert+0x653)[0x564ad116adf3]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(CreateCheckPoint+0x64e)[0x564ad11608ee]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(CheckpointerMain+0x3d4)[0x564ad136db34]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(AuxiliaryProcessMain+0xef)[0x564ad136bacf]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(+0x43d116)[0x564ad1379116]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(+0x43f71a)[0x564ad137b71a]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(PostmasterMain+0xca0)[0x564ad137cd10]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(main+0x221)[0x564ad10973d1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7f63bcbe1d0a]
postgres: 15/regress: checkpointer performing end-of-recovery checkpoint(_start+0x2a)[0x564ad10979ca]
2021-09-10 11:48:41.791 CEST [1651613] LOG: Checkpointer-Prozess (PID 1652404) wurde von Signal 6 beendet: Abgebrochen
2021-09-10 11:48:41.791 CEST [1651613] LOG: aktive Serverprozesse werden abgebrochen
2021-09-10 11:48:41.791 CEST [1651613] LOG: alle Serverprozesse beendet; initialisiere neu
... goto 10

Doing the same test on 15devel without cassert, the inserting query
gets stuck in a busy loop (no wait_event) that ^C won't terminate.

Two backtraces from that running process:

(gdb) bt
#0 0x0000563604fdb63b in memset (__len=8192, __ch=0, __dest=0x7f710f9b6000) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71
#1 AdvanceXLInsertBuffer (upto=upto(at)entry=18446744073709543424, opportunistic=opportunistic(at)entry=false)
at ./build/../src/backend/access/transam/xlog.c:2220
#2 0x0000563604fdb920 in GetXLogBuffer (ptr=18446744073709543424) at ./build/../src/backend/access/transam/xlog.c:1959
#3 0x0000563604fdd1b1 in CopyXLogRecordToWAL (EndPos=18446744073709543480, StartPos=18446744073709543392,
rdata=0x56360562b1c0 <hdr_rdt>, isLogSwitch=false, write_len=63) at ./build/../src/backend/access/transam/xlog.c:1558
#4 XLogInsertRecord (rdata=rdata(at)entry=0x56360562b1c0 <hdr_rdt>, fpw_lsn=fpw_lsn(at)entry=18446744073709543392, flags=<optimized out>,
num_fpi=num_fpi(at)entry=0) at ./build/../src/backend/access/transam/xlog.c:1123
#5 0x0000563604fe774a in XLogInsert (rmid=rmid(at)entry=10 '\n', info=info(at)entry=0 '\000')
at ./build/../src/backend/access/transam/xloginsert.c:480
#6 0x0000563604f87b27 in heap_insert (relation=relation(at)entry=0x7f710f6d7938, tup=tup(at)entry=0x5636061bc288, cid=cid(at)entry=0,
options=options(at)entry=0, bistate=bistate(at)entry=0x0) at ./build/../src/backend/access/heap/heapam.c:2208
#7 0x0000563604f8fe89 in heapam_tuple_insert (relation=0x7f710f6d7938, slot=0x5636061bc1f8, cid=0, options=0, bistate=0x0)
at ./build/../src/backend/access/heap/heapam_handler.c:252
#8 0x00005636050ff35c in table_tuple_insert (bistate=0x0, options=0, cid=<optimized out>, slot=0x5636061bc1f8, rel=<optimized out>)
at ./build/../src/include/access/tableam.h:1374
#9 ExecInsert (mtstate=0x5636061a5ad8, resultRelInfo=0x5636061a5ce8, slot=0x5636061bc1f8, planSlot=0x5636061bb778,
estate=0x5636061a5868, canSetTag=<optimized out>) at ./build/../src/backend/executor/nodeModifyTable.c:934
#10 0x00005636051004c7 in ExecModifyTable (pstate=<optimized out>) at ./build/../src/backend/executor/nodeModifyTable.c:2561
#11 0x00005636050d599d in ExecProcNode (node=0x5636061a5ad8) at ./build/../src/include/executor/executor.h:257
#12 ExecutePlan (execute_once=<optimized out>, dest=0x5636061b22f8, direction=<optimized out>, numberTuples=0,
sendTuples=<optimized out>, operation=CMD_INSERT, use_parallel_mode=<optimized out>, planstate=0x5636061a5ad8, estate=0x5636061a5868)
at ./build/../src/backend/executor/execMain.c:1551
#13 standard_ExecutorRun (queryDesc=0x563606109d08, direction=<optimized out>, count=0, execute_once=<optimized out>)
at ./build/../src/backend/executor/execMain.c:361
#14 0x00005636052501e2 in ProcessQuery (plan=0x5636061b2218,
sourceText=0x5636060e7128 "insert into foo select generate_series(1,500000);", params=0x0, queryEnv=0x0, dest=0x5636061b22f8,
qc=0x7ffdbe808cc0) at ./build/../src/backend/tcop/pquery.c:160
#15 0x0000563605250dd9 in PortalRunMulti (portal=portal(at)entry=0x56360614b3a8, isTopLevel=isTopLevel(at)entry=true,
setHoldSnapshot=setHoldSnapshot(at)entry=false, dest=dest(at)entry=0x5636061b22f8, altdest=altdest(at)entry=0x5636061b22f8,
qc=qc(at)entry=0x7ffdbe808cc0) at ./build/../src/backend/tcop/pquery.c:1266
#16 0x000056360525129c in PortalRun (portal=portal(at)entry=0x56360614b3a8, count=count(at)entry=9223372036854775807,
isTopLevel=isTopLevel(at)entry=true, run_once=run_once(at)entry=true, dest=dest(at)entry=0x5636061b22f8,
altdest=altdest(at)entry=0x5636061b22f8, qc=0x7ffdbe808cc0) at ./build/../src/backend/tcop/pquery.c:786
#17 0x000056360524d2dd in exec_simple_query (query_string=0x5636060e7128 "insert into foo select generate_series(1,500000);")
at ./build/../src/backend/tcop/postgres.c:1214
#18 0x000056360524f20f in PostgresMain (argc=argc(at)entry=1, argv=argv(at)entry=0x7ffdbe809190, dbname=<optimized out>,
username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4488
#19 0x00005636051cc178 in BackendRun (port=<optimized out>, port=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:4521
#20 BackendStartup (port=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:4243
#21 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1765
#22 0x00005636051cd000 in PostmasterMain (argc=argc(at)entry=5, argv=argv(at)entry=0x5636060e1ea0)
at ./build/../src/backend/postmaster/postmaster.c:1437
#23 0x0000563604f457f1 in main (argc=5, argv=0x5636060e1ea0) at ./build/../src/backend/main/main.c:199

(gdb) bt
#0 0x0000563604fdb6cf in AdvanceXLInsertBuffer (upto=upto(at)entry=18446744073709543424, opportunistic=opportunistic(at)entry=false)
at ./build/../src/backend/access/transam/xlog.c:2147
#1 0x0000563604fdb920 in GetXLogBuffer (ptr=18446744073709543424) at ./build/../src/backend/access/transam/xlog.c:1959
#2 0x0000563604fdd1b1 in CopyXLogRecordToWAL (EndPos=18446744073709543480, StartPos=18446744073709543392,
rdata=0x56360562b1c0 <hdr_rdt>, isLogSwitch=false, write_len=63) at ./build/../src/backend/access/transam/xlog.c:1558
#3 XLogInsertRecord (rdata=rdata(at)entry=0x56360562b1c0 <hdr_rdt>, fpw_lsn=fpw_lsn(at)entry=18446744073709543392, flags=<optimized out>,
num_fpi=num_fpi(at)entry=0) at ./build/../src/backend/access/transam/xlog.c:1123
#4 0x0000563604fe774a in XLogInsert (rmid=rmid(at)entry=10 '\n', info=info(at)entry=0 '\000')
at ./build/../src/backend/access/transam/xloginsert.c:480
#5 0x0000563604f87b27 in heap_insert (relation=relation(at)entry=0x7f710f6d7938, tup=tup(at)entry=0x5636061bc288, cid=cid(at)entry=0,
options=options(at)entry=0, bistate=bistate(at)entry=0x0) at ./build/../src/backend/access/heap/heapam.c:2208
...

Christoph
--
Senior Consultant, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Sascha Heuer, Geoff Richardson,
Peter Lilley; Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2021-09-10 11:42:27 Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Previous Message Pavel Stehule 2021-09-10 08:51:57 Re: Schema variables - new implementation for Postgres 15