Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2022-08-02 17:50:43
Message-ID: 20220802175043.GA13682@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 29, 2022 at 11:55:05AM -0400, Robert Haas wrote:
> On Mon, Mar 28, 2022 at 3:08 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > smgrcreate() as we would for most WAL records or whether it should be
> > adopting the new system introduced by
> > 49d9cfc68bf4e0d32a948fe72d5a0ef7f464944e. I wrote about this concern
> > over here:
> >
> > http://postgr.es/m/CA+TgmoYcUPL+WOJL2ZzhH=zmrhj0iOQ=iCFM0SuYqBbqZEamEg@mail.gmail.com
> >
> > But apart from that question your adaptations here look reasonable to me.
>
> That commit having been reverted, I committed v6 instead. Let's see
> what breaks...

There's a crash

2022-07-31 01:22:51.437 CDT client backend[13362] [unknown] PANIC: could not open critical system index 2662

(gdb) bt
#0 __GI_raise (sig=sig(at)entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007efe27999801 in __GI_abort () at abort.c:79
#2 0x00005583891941dc in errfinish (filename=<optimized out>, filename(at)entry=0x558389420437 "relcache.c", lineno=lineno(at)entry=4328,
funcname=funcname(at)entry=0x558389421680 <__func__.33178> "load_critical_index") at elog.c:675
#3 0x00005583891713ef in load_critical_index (indexoid=indexoid(at)entry=2662, heapoid=heapoid(at)entry=1259) at relcache.c:4328
#4 0x0000558389172667 in RelationCacheInitializePhase3 () at relcache.c:4103
#5 0x00005583891b93a4 in InitPostgres (in_dbname=in_dbname(at)entry=0x55838a50d468 "a", dboid=dboid(at)entry=0, username=username(at)entry=0x55838a50d448 "pryzbyj", useroid=useroid(at)entry=0,
load_session_libraries=<optimized out>, override_allow_connections=override_allow_connections(at)entry=false, out_dbname=0x0) at postinit.c:1087
#6 0x0000558388daa7bb in PostgresMain (dbname=0x55838a50d468 "a", username=username(at)entry=0x55838a50d448 "pryzbyj") at postgres.c:4081
#7 0x0000558388b9f423 in BackendRun (port=port(at)entry=0x55838a505dd0) at postmaster.c:4490
#8 0x0000558388ba6e07 in BackendStartup (port=port(at)entry=0x55838a505dd0) at postmaster.c:4218
#9 0x0000558388ba747f in ServerLoop () at postmaster.c:1808
#10 0x0000558388ba8f93 in PostmasterMain (argc=7, argv=<optimized out>) at postmaster.c:1480
#11 0x0000558388840e1f in main (argc=7, argv=0x55838a4dc000) at main.c:197

while :; do psql -qh /tmp postgres -c "DROP DATABASE a" -c "CREATE DATABASE a TEMPLATE postgres STRATEGY wal_log"; done
# Run this for a few loops and then ^C or hold down ^C until it stops,
# and then connect to postgres and try to connect to 'a':
postgres=# \c a
2022-07-31 01:22:51.437 CDT client backend[13362] [unknown] PANIC: could not open critical system index 2662

Unfortunately, that isn't very consistent, and you have have to run it a bunch
of times...

I don't know if it's an issue of any significance that CREATE DATABASE / ^C
leaves behind a broken database, but it is an issue that the cluster crashes.

While struggling to reproduce that problem, I also hit this warning, which may
or may not be the same. I added an abort() after WARNING in aset.c to get a
backtrace.

WARNING: problem in alloc set PortalContext: bogus aset link in block 0x55a63f2f9d60, chunk 0x55a63f2fb138

Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig(at)entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No existe el archivo o el directorio.
(gdb) bt
#0 __GI_raise (sig=sig(at)entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f81144f1801 in __GI_abort () at abort.c:79
#2 0x000055a63c834c5d in AllocSetCheck (context=context(at)entry=0x55a63f26fea0) at aset.c:1491
#3 0x000055a63c835b09 in AllocSetDelete (context=0x55a63f26fea0) at aset.c:638
#4 0x000055a63c854322 in MemoryContextDelete (context=0x55a63f26fea0) at mcxt.c:252
#5 0x000055a63c8591d5 in PortalDrop (portal=portal(at)entry=0x55a63f2bb7a0, isTopCommit=isTopCommit(at)entry=false) at portalmem.c:596
#6 0x000055a63c3e4a7b in exec_simple_query (query_string=query_string(at)entry=0x55a63f24db90 "CREATE DATABASE a TEMPLATE postgres STRATEGY wal_log ;") at postgres.c:1253
#7 0x000055a63c3e7fc1 in PostgresMain (dbname=<optimized out>, username=username(at)entry=0x55a63f279448 "pryzbyj") at postgres.c:4505
#8 0x000055a63c1dc423 in BackendRun (port=port(at)entry=0x55a63f271dd0) at postmaster.c:4490
#9 0x000055a63c1e3e07 in BackendStartup (port=port(at)entry=0x55a63f271dd0) at postmaster.c:4218
#10 0x000055a63c1e447f in ServerLoop () at postmaster.c:1808
#11 0x000055a63c1e5f93 in PostmasterMain (argc=7, argv=<optimized out>) at postmaster.c:1480
#12 0x000055a63be7de1f in main (argc=7, argv=0x55a63f248000) at main.c:197

I reproduced that by running this a couple dozen times in an interactive psql.
It doesn't seem to affect STRATEGY=file_copy.

SET statement_timeout=0; DROP DATABASE a; SET statement_timeout='60ms'; CREATE DATABASE a TEMPLATE postgres STRATEGY wal_log ; \c a \c postgres

Also, if I understand correctly, this patch seems to assume that nobody is
connected to the source database. But what's actually enforced is just that
nobody *else* is connected. Is it any issue that the current DB can be used as
a source? Anyway, both of the above problems are reproducible using a
different database.

|postgres=# CREATE DATABASE new TEMPLATE postgres STRATEGY wal_log;
|CREATE DATABASE

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-08-02 17:50:51 Re: Cutting test runtime for src/test/modules/snapshot_too_old
Previous Message Jacob Champion 2022-08-02 17:48:53 Re: pg_auth_members.grantor is bunk