Re: replay of CREATE TABLESPACE eats data at wal_level=minimal

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: replay of CREATE TABLESPACE eats data at wal_level=minimal
Date: 2021-08-10 01:23:07
Message-ID: 20210810012307.GB2693@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 09, 2021 at 01:08:42PM -0400, Robert Haas wrote:
> To reproduce, initialize a cluster with wal_level=minimal and
> max_wal_senders=0. Then from psql:
>
> \! mkdir /tmp/goose
>
> CHECKPOINT;
> CREATE TABLESPACE goose LOCATION '/tmp/goose';
> SET wal_skip_threshold=0;
> BEGIN;
> CREATE TABLE wild (a int, b text) TABLESPACE goose;
> INSERT INTO wild VALUES (1, 'chase');
> COMMIT;
> SELECT * FROM wild;
>
> As expected, you will see one row in table 'wild'. Now perform an
> immediate shutdown. Restart the server. Table 'wild' is now empty.

Thanks for finding the problem. It's a bad problem.

> The problem appears to be that tblspc_redo() calls
> create_tablespace_directories(), which says:
>
> /*
> * Our theory for replaying a CREATE is to forcibly drop the target
> * subdirectory if present, and then recreate it. This may be more
> * work than needed, but it is simple to implement.
> */
>
> Unfortunately, this theory (which dates to
> c86f467d18aa58e18fd85b560b46d8de014e6017, vintage 2010, by Bruce) is
> correct only with wal_level>minimal. At wal_level='minimal', we can
> replay the record to recreate the relfilenode, but not the records
> that would have created the contents. However, note that if the table
> is smaller than wal_skip_threshold, then we'll log full-page images of
> the contents at commit time even at wal_level='minimal' after which we
> have no problem. As far as I can see, this bug has "always" existed,
> but before c6b92041d38512a4176ed76ad06f713d2e6c01a8 (2020, Noah) you
> would have needed a different test case. Specifically, you would have
> needed to use COPY to put the row in the table, and you would have
> needed to omit setting wal_skip_threshold since it didn't exist yet.

Agreed.

> I don't presently have a specific idea about how to fix this.

Can't recovery just not delete the directory, create it if doesn't exist, and
be happy if it does exist? Like the attached WIP. If we think it's possible
for a crash during mkdir to leave a directory having the wrong permissions,
adding a chmod would be in order.

Attachment Content-Type Size
XLOG_TBLSPC_CREATE-minimal-v0.patch text/plain 1.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-08-10 01:24:35 Re: Another regexp performance improvement: skip useless paren-captures
Previous Message Mark Dilger 2021-08-10 01:17:40 Re: Another regexp performance improvement: skip useless paren-captures