Re: [bug fix] pg_rewind takes long time because it mistakenly copies data files

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [bug fix] pg_rewind takes long time because it mistakenly copies data files
Date: 2018-02-26 07:46:01
Message-ID: 20180226074601.GA1960@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 26, 2018 at 06:01:43AM +0000, Tsunakawa, Takayuki wrote:
> The cause was that pg_rewind failed to recognize data files in
> tablespace directories, resulting in the full copy of those files
> instead of WAL replay.

Ouch. Confirmed.

If I test pg_rewind with a tablespace (primary and standby on same
server, base backup taken using --tablespace-mapping), then I bump
immediately into an assertion failure:
(lldb) bt
* thread #1: tid = 0x0000, 0x00007fff99aa3f06 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
* frame #0: 0x00007fff99aa3f06 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff8a30e4ec libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x00007fff9a8d46df libsystem_c.dylib`abort + 129
frame #3: 0x00007fff9a89bdd8 libsystem_c.dylib`__assert_rtn + 321
frame #4: 0x000000010f7680f6 pg_rewind`process_block_change(forknum=MAIN_FORKNUM, rnode=(spcNode = 16385, dbNode = 16384, relNode = 16386), blkno=4424) + 406 at filemap.c:374
frame #5: 0x000000010f760c9f pg_rewind`extractPageInfo(record=0x00007fc742800600) + 399 at parsexlog.c:401
frame #6: 0x000000010f76055e pg_rewind`extractPageMap(datadir="/Users/XXX/data/5432", startpoint=0, tliIndex=0, endpoint=134217768) + 270 at parsexlog.c:97
frame #7: 0x000000010f75f48c pg_rewind`main(argc=5, argv=0x00007fff504a1450) + 1964 at pg_rewind.c:300
frame #8: 0x00007fff88daf5ad libdyld.dylib`start + 1
(lldb) up 4
frame #4: 0x000000010f7680f6 pg_rewind`process_block_change(forknum=MAIN_FORKNUM, rnode=(spcNode = 16385, dbNode = 16384, relNode = 16386), blkno=4424) + 406 at filemap.c:374
371
372 if (entry)
373 {
-> 374 Assert(entry->isrelfile);
375
376 switch (entry->action)
377 {
(lldb) p *entry
(file_entry_t) $0 = {
path = 0x00007fc742424400 "pg_tblspc/16385/PG_11_201802061/16384/16386"
type = FILE_TYPE_REGULAR
action = FILE_ACTION_COPY
oldsize = 0
newsize = 36249600
isrelfile = '\0'
pagemap = (bitmap = <no value available>, bitmapsize = 0)
link_target = 0x00007fc742424430 <no value available>
next = 0x0000000000000000
}

Your patch is able to fix that. I have also checked that after
diverging the promoted server with more data and inserting data on the
old primary then the correct set of blocks from the tablespace is fetched as
well by pg_rewind. This patch also behaves correctly when creating a
new relation on the promoted server as it copies the whole relation. In
short your patch looks good to me.

Creating a test case for this patch would be nice, however this needs a
bit of work so as the tablespace map can be used with pg_basebackup and
PostgresNode.pm (or use raw pg_basebackup commands in pg_rewind tests):
- PostgresNode::init_from_backup needs to be able to understand extra
options given by caller for pg_basebackup.
- RewindTest::create_standby should be extended with extra arguments
given for previous extension.
:(
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-02-26 08:08:49 Re: [bug fix] Cascaded standby cannot start after a clean shutdown
Previous Message Tsunakawa, Takayuki 2018-02-26 07:25:46 RE: [bug fix] Cascaded standby cannot start after a clean shutdown