Re: refactoring basebackup.c

From: tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
To: Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: refactoring basebackup.c
Date: 2022-01-19 16:30:42
Message-ID: 6c3f1558-1e56-9946-78a2-c59340da1dbf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/18/22 8:12 PM, Jeevan Ladhe wrote:
> Similar to LZ4 server-side compression, I have also tried to add a ZSTD
> server-side compression in the attached patch.
Thanks Jeevan. while testing found one scenario where the server is
getting crash while performing pg_basebackup
against server-compression=zstd for a huge data second time

Steps to reproduce
--PG sources ( apply v11-0001,v11-0001,v9-0001,v9-0002 , configure
--with-lz4,--with-zstd, make/install, initdb, start server)
--insert huge data (./pgbench -i -s 2000 postgres)
--restart the server (./pg_ctl -D data restart)
--pg_basebackup ( ./pg_basebackup  -t server:/tmp/yc1
--server-compression=zstd -R  -Xnone -n -N -l 'ccc' --no-estimate-size -v)
--insert huge data (./pgbench -i -s 1000 postgres)
--restart the server (./pg_ctl -D data restart)
--run pg_basebackup again (./pg_basebackup  -t server:/tmp/yc11
--server-compression=zstd -v  -Xnone )

[edb(at)centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/yc11
--server-compression=zstd -v  -Xnone
pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-01-19 21:23:26.508 IST [30219] LOG:  checkpoint starting: force wait
2022-01-19 21:23:26.608 IST [30219] LOG:  checkpoint complete: wrote 0
buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001
s, sync=0.001 s, total=0.101 s; sync files=0, longest=0.000 s,
average=0.000 s; distance=16369 kB, estimate=16369 kB
pg_basebackup: checkpoint completed
TRAP: FailedAssertion("len > 0 && len <= sink->bbs_buffer_length", File:
"../../../src/include/replication/basebackup_sink.h", Line: 208, PID: 30226)
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(ExceptionalCondition+0x7a)[0x94ceca]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x7b9a08]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x7b9be2]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x7b5b30]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(SendBaseBackup+0x563)[0x7b7053]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(exec_replication_command+0x961)[0x7c9a41]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(PostgresMain+0x92f)[0x81ca3f]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x48e430]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(PostmasterMain+0xfd2)[0x785702]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"(main+0x1c6)[0x48fb96]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f63642c8555]
postgres: walsender edb [local] sending backup "pg_basebackup base
backup"[0x48feb5]
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
2022-01-19 21:25:34.485 IST [30205] LOG:  server process (PID 30226) was
terminated by signal 6: Aborted
2022-01-19 21:25:34.485 IST [30205] DETAIL:  Failed process was running:
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS,  MANIFEST
'yes',  TABLESPACE_MAP,  TARGET 'server', TARGET_DETAIL '/tmp/yc11', 
COMPRESSION 'zstd')
2022-01-19 21:25:34.485 IST [30205] LOG:  terminating any other active
server processes
[edb(at)centos7tushar bin]$ 2022-01-19 21:25:34.489 IST [30205] LOG: all
server processes terminated; reinitializing
2022-01-19 21:25:34.536 IST [30228] LOG:  database system was
interrupted; last known up at 2022-01-19 21:23:26 IST
2022-01-19 21:25:34.669 IST [30228] LOG:  database system was not
properly shut down; automatic recovery in progress
2022-01-19 21:25:34.671 IST [30228] LOG:  redo starts at 9/7000028
2022-01-19 21:25:34.671 IST [30228] LOG:  invalid record length at
9/7000148: wanted 24, got 0
2022-01-19 21:25:34.671 IST [30228] LOG:  redo done at 9/7000110 system
usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2022-01-19 21:25:34.673 IST [30229] LOG:  checkpoint starting:
end-of-recovery immediate wait
2022-01-19 21:25:34.713 IST [30229] LOG:  checkpoint complete: wrote 3
buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003
s, sync=0.001 s, total=0.041 s; sync files=2, longest=0.001 s,
average=0.001 s; distance=0 kB, estimate=0 kB
2022-01-19 21:25:34.718 IST [30205] LOG:  database system is ready to
accept connections

Observation -

if we change server-compression method to lz4 from zstd then it is NOT
happening.

[edb(at)centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/ycc1
--server-compression=lz4 -v  -Xnone
pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-01-19 21:27:51.642 IST [30229] LOG:  checkpoint starting: force wait
2022-01-19 21:27:51.687 IST [30229] LOG:  checkpoint complete: wrote 0
buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001
s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s,
average=0.000 s; distance=16383 kB, estimate=16383 kB
pg_basebackup: checkpoint completed

NOTICE:  WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
pg_basebackup: base backup completed
[edb(at)centos7tushar bin]$

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-01-19 16:38:44 Re: Remove redundant MemoryContextSwith in BeginCopyFrom
Previous Message Tom Lane 2022-01-19 16:21:15 Re: Replace uses of deprecated Python module distutils.sysconfig