
Force the old transactions logs cleanup even if checkpoint is skipped

From:	"Zakhlystov, Daniil (Nebius)" <usernamedt(at)nebius(dot)com>
To:	"amborodin(at)acm(dot)org" <amborodin(at)acm(dot)org>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	"Mokrushin Mikhail (Nebius)" <rodrijjke(at)nebius(dot)com>
Subject:	Force the old transactions logs cleanup even if checkpoint is skipped
Date:	2023-10-17 14:09:21
Message-ID:	AM9P190MB12346310F38B3FAF9287D1FFB5D6A@AM9P190MB1234.EURP190.PROD.OUTLOOK.COM
Lists:	pgsql-hackers

Hi, hackers!

I've stumbled into an interesting problem. Currently, if Postgres has nothing to write, it skips the checkpoint that checkpoint_timeout would otherwise trigger. However, a temporary archiving problem (for example, a network issue) can leave a pile of WAL files stuck in pg_wal. Once the issue is resolved and the files are archived, they are still not removed, because old WAL is cleaned up as part of a checkpoint and the checkpoint is skipped while there is nothing to write.

That might lead to a problem: suppose you've run out of disk space because of a temporary archiver failure. After the failure is resolved, Postgres is unable to recover from it automatically and requires human attention to issue a manual CHECKPOINT.
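
The manual intervention mentioned above can be sketched as a small shell helper that shows how many segments are sitting in pg_wal before and after the CHECKPOINT. This is only an illustration under stated assumptions: `count_wal_segments` is a hypothetical helper name, not something shipped with PostgreSQL, and it simply counts files whose names are 24 uppercase hexadecimal characters.

```shell
#!/bin/bash
# Hypothetical helper (not part of PostgreSQL): count WAL segment files
# in a pg_wal directory. Segment file names consist of exactly
# 24 hexadecimal characters; everything else (archive_status/,
# *.history, *.partial, ...) is ignored.
count_wal_segments() {
    local wal_dir="$1"
    local count=0
    local f name
    for f in "$wal_dir"/*; do
        # Skip directories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        case "$name" in
            *[!0-9A-F]*) continue ;;  # contains a non-hex character
        esac
        if [ "${#name}" -eq 24 ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

For example, calling `count_wal_segments /Users/usernamedt/test_archiver/pg_wal` before and after `psql -c "CHECKPOINT;"` should show the count drop once the archiver has recovered.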

I suggest changing this behavior by trying to clean up the old WAL even if we skip the main checkpoint routine. I've attached the patch that does exactly that.

What do you think?

To reproduce the issue, follow these steps:

1. Init Postgres:
pg_ctl initdb -D /Users/usernamedt/test_archiver

2. Add the archiver script to simulate failure:
➜ ~ cat /Users/usernamedt/command.sh
#!/bin/bash

false

3. Then edit postgresql.conf:

archive_mode = on
checkpoint_timeout = 30s
archive_command = '/Users/usernamedt/command.sh'
log_min_messages = debug1

4. Then start Postgres:
/usr/local/pgsql/bin/pg_ctl -D /Users/usernamedt/test_archiver -l logfile start

5. Insert some data:
pgbench -i -s 30 -d postgres

6. Trigger checkpoint to flush all data:
psql -c "checkpoint;"

7. Alter the archiver script to simulate the end of archiver issues:
➜ ~ cat /Users/usernamedt/command.sh
#!/bin/bash

true

8. Check that the WAL files are actually archived but not removed:
➜ ~ ls -lha /Users/usernamedt/test_archiver/pg_wal/archive_status | head
total 0
drwx------@ 48 usernamedt LD\Domain Users 1.5K Oct 17 17:44 .
drwx------@ 50 usernamedt LD\Domain Users 1.6K Oct 17 17:43 ..
-rw-------@ 1 usernamedt LD\Domain Users 0B Oct 17 17:42 000000010000000000000040.done
...
-rw-------@ 1 usernamedt LD\Domain Users 0B Oct 17 17:43 00000001000000000000006D.done

Meanwhile, the server log shows that the checkpoint is skipped:
2023-10-17 18:03:44.621 +04 [71737] DEBUG: checkpoint skipped because system is idle
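
The check in step 8 can also be done with a small script instead of reading the `ls` output. A minimal sketch, assuming the usual meaning of the status files (a `.ready` file marks a segment still waiting for the archiver, a `.done` file marks a segment that has already been archived); `archive_status_summary` is a hypothetical helper name used only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: summarize an archive_status directory by counting
# segments still waiting to be archived (.ready) and segments that have
# already been archived (.done).
archive_status_summary() {
    local status_dir="$1"
    local ready=0
    local done_count=0
    local f
    for f in "$status_dir"/*; do
        # Skip the unexpanded glob of an empty directory.
        [ -e "$f" ] || continue
        case "$f" in
            *.ready) ready=$((ready + 1)) ;;
            *.done)  done_count=$((done_count + 1)) ;;
        esac
    done
    echo "ready=$ready done=$done_count"
}
```

Run against /Users/usernamedt/test_archiver/pg_wal/archive_status, the state described above should show up as no `.ready` files and a number of `.done` files that does not shrink while checkpoints keep being skipped.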

Thanks,

Daniil Zakhlystov

Attachment	Content-Type	Size
0001-Cleanup-old-files-if-checkpoint-is-skipped.patch	application/octet-stream	1.3 KB

Responses

Re: Force the old transactions logs cleanup even if checkpoint is skipped at 2023-11-02 12:25:20 from Shlok Kyal
Re: Force the old transactions logs cleanup even if checkpoint is skipped at 2023-11-08 00:21:32 from Michael Paquier
