Support for pg_receivexlog --format=plain|tar

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Support for pg_receivexlog --format=plain|tar
Date: 2016-12-27 01:23:37
Message-ID: CAB7nPqTEVXjtH+fehcCbP791H71cfLN_p9rrd-h=yMJfshZQWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

Since 56c7d8d4, pg_basebackup supports tar format when streaming WAL
records. This has been done by introducing a new transparent routine
layer to control the method used to fetch WAL walmethods.c: plain or
tar.

pg_receivexlog does not make use of that yet, but I think that it
could to allow retention of more WAL history within the same amount of
disk space. OK, disk space is cheap but for some users things like
that matters to define a duration retention policy. Especially when
things are automated around Postgres. I really think that
pg_receivexlog should be able to support an option like
--format=plain|tar. "plain" is the default, and matches the current
behavior. This option is of course designed to match pg_basebackup's
one.

So, here is in details what would happen if --format=tar is done:
- When streaming begins, write changes to a tar stream, named
segnum.tar.partial as long as the segment is not completed.
- Once the segment completes, rename it to segnum.tar.
- each individual segment has its own tarball.
- if pg_receivexlog fails to receive changes in the middle of a
segment, it begins streaming back at the beginning of a segment,
considering that the current .partial segment is corrupted. So if
server comes back online, empty the current .partial file and begin
writing on it again. (I have found a bug on HEAD in this area
actually).

Magnus, you have mentioned me as well that you had a couple of ideas
on the matter, feel free to jump in and let's mix our thoughts!

There are a couple of things that I have been considering as well for
pg_receivexlog. Though they are not directly stick to this thread,
here they are as I don't forget about them:
- Removal of oldest WAL segments on a partition. When writing WAL
segments to a dedicated partition, we could have an option that
automatically removes the oldest WAL segment if the partition is full.
This triggers once a segment is completed.
- Compression of fully-written segments. When a segment is finished
being written, pg_receivexlog could compress them further with gz for
example. With --format=t this leads to segnum.tar.gz being generated.
The advantage of doing those two things in pg_receivexlog is
monitoring. One process to handle them all, and there is no need of
cron jobs to handle any cleanup or compression.

Thanks,
--
Michael

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2016-12-27 01:28:53 Re: BUG: pg_stat_statements query normalization issues with combined queries
Previous Message Tatsuo Ishii 2016-12-27 00:24:18 Re: Questions regarding signal handler of postmaster