tar-related code in PostgreSQL

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: tar-related code in PostgreSQL
Date: 2020-04-24 16:06:26
Message-ID: CA+TgmobWbfReO9-XFk8urR1K4wTNwqoHx_v56t7=T8KaiEoKNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

It has come to my attention that PostgreSQL has a bunch of code to
read and write 'tar' archives and it's kind of a mess. Attached are
two patches. The second one was written first, and does some modest
cleanups, most notably replacing the use of the constants 512 and the
formula (x + 511) & ~511 - x in lotsa places with a new constant
TAR_BLOCK_SIZE and a new static line function
tarPaddingBytesRequired(). In developing that patch, I found a bug, so
the first patch (which was written second) fixes it. The problem is
here:

if (state.is_recovery_guc_supported)
{
tarCreateHeader(header, "standby.signal", NULL,
0, /* zero-length file */
pg_file_create_mode, 04000, 02000,
time(NULL));

writeTarData(&state, header, sizeof(header));
writeTarData(&state, zerobuf, 511);
}

We have similar code in many places -- because evidently nobody
thought it would be a good idea to have all the logic for reading and
writing tarfiles in a centralized location rather than having many
copies of it -- and typically it's written to pad the block out to a
multiple of 512 bytes. But here, the file is 0 bytes long, and then we
add 511 zero bytes. This results in a tarfile whose length is not a
multiple of the TAR block size:

[rhaas ~]$ pg_basebackup -D pgbackup -Ft && ls -l pgbackup
total 80288
-rw------- 1 rhaas staff 135255 Apr 24 11:04 backup_manifest
-rw------- 1 rhaas staff 24183296 Apr 24 11:04 base.tar
-rw------- 1 rhaas staff 16778752 Apr 24 11:04 pg_wal.tar
[rhaas ~]$ rm -rf pgbackup
[rhaas ~]$ pg_basebackup -D pgbackup -Ft -R && ls -l pgbackup
total 80288
-rw------- 1 rhaas staff 135255 Apr 24 11:04 backup_manifest
-rw------- 1 rhaas staff 24184319 Apr 24 11:04 base.tar
-rw------- 1 rhaas staff 16778752 Apr 24 11:04 pg_wal.tar
[rhaas ~]$ perl -e 'print $_ % 512, "\n" for qw(24183296 24184319 16778752);'
0
511
0

That seems bad. At first, I thought maybe the problem was the fact
that we were adding 511 zero bytes here instead of 512, but then I
realized the real problem is that we shouldn't be adding any zero
bytes at all. A zero-byte file is already padded out to a multiple of
512, because 512 * 0 = 0. The problem happens not to have any adverse
consequences in this case because this is the last thing that gets put
into the tar file, and the end-of-tar-file marker is 1024 zero bytes,
so the extra 511 zero bytes we're adding here get interpreted as the
beginning of the end-of-file marker, and the first 513 bytes of what
we intended as the actual end of file marker get interpreted as the
rest of it. Then there are 511 bytes of garbage zeros at the end but
my version of tar, at least, does not care.

However, it's possible to make it blow up pretty good with a slightly
different test case, because there's one case in master where we
insert one extra file -- backup_manifest -- into the tar file AFTER we
insert standby.signal. That case is when we're writing the output to
stdout:

[rhaas ~]$ pg_basebackup -D - -Ft -Xnone -R > everything.tar
NOTICE: WAL archiving is not enabled; you must ensure that all
required WAL segments are copied through other means to complete the
backup
[rhaas ~]$ tar tf everything.tar | grep manifest
tar: Damaged tar archive
tar: Retrying...
tar: Damaged tar archive
tar: Retrying...
(it repeats this ~100 times and then exits)

If I change the offending writeTarData(&state, zerobuf, 511) to
writeTarData(&state, zerobuf, 512), then the complaint about a damaged
archive goes away, but the backup_manifest file doesn't appear to be
included in the archive, because apparently one spurious 512-byte
block of zeroes is enough to make my version of tar think it's hit the
end of the archive:

[rhaas ~]$ tar tf everything.tar | grep manifest
[rhaas ~]$

If I remove the offending writeTarData(&state, zerobuf, 511) line
entirely - which I believe to the correct fix - then it works as
expected:

[rhaas ~]$ tar tf everything.tar | grep manifest
backup_manifest

This problem appears to have been introduced by commit
2dedf4d9a899b36d1a8ed29be5efbd1b31a8fe85, "Integrate recovery.conf
into postgresql.conf". The code has been substantially modified twice
since then, but those modifications seem to have just preserved the
bug first introduced in that commit.

I tentatively propose to do the following:

1. Commit 0001, which removes the 511 bytes of bogus padding and thus
fixes the bug, to master in the near future.

2. Possibly back-patch 0001 to v12, where the bug first appeared. I'm
not sure this is strictly necessary, because in v12, standby.signal is
always the very last thing in the tarfile, so there shouldn't be an
issue unless somebody has a version of tar that cares about the 511
bytes of trailing garbage. I will do this if people think it's a good
idea; otherwise I'll skip it.

3. Commit 0002, the aforementioned cleanup patch, to master, either
immediately if people are OK with that, or else after we branch. I am
inclined to regard the widespread use of the arbitrary constants 512
and 511 as something of a hazard that ought to be corrected sooner
rather than later, but someone could not-unreasonably take the view
that it's unnecessary tinkering post-feature freeze.

Long term, it might be wiser to either switch to using a real
archiving library rather than a bunch of hand-rolled code, or at the
very least centralize things better so that it's not so easy to make
mistakes of this type. However, I don't see that as a reasonable
argument against either of these patches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0002-Assorted-cleanup-of-tar-related-code.patch application/octet-stream 16.4 KB
0001-Fix-bogus-tar-file-padding-logic-for-standby.signal.patch application/octet-stream 1.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2020-04-24 16:27:41 PostgreSQL 13 Beta 1 Release: 2020-05-21
Previous Message Alvaro Herrera 2020-04-24 15:48:46 Re: 2pc leaks fds