incremental backup breakage in BlockRefTableEntryGetBlocks

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: incremental backup breakage in BlockRefTableEntryGetBlocks
Date: 2024-04-04 17:38:09
Message-ID: CA+TgmoYwy_KHp1-5GYNmVa=zdeJWhNH1T0SBmEuvqQNJEHj1Lw@mail.gmail.com
Lists: pgsql-hackers

Hi,

Yesterday, Tomas Vondra reported to me off-list that he was seeing
what appeared to be data corruption after taking and restoring an
incremental backup. Overnight, Jakub Wartak further experimented with
Tomas's test case, did some initial analysis, and made it very easy to
reproduce. I spent this morning tracking down the problem, for which I
attach a patch.

It doesn't take a whole lot to hit this bug. I think all you need is a
relation that is more than 1GB in size, plus a modification to the
second half of some 1GB file between the full backup and the
incremental backup, plus some way of realizing after you do a restore
that you've gotten the wrong answer. It's obviously quite embarrassing
that this wasn't caught sooner, and to be honest I'm not sure why we
didn't. I think the test cases pass because we avoid committing test
cases that create large relations, but the failure to catch it during
manual testing must be because I never worked hard enough to verify
that the results were fully correct. Ouch.

The actual bug is here:

    if (chunkno == stop_chunkno - 1)
        stop_offset = stop_blkno % BLOCKS_PER_CHUNK;

Each chunk covers 64k block numbers. The caller specifies a range of
block numbers of interest via start_blkno (inclusive) and stop_blkno
(non-inclusive). We need to translate those start and stop values for
the overall function call into start and stop values for each
particular chunk. When we're on any chunk but the last one of
interest, the stop offset is BLOCKS_PER_CHUNK i.e. we care about
blocks all the way through the end of the chunk. The existing code
handles that fine. If stop_blkno is somewhere in the middle of the
last chunk, the existing code also handles that fine. But the code is
wrong when stop_blkno is a multiple of BLOCKS_PER_CHUNK, because it
then calculates a stop_offset of 0 (which is never right, because that
would mean that we thought the chunk was relevant when it isn't)
rather than BLOCKS_PER_CHUNK. So this forgets about 64k * BLCKSZ =
512MB of potentially modified blocks whenever the size of a single
relation file is exactly 512MB or exactly 1GB (with the default 8kB
block size, such a file contains exactly 65536 or 131072 blocks, so
stop_blkno lands exactly on a chunk boundary), which our users would
probably find more than slightly distressing.
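
In other words, the last-chunk handling needs to look something like
this (a sketch only, assuming stop_offset has already been initialized
to BLOCKS_PER_CHUNK as described above; the attached patch may phrase
it differently):

    if (chunkno == stop_chunkno - 1)
    {
        /*
         * Only stop partway through this chunk if stop_blkno actually
         * falls inside it. If stop_blkno is an exact multiple of
         * BLOCKS_PER_CHUNK, the whole chunk is of interest, so leave
         * stop_offset at BLOCKS_PER_CHUNK.
         */
        if (stop_blkno % BLOCKS_PER_CHUNK != 0)
            stop_offset = stop_blkno % BLOCKS_PER_CHUNK;
    }

That is, the default of BLOCKS_PER_CHUNK should only be overridden when
the stopping point lies strictly inside the chunk.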

I'm not posting Jakub's reproducer here because it was sent to me
privately and, although I presume he would have no issue with me
posting it, I can't actually confirm that right at the moment.
Hopefully he'll reply with it, and double-check that this does indeed
fix the issue. My thanks to both Tomas and Jakub.

Regards,

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-Fix-incorrect-calculation-in-BlockRefTableEntryGe.patch application/octet-stream 1.5 KB
