Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data
Date: 2016-04-01 13:30:12
Message-ID: CAMsr+YH2-1MzAGHQeGB1HKDKtRVHvxDmeE_UHfYv6HKZJVZSDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31 March 2016 at 16:13, Andres Freund <andres(at)anarazel(dot)de> wrote:

> It's probably easier to just generate a humongous commit record. You can
> do so by having a *lot* of subtransactions. Relatively easy to do with
> plpgsql by creating them in a loop (SELECT txid_current() in EXCEPTION
> bearing block ought to suffice).
>
>
This does the trick and does it quickly on 9.4:

CREATE TABLE trivial(padding text);
ALTER TABLE trivial ALTER COLUMN padding SET STORAGE plain;

DO
LANGUAGE plpgsql
$$
DECLARE
wal_seg_size integer;
remaining_size integer;
BEGIN
wal_seg_size := (select setting from pg_settings where name =
'wal_segment_size')::integer
* (select setting from pg_settings where name =
'wal_block_size')::integer;
LOOP
SELECT wal_seg_size - file_offset FROM
pg_xlogfile_name_offset(pg_current_xlog_insert_location()) INTO
remaining_size;
IF remaining_size < 8192
THEN
-- Make a nice big commit record right at the end of the segment
EXIT;
ELSE
BEGIN
-- About 4k
INSERT INTO trivial(padding) VALUES (repeat('0123456789abcdef',
256));
EXCEPTION
WHEN division_by_zero THEN
CONTINUE;
END;
END IF;
END LOOP;
END;
$$;

I had no issue reproducing the bug on 9.4, but I don't see it in 9.6. At
least, not this one, not yet.

Still, we might want to backpatch the fix; it seems clear enough even if I
can't reproduce the issue yet. It doesn't look like it'll affect logical
decoding since the snapshot builder has its own unrelated logic for finding
the startpoint for decoding, and it certainly doesn't affect WAL replay
otherwise we would've seen the fireworks much earlier. The upside is that
also makes the backpatch much safer.

I was surprised to see that there are no tests for pg_xlogdump to add this
on to, so I've added a trivial test. I've got some more complete
xlogdump-helper code in the failover slots tests that I should extract and
add to PostgresNode.pm but this'll do in the mean time and is much simpler.

In the process I noticed that pg_xlogdump doesn't follow the rules of the
rest of the codebase when it comes to command line handling with --version,
returning nonzero on bad options, etc. It might be good to fix that;
there's a small BC impact, but I doubt anyone cares when it comes to
pg_xlogdump.

I'll attach the new testcase once I either get it to reproduce this bug or
give up and leave the basic xlogdump testcase alone.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shulgin, Oleksandr 2016-04-01 13:36:08 Re: SSL indicator in psql prompt
Previous Message Teodor Sigaev 2016-04-01 13:22:55 Re: [PATCH] Phrase search ported to 9.6