Re: Incorrect XLogRegisterBuffer flag for revmapbuf in brin

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Incorrect XLogRegisterBuffer flag for revmapbuf in brin
Date: 2017-01-09 21:03:51
Message-ID: 20170109210351.nsjjkmffddkewfac@alvherre.pgsql
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Kuntal Ghosh wrote:
> > Hi all,
> >
> > In brin_doupdate(line 290), REGBUF_STANDARD is used to register
> > revmap buffer reference in WAL record. But, revmap buffer page doesn't
> > have a standard page layout and it doesn't update pd_upper and
> > pd_lower as well.
>
> Hmm. This bug should be causing WAL replay to zero out the revmap page
> contents, since essentially the whole page is covered by the "hole" in
> standard pages. I can't see what is causing that not to happen, but
> evidently it isn't, since the index works in a replica. What am I
> missing?

Ah, I figured it out, and I can reproduce that this bug loses the BRIN
data, effectively corrupting the index -- but AFAICS user queries would
not return corrupted results: since all the revmap entries become
zeroes, BRIN interprets this as "the range is not summarized" and so
all ranges become lossy for the bitmap scan.

Also, the bug is very low probability. You need to cause an UPDATE xlog
record, which only occurs when a range gets a new index tuple that
doesn't fit in the existing index page (so the index tuple is wider than
the original -- something that doesn't happen with fixed-width
datatypes). And you need to have a backup block for the revmap page --
IOW that revmap page update needs to be the first after a checkpoint
(and full_page_writes must not be off).

If you examine the revmap in a replica after running the script below,
you'll observe that it differs from the one on the master. I confirmed
that the proposed patch fixes the problem.

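-- brin_revmap_data() and get_raw_page() come from the pageinspect extension.
CREATE EXTENSION IF NOT EXISTS pageinspect;

DROP TABLE IF EXISTS brin_iso;
CREATE TABLE brin_iso (value text) WITH (fillfactor = 10);
CREATE INDEX brinidx ON brin_iso USING brin (value) WITH (pages_per_range=1);

-- Keep adding summarized ranges until some index tuple lands beyond the
-- first regular index page.
DO $$
DECLARE
    curtid tid;
BEGIN
    LOOP
        INSERT INTO brin_iso VALUES ('a');
        PERFORM brin_summarize_new_values('brinidx');
        SELECT max(pages) INTO curtid
          FROM brin_revmap_data(get_raw_page('brinidx', 1))
         WHERE pages <> '(0,0)';
        EXIT WHEN curtid > tid '(3, 0)';
    END LOOP;
END;
$$ ;

-- Free up space in the first heap page.
DELETE FROM brin_iso WHERE ctid < '(0,99)';
VACUUM brin_iso ;

-- The next revmap page change must be the first one after a checkpoint.
CHECKPOINT;

-- A wider value, so that the new index tuple should no longer fit in place.
INSERT INTO brin_iso VALUES (repeat('xyzxxz', 24));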

(I'm not sure if it's possible that revmap page 1 can be filled before
the first regular page is full; if that happens, this will loop
forever.)
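
For the comparison itself, something along these lines could be run on
both the master and the replica once the replica has replayed the WAL.
This is only a sketch: it assumes pageinspect is installed on the master
(so the functions are available on the replica too) and that block 1 of
the index is the revmap page, as in the script above. The column alias
is made up for illustration.

```sql
-- Count the revmap entries that still point at an index tuple.
-- Run the same query on master and replica and compare the results.
SELECT count(*) AS nonempty_revmap_entries
  FROM brin_revmap_data(get_raw_page('brinidx', 1))
 WHERE pages <> '(0,0)';
```

If the bug was triggered, the replica should report fewer non-empty
entries than the master, since the replayed revmap entries become zeroes.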

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
