| From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> | 
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> | 
| Subject: | WAL replay bugs | 
| Date: | 2014-04-07 18:16:40 | 
| Message-ID: | 5342EB88.2050506@vmware.com | 
| Lists: | pgsql-hackers | 
I've been playing with a little hack that records a before and after 
image of every page modification that is WAL-logged, and writes the 
images to a file along with the LSN of the corresponding WAL record. I 
set up master-standby replication with that hack in place on both 
servers, and ran the regression suite. Then I compared the after-images 
for every WAL record, as written on the master and as replayed by the 
standby.
The idea is that the page content in the standby after replaying a WAL 
record should be identical to the page in the master, when the WAL 
record was generated. There are some known cases where that doesn't 
hold, but it's a useful sanity check. To reduce noise, I've been 
focusing on one access method at a time, filtering out others.
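As a rough illustration of the comparison described above, here is a minimal sketch in Python. It is not the actual hack or its scripts: the record layout (`PageImage` with `lsn`, `page_id`, `after`) and the function names are assumptions made up for this example, and it works on in-memory lists rather than on whatever file format the real dump uses.

```python
from typing import List, NamedTuple, Tuple


class PageImage(NamedTuple):
    """Hypothetical record: one after-image per WAL-logged page modification."""
    lsn: int        # LSN of the WAL record that modified the page
    page_id: str    # made-up page identifier, e.g. "relation/block"
    after: bytes    # page content after the modification


def first_difference(a: bytes, b: bytes) -> int:
    """Return the offset of the first differing byte, or -1 if identical."""
    if a == b:
        return -1
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One image is a strict prefix of the other.
    return min(len(a), len(b))


def compare_after_images(
    master: List[PageImage], standby: List[PageImage]
) -> List[Tuple[int, str, int]]:
    """List (lsn, page_id, offset) for every page that differs after replay."""
    replayed_by_key = {(img.lsn, img.page_id): img.after for img in standby}
    mismatches = []
    for img in master:
        replayed = replayed_by_key.get((img.lsn, img.page_id))
        if replayed is None:
            continue  # not replayed yet, or filtered out (other access method)
        offset = first_difference(img.after, replayed)
        if offset >= 0:
            mismatches.append((img.lsn, img.page_id, offset))
    return mismatches
```

A real comparison would also need a way to ignore the known cases where the images legitimately differ; that is left out of this sketch.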
I did that for GIN first, and indeed found a bug in my new 
incomplete-split code, see commit 594bac42. After fixing that, and 
zeroing some padding bytes (38a2b95c), I'm now getting a clean run with 
that.
Next, I took on GiST, and lo and behold, found a bug there pretty quickly 
as well. This one has been there ever since we got Hot Standby: the redo 
of a page update (e.g. an insertion) resets the right-link of the page. 
If there is a concurrent scan in a hot standby server, that scan might 
still need the right-link, and will hence miss some tuples. This can be 
reproduced like this:
1. in master, create test table.
CREATE TABLE gisttest (id int4);
CREATE INDEX gisttest_idx ON gisttest USING gist (id);
INSERT INTO gisttest SELECT g * 1000 from generate_series(1, 100000) g;
-- Test function. Starts a scan, fetches one row from it, then waits 10
-- seconds before fetching the rest of the rows.
-- Returns the number of rows scanned. Should be 100000 if you follow
-- these test instructions.
CREATE OR REPLACE FUNCTION gisttestfunc() RETURNS int AS
$$
declare
   i int4;
   t text;
   cur CURSOR FOR SELECT 'foo' FROM gisttest WHERE id >= 0;
begin
   set enable_seqscan=off; set enable_bitmapscan=off;
   i = 0;
   OPEN cur;
   FETCH cur INTO t;
   perform pg_sleep(10);
   LOOP
     EXIT WHEN NOT FOUND; -- this is bogus on first iteration (FOUND reflects PERFORM, not FETCH)
     i = i + 1;
     FETCH cur INTO t;
   END LOOP;
   CLOSE cur;
   RETURN i;
END;
$$ LANGUAGE plpgsql;
2. in standby, run the test function:
SELECT gisttestfunc();
<blocks>
3. Quickly, before the scan in standby continues, cause some page splits in master:
INSERT INTO gisttest SELECT g * 1000+1 from generate_series(1, 100000) g;
4. The scan in standby finishes. It should return 100000, but will 
return a lower number if you hit the bug.
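To make the failure mode easier to picture, here is a toy model in Python of a scan that depends on a right-link. It is only a sketch under heavy assumptions: `Page`, `split`, `redo_insert` and `scan_page_chain` are invented for this example, they are not PostgreSQL's data structures or redo routines, and the real scan decides whether to follow a right-link using more state than is modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    tuples: List[int] = field(default_factory=list)
    rightlink: Optional[int] = None  # page number of the right sibling, if any


def split(pages: Dict[int, Page], left_no: int, new_no: int) -> None:
    """Move the upper half of the left page's tuples to a new right sibling."""
    left = pages[left_no]
    half = len(left.tuples) // 2
    pages[new_no] = Page(tuples=left.tuples[half:], rightlink=left.rightlink)
    left.tuples = left.tuples[:half]
    left.rightlink = new_no


def redo_insert(pages: Dict[int, Page], page_no: int, value: int,
                reset_rightlink: bool) -> None:
    """Replay an insertion; optionally model the bug by clearing the right-link."""
    page = pages[page_no]
    page.tuples.append(value)
    if reset_rightlink:
        page.rightlink = None


def scan_page_chain(pages: Dict[int, Page], start_no: int) -> List[int]:
    """Read a page and every sibling reachable from it via right-links."""
    seen: List[int] = []
    page_no: Optional[int] = start_no
    while page_no is not None:
        seen.extend(pages[page_no].tuples)
        page_no = pages[page_no].rightlink
    return seen


def run(reset_rightlink: bool) -> List[int]:
    pages = {1: Page(tuples=[10, 20, 30, 40])}
    # The scan has already decided to visit page 1 but has not read it yet.
    split(pages, 1, 2)                          # concurrent split is replayed
    redo_insert(pages, 1, 15, reset_rightlink)  # later insertion is replayed
    return sorted(scan_page_chain(pages, 1))
```

With `run(True)` the scan never reaches page 2 and misses the tuples that the split moved there; with `run(False)` it sees all of them.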
At a quick glance, I think fixing that is just a matter of not resetting 
the right-link. I'll take a closer look tomorrow, but for now I just 
wanted to report what I've been doing. I'll post the scripts I've been 
using later too - nag me if I don't.
- Heikki