Old row version in hot chain become visible after a freeze

From: "Wood, Dan" <hexpert(at)amazon(dot)com>
To: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Old row version in hot chain become visible after a freeze
Date: 2017-08-31 22:36:43
Message-ID: E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
Lists: pgsql-bugs



I’ve found a bug in Postgres that causes old row versions to reappear in a table: DEAD rows in a HOT chain are getting frozen and becoming visible again. I’ve reproduced this on both 9.6.1 and 11-devel.

The repro consists of two short psql scripts.

While the repro does an explicit VACUUM FREEZE, this bug also happens with autovacuum.

FILE: lock.sql
begin;
select id from t where id=3 for key share;
select pg_sleep(1);
update t set x=x+1 where id=3;
commit;
vacuum freeze t;
select ctid, xmin, xmax, id, x from t;

FILE: repro.sql
drop table t;
create table t (id int primary key, name char(3), x integer);

insert into t values (1, '111', 0);
insert into t values (3, '333', 0);

\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &

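It’s about 50-50 whether any given run of repro.sql ends with the correct output (the first result below) or with the corrupted output (the second result), in which old versions of the id=3 row reappear: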

 ctid  | xmin | xmax | id | x
-------+------+------+----+---
 (0,1) |  984 |    0 |  1 | 0
 (0,7) |  990 |    0 |  3 | 5
(2 rows)

 ctid  | xmin | xmax | id | x
-------+------+------+----+---
 (0,1) |  984 |    0 |  1 | 0
 (0,3) |  986 |    0 |  3 | 1    // This, and x = 2, 3 and 4 came back from the DEAD
 (0,4) |  987 |    0 |  3 | 2
 (0,5) |  988 |    0 |  3 | 3
 (0,6) |  989 |    0 |  3 | 4
 (0,7) |  990 |    0 |  3 | 5
(6 rows)

Root cause analysis: lazy_scan_heap() removes DEAD tuples by calling heap_page_prune(). However, a concurrent commit or rollback can render a tuple DEAD after the prune has run but before we reach the switch statement on HeapTupleSatisfiesVacuum(). If such a tuple is HOT-updated or heap-only (HeapTupleIsHotUpdated or HeapTupleIsHeapOnly), we can’t delete it at that point and must leave it for a later prune to take care of.

if (HeapTupleIsHotUpdated(&tuple) || HeapTupleIsHeapOnly(&tuple))
    nkeep += 1;        // Don't delete, allow later prune to delete it
else
    tupgone = true;    // We can delete it below

Because tupgone is false, we freeze the tuple instead of deleting it. Freezing a DEAD tuple makes it visible. Here is the relevant comment in heap_prepare_freeze_tuple():

* It is assumed that the caller has checked the tuple with
* HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
* (else we should be removing the tuple, not freezing it).
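
To illustrate why freezing a DEAD tuple brings it back, here is a toy model in C. It is only a sketch: every type and function name (ToyTuple, toy_is_visible, toy_freeze, and so on) is made up for illustration and none of it is PostgreSQL code. It assumes that a freeze marks xmin as frozen and clears an xmax older than the cutoff, which is consistent with the xmax of 0 on the resurrected rows above, and it ignores multixacts, hint bits, and snapshots entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, NOT PostgreSQL's real structures. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID 0u      /* "no xmax"; printed as 0 by the query above */

typedef struct ToyTuple
{
    ToyXid xmin;                /* transaction that inserted this version */
    ToyXid xmax;                /* transaction that replaced it, or TOY_INVALID_XID */
    bool   xmin_frozen;         /* set by a freeze: inserter visible to everyone */
} ToyTuple;

/* Assumption: every valid xid below next_unused_xid has committed. */
static bool
toy_xid_committed(ToyXid xid, ToyXid next_unused_xid)
{
    return xid != TOY_INVALID_XID && xid < next_unused_xid;
}

/* Visible when the inserter committed and no committed updater replaced it. */
static bool
toy_is_visible(const ToyTuple *tup, ToyXid next_unused_xid)
{
    bool inserted = tup->xmin_frozen ||
                    toy_xid_committed(tup->xmin, next_unused_xid);
    bool replaced = toy_xid_committed(tup->xmax, next_unused_xid);

    return inserted && !replaced;
}

/* Assumed freeze behaviour: freeze xmin and clear an xmax older than cutoff. */
static void
toy_freeze(ToyTuple *tup, ToyXid cutoff)
{
    tup->xmin_frozen = true;
    if (tup->xmax != TOY_INVALID_XID && tup->xmax < cutoff)
        tup->xmax = TOY_INVALID_XID;
}
```

In this model, a version with xmin 986 and xmax 987 is invisible once both transactions have committed. After toy_freeze() clears the xmax, nothing records that the version was ever replaced, so toy_is_visible() reports it as visible again.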

It is rare to run into a DEAD tuple in this way during a freeze; more often HeapTupleSatisfiesVacuum() returns RECENTLY_DEAD. But we did see this with a more realistic long-running test, and I was able to create the simplified test case above. Skipping the freeze on a DEAD tuple that IsHotUpdated or IsHeapOnly does fix the problem. I’ve attached a patch with this fix.
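
The difference between the reported behaviour and the fix can be sketched as a small decision function. This is a simplified illustration, not the contents of the attached patch: the enums and both functions (toy_action_reported, toy_action_with_skip) are hypothetical names, and TOY_FREEZE only means that the tuple goes on to be considered for freezing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the HeapTupleSatisfiesVacuum() results. */
typedef enum ToyStatus { TOY_LIVE, TOY_RECENTLY_DEAD, TOY_DEAD } ToyStatus;

/* What the vacuum pass does with the tuple in this toy model. */
typedef enum ToyAction { TOY_REMOVE, TOY_FREEZE, TOY_LEAVE_FOR_PRUNE } ToyAction;

/* Reported behaviour: a DEAD HOT tuple is kept (tupgone stays false),
 * so it falls through to the freeze step. */
static ToyAction
toy_action_reported(ToyStatus status, bool hot_updated, bool heap_only)
{
    bool tupgone = false;

    if (status == TOY_DEAD && !(hot_updated || heap_only))
        tupgone = true;

    return tupgone ? TOY_REMOVE : TOY_FREEZE;
}

/* Sketch of the fix: a DEAD tuple is never frozen. If it is part of a
 * HOT chain it is left untouched for a later prune to remove. */
static ToyAction
toy_action_with_skip(ToyStatus status, bool hot_updated, bool heap_only)
{
    if (status == TOY_DEAD)
        return (hot_updated || heap_only) ? TOY_LEAVE_FOR_PRUNE : TOY_REMOVE;

    return TOY_FREEZE;
}
```

For a DEAD, HOT-updated tuple, toy_action_reported() returns TOY_FREEZE, which is the bug, while toy_action_with_skip() returns TOY_LEAVE_FOR_PRUNE. All other cases are handled the same way by both functions.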

Attachment Content-Type Size
update_freeze.patch application/octet-stream 2.0 KB
