Re: Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: daveg <daveg(at)sonic(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum
Date: 2011-03-04 15:52:29
Message-ID: 4D710ABD.5090103@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

On 04.03.2011 11:00, daveg wrote:
> Thanks, I've applied both patches to one host. I'll probably have to back
> down on the debugging logging soon, as the output is pretty voluminious,
> it is producing 100MB of message log every few minutes. I'll try Merlins
> patch to get the case setting the bit first though.
>
> Anyway, here is a snippit of log with the setting and unsetting of the
> bit on one page. Unfortunately, we don't know which of a dozen odd databases
> the page belongs to, but the timestamps are so close it seems likely to be
> the same one. I've added dbname to the patch and will get that next time
> I can switch binaries.
>
> 2011-03-03 23:28:34.170 PST 2039 WARNING: debugging: setting PD_ALL_VISIBLE in relation "pg_statistic" on page 5963 (OldestXmin 331848998)
> ...
> /cv/logs/production_03-20110303_232519.log.gz:2011-03-03 23:29:34.194 PST 2115 WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic" page 5963 (OldestXmin 331677178)
> ...
> 2011-03-03 23:42:38.323 PST 2808 WARNING: debugging: setting PD_ALL_VISIBLE in relation "pg_attribute" on page 5963 (OldestXmin 331677178)

Hmm, if these all came from the same database, then it looks OldestXmin
has moved backwards. That would explain the warnings. First one vacuum
determines that all the tuples are visible to everyone and sets the
flag. Then another vacuum runs with an older OldestXmin, and determines
that there's a tuple on the page with an xmin that is not yet visible to
everyone, hence it thinks that the flag should not have been set yet.

Looking at the code, I don't see how that situation could arise, though.
The value calculated by GetOldestXmin() should never move backwards. And
GetOldestXmin() is called in lazy_vacuum_rel(), after it has acquired a
lock on the table, which should protect from a race condition where two
vacuums could run on the table one after another, in a way where the
later vacuum runs with an OldestXmin calculated before the first vacuum.

Hmm, fiddling with vacuum_defer_cleanup_age on the fly could cause that,
though. You don't do that, do you?

> Also, I've attached the relevent page image.

Thanks. There seems to be two tuples on the page, both of were HOT
updated at some point, but now there's only one version of each left:

postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax,
t_ctid, t_infomask, t_infomask2, t_hoff
FROM heap_page_items(loread(lo_open(29924, 262144), 8192)) WHERE
lp_flags <> 0;
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid |
t_infomask | t_infomask2 | t_hoff
----+--------+----------+--------+-----------+--------+----------+------------+-------------+--------
1 | 7608 | 1 | 580 | 331250141 | 0 | (5963,1) |
10499 | -32747 | 32
3 | 1 | 2 | 0 | | | |
| |
4 | 7528 | 1 | 76 | 331735553 | 0 | (5963,4) |
10497 | -32747 | 32
19 | 4 | 2 | 0 | | | |
| |
(4 rows)

Deciphering those infomasks, the first tuple at lp 1 has these flags set:
HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_XMIN_COMMITTED | HEAP_HASNULL |
HEAP_HASVARWIDTH

And the 2nd one at lp 4:
HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_XMIN_COMMITTED | HEAP_HASNULL

So, both of those tuples are live.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message John Rouillard 2011-03-04 17:27:43 Re: How to upgrade PostgreSQL minor releases without a dump/restore?
Previous Message Kenneth Marshall 2011-03-04 14:37:46 Re: How to upgrade PostgreSQL minor releases without a dump/restore?

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-03-04 16:00:22 Re: Quick Extensions Question
Previous Message Joe Conway 2011-03-04 15:50:07 Re: ALTER TABLE deadlock with concurrent INSERT