Re: Idea for getting rid of VACUUM FREEZE on cold pages

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date: 2010-05-22 01:33:24
Message-ID: 408F4E07-BD8A-44EF-8DE7-96FFC058864F@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On May 21, 2010, at 23:57 , Josh Berkus wrote:
> From a discussion at dinner at pgcon, I wanted to send this to the list for people to poke holes in it:
>
> Problem: currently, if your database has a large amount of "cold" data, such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer needs to touch it thanks to the visibility map. However, every freeze_age transactions, very old pages need to be sucked into memory and rewritten just in order to freeze those pages. This can have a huge impact on system performance, and seems unjustified because the pages are not actually being used.
>
> Suggested resolution: we would add a 4-byte field to the *page* header which would track the XID wraparound count. Any page whose wraparound count was not equal to the current one would be considered to have all frozen tuples. This would remove the necessity to read and write old pages just to freeze them, a humongous gain for databases with long data retention horizons, let alone data warehouses.

If I understand this correctly, VACUUM usually only frees old tuples, but never increases the oldest xid in the pg_class record. Once that value becomes older than freeze_age, VACUUM needs to scan the whole relation to freeze old tuples. That results in most of the pages being marked dirty and subsequently being written out, causing an IO storm. If, OTOH, the wraparound count was stored in the page header, VACUUM would still need to read those pages, but wouldn't need to write them out.

Alternatively, VACUUM could freeze a few pages on each run, even if the xids are below freeze_age. It could pick those pages randomly, or maybe even prefer pages whose tuples have older xmin/xmas values. That would spread the load out more evenly, much like we try to spread checkpoints out over the whole checkpoint interval.

best regards,
Florian Pflugi

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2010-05-22 02:11:34 beta testing - pg_upgrade bug fix - double free
Previous Message Pavel Stehule 2010-05-22 00:48:24 beta testing - planner bug - ERROR: XX000: failed to build any 2-way joins