Re: Freeze avoidance of very large table.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject:	Re: Freeze avoidance of very large table.
Date:	2015-04-21 20:21:47
Message-ID:	CA+TgmoZdE8Wkt8viirakyQcXvAPYtQ0b=yZUS7Njyn1cEDd92Q@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 20, 2015 at 7:59 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> http://www.postgresql.org/message-id/CA+TgmoaEmnoLZmVbb8gvY69NA8zw9BWpiZ9+TLz-LnaBOZi7JA@mail.gmail.com
> has a WIP patch that goes the route of using a tuple flag to indicate
> frozen, but also raises a lot of concerns about visibility, because it means
> we'd stop using FrozenXID. That impacts a large amount of code. There were
> some followup patches as well as a bunch of discussion of how to make it
> visible that a tuple was frozen or not. That thread died in January 2014.

Actually, this change has already been made, so it's not so much of a
to-do as a was-done. See commit
37484ad2aacef5ec794f4dd3d5cf814475180a78. The immediate thing we got
out of that change is that when CLUSTER or VACUUM FULL rewrite a
table, they now freeze all of the tuples using this method. See
commits 3cff1879f8d03cb729368722ca823a4bf74c0cac and
af2543e884db06c0beb75010218cd88680203b86. Previously, CLUSTER or
VACUUM FULL would not freeze anything, which meant that people who
tried to use VACUUM FULL to recover from XID wraparound problems got
nowhere, and even people who knew when to use which tool could end up
having to VACUUM FULL and then VACUUM FREEZE afterward, rewriting the
table twice, an annoyance.
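
[Editor's sketch, not part of the original message.] The tuple-flag
approach can be pictured as setting two infomask bits together while
leaving xmin untouched, so the original XID stays visible for
debugging. The Python below is only a toy model. The bit values are
what I recall from htup_details.h and should be treated as
illustrative, and TupleHeader, freeze_xmin and xmin_is_frozen are
hypothetical names, not PostgreSQL functions.

```python
from dataclasses import dataclass

# Infomask bits, assumed to mirror htup_details.h (illustrative only).
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID = 0x0200
# "Frozen" is the otherwise meaningless combination of both bits.
HEAP_XMIN_FROZEN = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID


@dataclass
class TupleHeader:
    """Hypothetical stand-in for the relevant part of a heap tuple header."""
    xmin: int
    infomask: int = 0


def freeze_xmin(tup: TupleHeader) -> None:
    # Mark the tuple frozen via flag bits; xmin itself is preserved
    # rather than being overwritten with FrozenXID.
    tup.infomask |= HEAP_XMIN_FROZEN


def xmin_is_frozen(tup: TupleHeader) -> bool:
    # Both bits must be set; HEAP_XMIN_COMMITTED alone is not frozen.
    return (tup.infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN


tup = TupleHeader(xmin=123456)
freeze_xmin(tup)
```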

It's possible that we could use this infrastructure to freeze more
aggressively in other circumstances. For example, perhaps VACUUM
should freeze any page it intends to mark all-visible. That's not a
guaranteed win, because it might increase WAL volume: setting a page
all-visible does not emit an FPI for that page, but freezing any tuple
on it would, if the page hasn't otherwise been modified since the last
checkpoint. Even if that were no issue, the freezing itself must be
WAL-logged. But if we could somehow get to a place where all-visible
=> frozen, then autovacuum would never need to visit all-visible
pages, a huge win.
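
[Editor's sketch, not part of the original message.] The WAL
trade-off described above can be roughed out per page: freezing
always costs a freeze record, plus a full-page image if the page has
not been modified since the last checkpoint. This is a toy estimate
under stated assumptions. Page, extra_wal_bytes and the 64-byte
record size are hypothetical, and it follows the message in treating
the all-visible bit itself as emitting no FPI.

```python
from dataclasses import dataclass

BLCKSZ = 8192  # default PostgreSQL block size, used as the FPI size


@dataclass
class Page:
    """Hypothetical page summary: last-modified LSN and unfrozen tuple count."""
    lsn: int
    unfrozen_tuples: int


def extra_wal_bytes(page: Page, redo_lsn: int,
                    freeze_record_bytes: int = 64) -> int:
    """Estimate extra WAL if VACUUM also freezes a page it marks all-visible.

    freeze_record_bytes is a made-up placeholder, not a measured size.
    """
    if page.unfrozen_tuples == 0:
        return 0  # nothing to freeze, so no additional WAL
    cost = freeze_record_bytes  # the freezing itself must be WAL-logged
    if page.lsn <= redo_lsn:
        # Page untouched since the last checkpoint: the first change
        # to it must carry a full-page image.
        cost += BLCKSZ
    return cost
```

The page that was already dirtied after the checkpoint pays only for
the freeze record; the cold page is where the extra FPI hurts.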

We could also attack the problem from the other end. Instead of
trying to set the bits on the individual tuples, we could decide that
whenever a page is marked all-visible, we regard it as frozen
regardless of the bits set or not set on the individual tuples.
Anybody who wants to modify the page must freeze any unfrozen tuples
"for real" before clearing the visibility map bit. This would have
the same end result as the previous idea: all-visible would
essentially imply frozen, and autovacuum could ignore those pages
categorically.
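
[Editor's sketch, not part of the original message.] The second idea
amounts to an invariant: a page whose visibility map bit is set
counts as frozen, and whoever clears the bit must first freeze the
tuples "for real". A minimal toy model follows, with hypothetical
names throughout (Tuple, Page, prepare_for_modify,
needs_freeze_scan); it ignores locking, WAL and crash safety.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tuple:
    xmin: int
    frozen: bool = False  # the per-tuple frozen bit


@dataclass
class Page:
    tuples: List[Tuple] = field(default_factory=list)
    all_visible: bool = False  # stands in for the visibility map bit


def prepare_for_modify(page: Page) -> None:
    """Call before any change to the page.

    If the page is all-visible, freeze every unfrozen tuple first and
    only then clear the bit, so the implied frozen state is never lost.
    """
    if page.all_visible:
        for t in page.tuples:
            t.frozen = True
        page.all_visible = False


def needs_freeze_scan(page: Page) -> bool:
    """Whether an anti-wraparound vacuum would have to visit this page."""
    if page.all_visible:
        return False  # all-visible is treated as frozen, whatever the bits say
    return any(not t.frozen for t in page.tuples)


page = Page(tuples=[Tuple(xmin=10), Tuple(xmin=11)], all_visible=True)
skipped_before = not needs_freeze_scan(page)
prepare_for_modify(page)
page.tuples.append(Tuple(xmin=500))  # the modification itself
```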

I'm not saying those ideas don't have problems, because they do. But
I think they are worth exploring further. The main reason I gave up
on them is that Heikki was working on the XID-to-LSN mapping stuff.
That seemed like a better approach than either of the above, so as
long as Heikki was working on that, there wasn't much reason to pursue
more lowbrow approaches. Clearly, though, we need to do something
about this. Freezing is a big problem for lots of users.

All that having been said, I don't think adding a new fork is a good
approach. Our customers already complain fairly often about running
out of inodes. Adding another fork for every table would exacerbate
that problem considerably.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Freeze avoidance of very large table. at 2015-04-20 23:59:43 from Jim Nasby

Responses

Re: Freeze avoidance of very large table. at 2015-04-21 20:27:58 from Andres Freund
Re: Freeze avoidance of very large table. at 2015-04-21 23:24:44 from Jim Nasby
Re: Freeze avoidance of very large table. at 2015-04-22 15:09:34 from Kevin Grittner
Re: Freeze avoidance of very large table. at 2015-04-23 08:19:06 from Simon Riggs
