Re: Improving vacuum/VM/etc

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving vacuum/VM/etc
Date: 2015-04-24 20:11:09
Message-ID: 553AA35D.4020200@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/24/15 6:50 AM, Robert Haas wrote:

Thanks for looking at this.

> On Thu, Apr 23, 2015 at 3:09 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>> Unfortunately, the states I came up with using existing semantics don't look
>> hugely useful[4], but if we take Robert's idea and make all-visible mean
>> all-frozen, we can do much better:
>>
>> 0: Newly inserted tuples
>> Tracking this state allows us to aggressively set hint bits.
>
> Who is "us"? And what do you mean by "aggressively"? As things
> stand, any process that has to touch a tuple always sets any
> applicable hint bits.

A background process that will proactively hint tuples before a
foreground process needs to. But see also below...

>> 1: Newly deleted
>> There are tuples that have been deleted but not pruned. There may also be
>> newly inserted tuples that need hinting (state 0).
>>
>> Similar to state 0, we'd want to be fairly aggressive with these pages,
>> because as soon as the deleting XID is committed and older than all
>> snapshots we can prune. Because we can prune without hitting indexes, this
>> is still a fairly cheap operation, though not as cheap as 0.
>
> What behavior difference would you foresee between state 0 and state 1?

Below.

>> 2: Fully hinted, not frozen
>> This is the really painful state to clean up, because we have to deal with
>> indexes. We must enter this state after being in 1.
>
> Neither the fact that a page is fully hinted nor the fact that it is
> or is not frozen implies anything about dealing with indexes. We need
> to deal with indexes because the page contains either dead tuples (as
> a result of an aborted insert, a committed delete, or an aborted or
> committed update) or dead line pointers (as a result of pruning dead
> tuples).

The idea I was shooting for is that the worst-case scenario in cleanup
is dealing with indexes, which we need to do any time a tuple becomes
dead. That's why I made 1 a separate state from 0, but it occurs to me
now that I wasn't very clear about this.

My goal here is that there are two separate "paths" for a page to be in:
either it needs index vacuuming at some point, or it doesn't. If a page
is in state 0, once we can make the page all-visible/frozen it can go
into state 3 and *we never have to clean it again*.

OTOH, if a tuple is marked dead (non-HOT), then we can be aggressive
about hinting (and pruning, if there were HOT updates as well), but no
matter what we must eventually include that page in index cleanup.

So once a page enters state 1 or 2, it may never move to state 0 or 3
without an index scan pass.

> OK, I agree that a background process could be useful. Whenever it
> sees a dirty page, it could attempt to aggressively set hint bits,
> prune, mark all-visible, and freeze the page before that page gets
> evicted. However, that doesn't require the sort of state map you're
> proposing here.
>
> I think your statement about "pages that were in those states are
> still likely to be in the disk cache" is not really true. I mean, if
> we're doing OLTP, yes. But not if we're bulk-loading.

Right, but at least we'd know we had a table with a load of unhinted or
newly dead tuples. That means there's cleanup work we can do without
needing an index pass.

>> Not needing to scan everything that's frozen is thanks to state 3. I think
>> it's OK (at least for now) if only vacuum puts pages into this state, which
>> means it can actually freeze the tuples when it does it (thanks to 37484ad
>> we won't lose forensic data doing this). That means there's no extra work
>> necessary by a foreground process that's dirtying a page.
>
> Did you notice the discussion on the other thread about this
> increasing WAL volume by a factor of 113?

Yeah, though I'd forgotten about it. :(

I wonder if there's some way we can reduce that. I just looked at what
we WAL log for a freeze and it appears to only be xl_heap_freeze_tuple,
which if my math is correct is 12 bytes (11 ignoring alignment). I don't
understand how that can be 113 times worse than a plain vacuum.

> I can't really follow why you've got these states to begin with. 0,
> 1, and 2 are all pretty much the same. The useful distinction AFAICS
> is between not-all-visible, all-visible, and all-visible-plus-frozen.

Index scanning is probably the most expensive part of cleanup, so it
seems like it would be useful to be able to track that as
visible/frozen. (What would probably be more useful is a way to directly
link a heap tuple to any index tuples pointing at it, but that would
certainly be a lot harder to do.)

There's also the idea of being proactive about hinting and pruning,
instead of foisting that onto later foreground processes or hoping that
vacuum comes along. Certainly the most obvious part is doing that before
buffers are evicted, but it's not uncommon for the OS cache to be 10x
larger (or more). Even if we can't hit these pages before they're all
the way on disk, if we at least know there's a pile of them we can do
something before a foreground process (or at least let the DBA know).
But if we are keeping the number of tuples in this state more limited
then perhaps a map isn't the right way to track it; I think that will
come down to a question of how that ties in with the long-lived states.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2015-04-24 20:21:38 Re: Feedback on getting rid of VACUUM FULL
Previous Message Jim Nasby 2015-04-24 20:09:37 Re: Freeze avoidance of very large table.