Re: New strategies for freezing, advancing relfrozenxid early

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2022-09-14 16:33:17
Message-ID: CAH2-Wzk6om7AWdtLX-afLGeCwrQwMrYxAdMTqGNBZ-ha0bjF1w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 14, 2022 at 3:18 AM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
> On Wed, Sep 14, 2022 at 12:53 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > This is still only scratching the surface of what is possible with
> > dead_items. The visibility map snapshot concept can enable a far more
> > sophisticated approach to resource management in vacuumlazy.c.

> I don't quite see how it helps "enable" that.

I have already written a simple throwaway patch that can use the
current VM snapshot data structure (which is just a local copy of the
VM's pages) to do a cheap precheck ahead of actually doing a binary
search in dead_items -- if a TID's heap page is all-visible or
all-frozen (depending on the type of VACUUM) then we're 100%
guaranteed to not visit it, and so it's 100% guaranteed to not have
any entries in dead_items (actually it could have LP_DEAD items by the
time the index scan happens, but they won't be in our dead_items array
in any case). Since we're working off of an immutable source, this
optimization is simple to implement already. Very simple.
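
A minimal sketch of that precheck, written in Python rather than C for
brevity. It is not the throwaway patch itself: the names
build_vm_snapshot, page_is_skipped and tid_is_dead are hypothetical, the
snapshot is modeled as one bit per heap page (set when the page is
skipped by this VACUUM), and dead_items is modeled as a sorted list of
(block, offset) tuples.

```python
from bisect import bisect_left


def build_vm_snapshot(nblocks, skippable_blocks):
    """Return an immutable bitmap with one bit per heap page (hypothetical model).

    A set bit means the page is skipped by this VACUUM, so it is never scanned.
    """
    bits = bytearray((nblocks + 7) // 8)
    for blk in skippable_blocks:
        bits[blk >> 3] |= 1 << (blk & 7)
    return bytes(bits)  # bytes is immutable, like the snapshot described above


def page_is_skipped(snapshot, blk):
    """Test the bit for heap block blk; blocks past the end count as not skipped."""
    byte = blk >> 3
    return byte < len(snapshot) and bool(snapshot[byte] & (1 << (blk & 7)))


def tid_is_dead(tid, dead_items, snapshot):
    """Return True if tid (block, offset) is present in the sorted dead_items."""
    block, _offset = tid
    # Cheap precheck: a skipped page was never scanned, so it cannot have
    # contributed any entries to dead_items. No binary search is needed.
    if page_is_skipped(snapshot, block):
        return False
    # Otherwise fall back to the usual binary search.
    i = bisect_left(dead_items, tid)
    return i < len(dead_items) and dead_items[i] == tid
```

The precheck touches one bit per lookup, whereas the binary search
touches several widely separated array entries, which is where the
saving would be expected to come from.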

I haven't even bothered to benchmark this throwaway patch (I literally
wrote it in 5 minutes to show Masahiko what I meant). I can't see why
even that throwaway prototype wouldn't significantly improve
performance, though. After all, the VM snapshot data structure is far
denser than dead_items, and the largest tables often have most heap
pages skipped via the VM.

I'm not really interested in pursuing this simple approach because it
conflicts with Masahiko's work on the data structure, and there are
other good reasons to expect his work to help. Plus I'm already very
busy with what I have here.

> It'd be more logical to
> me to say the VM snapshot *requires* you to think harder about
> resource management, since a palloc'd snapshot should surely be
> counted as part of the configured memory cap that admins control.

That's clearly true -- it creates a new problem for resource
management that will need to be solved. But that doesn't mean that it
can't ultimately make resource management better and easier.

Remember, we don't randomly visit some skippable pages for no good
reason in the patch, since the SKIP_PAGES_THRESHOLD stuff is
completely gone. The VM snapshot isn't just a data structure that
vacuumlazy.c uses as it sees fit -- it's actually more like a set of
instructions on which pages to scan, that vacuumlazy.c *must* follow.
There is no way that vacuumlazy.c can accidentally pick up a few extra
dead_items here and there due to concurrent activity that unsets VM
bits. We don't need to leave that to chance -- the set of pages to
scan is locked in from the start.
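
A toy model of that "set of instructions" property, in Python. The
function plan_scan and the use of a plain set for the live VM are
assumptions made for illustration only; nothing here mirrors the actual
vacuumlazy.c code.

```python
def plan_scan(nblocks, live_vm_skippable):
    """Copy the live VM once, then derive the pages to scan from the copy only.

    live_vm_skippable is a mutable set of block numbers that are currently
    skippable (hypothetical stand-in for the real visibility map).
    """
    snapshot = frozenset(live_vm_skippable)  # immutable copy taken at the start
    pages_to_scan = [blk for blk in range(nblocks) if blk not in snapshot]
    return snapshot, pages_to_scan


# Usage: concurrent activity changes the live VM after the snapshot is taken.
live_vm = {0, 1, 3}
snapshot, pages_to_scan = plan_scan(5, live_vm)
live_vm.discard(1)  # a concurrent update unsets the bit for block 1

# The plan is unchanged: block 1 is still skipped by this VACUUM.
print(pages_to_scan)
```

Because the plan is derived from the copy and never from the live
structure, later changes to the live structure cannot add pages to the
scan.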

> I do remember your foreshadowing in the radix tree thread a while
> back, and I do think it's an intriguing idea to combine pages-to-scan
> and dead TIDs in the same data structure. The devil is in the details,
> of course. It's worth looking into.

Of course.

> Looking at the count of index scans, it's pretty much always
> "1", so even if the current approach could scale above 1GB, "no" it
> wouldn't help to raise that limit.

I agree that multiple index scans are rare. But I also think that
they're disproportionately involved in really problematic cases for
VACUUM. That said, I agree that simply making lookups to dead_items as
fast as possible is the single most important way to improve VACUUM by
improving dead_items.

> Furthermore, it doesn't have to anticipate the maximum size, so there
> is no up front calculation assuming max-tuples-per-page, so it
> automatically uses less memory for less demanding tables.

The final number of TIDs doesn't seem like the most interesting
information that VM snapshots could provide us when it comes to
building the dead_items TID data structure -- the *distribution* of
TIDs across heap pages seems much more interesting. The "shape" can be
known ahead of time, at least to some degree. It can help with
compression, which will reduce cache misses.
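
One way to picture the "shape": before scanning any heap page, the
snapshot already says which contiguous block ranges can possibly
contribute dead TIDs. The Python sketch below derives those ranges. The
function name scanned_ranges is hypothetical, and how a TID data
structure would actually exploit the ranges is left open here.

```python
def scanned_ranges(nblocks, skipped):
    """Return (start_block, length) runs of pages that will be scanned.

    skipped is the set of block numbers the snapshot marks as skippable.
    Dead TIDs can only ever fall inside the returned runs.
    """
    ranges = []
    start = None
    for blk in range(nblocks):
        if blk in skipped:
            # A skipped page ends the current run, if there is one.
            if start is not None:
                ranges.append((start, blk - start))
                start = None
        elif start is None:
            # First scanned page after a gap starts a new run.
            start = blk
    if start is not None:
        ranges.append((start, nblocks - start))
    return ranges
```

For a 10-page table with pages 0, 1, 2, 5, 6 and 9 skipped, this yields
two runs of two pages each, starting at blocks 3 and 7.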

Andres made remarks about memory usage with sparse dead TID patterns
at this point on the "Improve dead tuple storage for lazy vacuum"
thread:

https://postgr.es/m/20210710025543.37sizjvgybemkdus@alap3.anarazel.de

I haven't studied the radix tree stuff in great detail, so I am
uncertain of how much the VM snapshot concept could help, and where
exactly it would help. I'm just saying that it seems promising,
especially as a way of addressing concerns like this.

--
Peter Geoghegan
