Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-11-12 22:00:07
Message-ID: CAH2-Wz=Ypc1PDdk8OVJDChGJBjT06=A0Mbv9HyTLCsOknGcUFg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 11, 2020 at 6:17 AM Victor Yegorov <vyegorov(at)gmail(dot)com> wrote:
> I've looked at the latest (v7) patchset.
> I've decided to use a quite common (in my practice) setup with an indexed mtime column over scale 1000 set:

Thanks for testing!

> We can see that:
> - unused index is not suffering from not-HOT updates at all, which is the point of the patch
> - we have ordinary queries performing on the same level as on master
> - we have 5,2% slowdown in UPDATE speed

I think that I made a mistake with v7: I changed the way that we
detect low cardinality data during bottom-up deletion, which made us
do extra/early deduplication in more cases than we really should. I
suspect that this partially explains the slowdown in UPDATE latency
that you reported. I will fix this in v8.

I don't think that the solution is to go back to the v6 behavior in
this area, though. I now believe that this whole "proactive
deduplication for low cardinality data" thing only made sense as a way
of compensating for deficiencies in earlier versions of the patch,
deficiencies that I've since fixed anyway. The best solution now is to
simplify. We can have generic criteria for "should we dedup the page
early after bottom-up deletion finishes without freeing up very much
space?". This seemed to work well during my latest testing. Probably
because heapam.c is now smart about the requirements from nbtree, as
well as the cost of accessing heap pages.
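
To give a rough idea of what I mean, here is a sketch (made-up
function name and placeholder condition -- not what v8 will literally
do):

#include "postgres.h"
#include "storage/bufpage.h"

/*
 * Sketch only: decide whether to deduplicate right after bottom-up
 * deletion finishes, based purely on how much space we managed to
 * free -- no special "low cardinality" heuristics.
 */
static bool
_bt_dedup_after_bottomup(Page page, Size freed, Size newitemsz)
{
	/* Bottom-up deletion freed enough room for the incoming tuple */
	if (freed >= newitemsz)
		return false;

	/* Otherwise try deduplication before giving up and splitting */
	return PageGetFreeSpace(page) < newitemsz;
}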

> I'm not sure if this should be counted as regression, though, as graphs go on par pretty much.
> Still, I would like to understand why this combination of indexes and queries slows down UPDATEs.

Another thing that I'll probably add to v8: Prefetching. This is
probably necessary just so I can have parity with the existing
heapam.c function that the new code is based on,
heap_compute_xid_horizon_for_tuples(). That will probably help here,
too.
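
Roughly the idea, as a sketch with made-up names (I'm not trying to
show the batching that heap_compute_xid_horizon_for_tuples() actually
does -- just the prefetch step):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/itemptr.h"
#include "utils/rel.h"

/*
 * Sketch only: issue prefetch requests for the heap blocks that the
 * candidate TIDs point to before reading any of them.  Assumes the
 * array is sorted in heap order, so duplicate blocks are adjacent.
 */
static void
prefetch_candidate_blocks(Relation heapRel, ItemPointer deltids, int ndeltids)
{
	BlockNumber lastblk = InvalidBlockNumber;

	for (int i = 0; i < ndeltids; i++)
	{
		BlockNumber blk = ItemPointerGetBlockNumber(&deltids[i]);

		if (blk == lastblk)
			continue;			/* same heap block as the previous TID */

		PrefetchBuffer(heapRel, MAIN_FORKNUM, blk);
		lastblk = blk;
	}
}

(PrefetchBuffer() won't actually prefetch anything on builds without
posix_fadvise() support, of course.)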

> During compilation I got one warning for make -C contrib:

Oops.

> I agree with the rename to "bottom-up index deletion", using "vacuuming" generally makes users think
> that functionality is used only during VACUUM (misleading).

Yeah. That's kind of a problem already, because sometimes we use the
word VACUUM when talking about the long-established LP_DEAD deletion
stuff. But I see that as a problem to be fixed. Actually, I would like
to fix it very soon.

> I haven't looked at the code yet.

It would be helpful if you could take a look at the nbtree patch --
particularly the changes related to deprecating the page-level
BTP_HAS_GARBAGE flag. I would like to break those parts out into a
separate patch, and commit it in the next week or two. It's just
refactoring, really. (This commit can also make nbtree only use the
word VACUUM for things that strictly involve VACUUM. For example,
it'll rename _bt_vacuum_one_page() to _bt_delete_or_dedup_one_page().)

We almost don't care about the flag already, so there is almost no
behavioral change from deprecating BTP_HAS_GARBAGE in this way.

Indexes that use deduplication haven't relied on BTP_HAS_GARBAGE being
set ever since deduplication was added in Postgres 13 (the
deduplication code doesn't want to deal with LP_DEAD bits, and cannot
assume that no LP_DEAD bits are set just because BTP_HAS_GARBAGE isn't
set in the special area). Trusting the BTP_HAS_GARBAGE flag can cause
us to miss out on deleting items that have their LP_DEAD bits set --
we're better off "assuming that BTP_HAS_GARBAGE is always set", and
finding out for ourselves whether there really are LP_DEAD bits set
each time.
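
That check is pretty cheap, too -- something like this (sketch only,
made-up function name; it's more or less what the existing simple
deletion path already does when it collects deletable offsets):

#include "postgres.h"
#include "access/nbtree.h"

/*
 * Sketch only: look at the line pointers directly to see whether the
 * page has any LP_DEAD items, instead of trusting BTP_HAS_GARBAGE.
 */
static bool
_bt_page_has_dead_items(Page page)
{
	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
	OffsetNumber offnum;
	OffsetNumber minoff = P_FIRSTDATAKEY(opaque);
	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);

	for (offnum = minoff; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
	{
		ItemId		itemid = PageGetItemId(page, offnum);

		if (ItemIdIsDead(itemid))
			return true;
	}

	return false;
}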

Missing LP_DEAD bits in that way can happen when VACUUM unsets the
page-level flag without actually deleting the items at the same time,
which is expected when the items became garbage (and actually had
their LP_DEAD bits set) after VACUUM began, but before VACUUM reached
the leaf pages. Trusting the flag in that scenario is really wasteful,
and doesn't actually have any upside -- we only scan all of the line
pointers when we're about to split (or dedup) the same page anyway, so
the extra work is practically free.

--
Peter Geoghegan
