Re: Deleting older versions in unique indexes to avoid page splits

From: Victor Yegorov <vyegorov(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2021-01-18 11:41:09
Message-ID: CAGnEboiX3dwP4oFrCT6FDwwOb-87Dr+ghJ5vMqqfm9wxfQeDoA@mail.gmail.com
Lists: pgsql-hackers

Mon, 18 Jan 2021 at 07:44, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>:

> The summary of the above is that with Case-1 (clean-up based on hint
> received with the last item on the page) it takes fewer operations to
> cause a page split as compared to Case-2 (clean-up based on hint
> received with at least of the items on the page) for a mixed workload.
> How can we say that it doesn't matter?
>

I cannot understand this. Perhaps there's a word missing in the brackets?

Thinking about your proposal to also undertake bottom-up deletion on the
insertion of the last tuple that fits on the page, I'd like to start with
my understanding of the design (a sketch of this ladder in code follows
the list):

* we try to avoid index bloat, but we do it in a “lazy” way (for a
reason, see below)

* this means that if there is enough space on the leaf page to fit the
new index tuple, we simply put it there

* if there's not enough space, we first check for the presence of
LP_DEAD items; if any exist, we scan the whole (index) page to re-check
all tuples, in order to find others that are not yet marked but are
actually dead, and we clean all of those up

* if there's still not enough space, only then do we try bottom-up
deletion. This is a heavy operation that can cause extra IO (tests do
show this), so it is undertaken only at the point where we can justify
the extra IO against a leaf page split

* if there's still no space after bottom-up deletion, we attempt
deduplication (if possible)

* finally, we split the page.
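
To check that I read this ladder correctly, here it is as a minimal
sketch in C-like form. To be clear, none of these names exist in
PostgreSQL; the types and helpers (page_has_room(),
simple_deletion_pass(), and so on) are invented stand-ins for the real
nbtree insert-path logic, and only the ordering of the stages is the
point:

    #include <stdbool.h>

    typedef struct LeafPage LeafPage;       /* opaque stand-in types */
    typedef struct IndexTuple IndexTuple;

    /* Hypothetical helpers, one per stage of the ladder. */
    extern bool page_has_room(LeafPage *page, IndexTuple *itup);
    extern bool page_has_lp_dead_items(LeafPage *page);
    extern void simple_deletion_pass(LeafPage *page);    /* cheap */
    extern void bottom_up_deletion_pass(LeafPage *page); /* heavy, extra IO */
    extern void deduplicate_page(LeafPage *page);
    extern void insert_on_page(LeafPage *page, IndexTuple *itup);
    extern void split_page_and_insert(LeafPage *page, IndexTuple *itup);

    void
    leaf_insert(LeafPage *page, IndexTuple *itup)
    {
        /* 1. Lazy path: enough free space, just insert. */
        if (page_has_room(page, itup))
        {
            insert_on_page(page, itup);
            return;
        }

        /* 2. Cheap pass: re-check the page and remove LP_DEAD items. */
        if (page_has_lp_dead_items(page))
            simple_deletion_pass(page);

        /*
         * 3. Heavy pass: bottom-up deletion, justified only because
         *    the alternative is a leaf page split.
         */
        if (!page_has_room(page, itup))
            bottom_up_deletion_pass(page);

        /* 4. Deduplication as a last resort before splitting. */
        if (!page_has_room(page, itup))
            deduplicate_page(page);

        /* 5. Only now do we pay for the page split. */
        if (page_has_room(page, itup))
            insert_on_page(page, itup);
        else
            split_page_and_insert(page, itup);
    }

The key property is that each stage runs only when the cheaper stage
above it has failed to free enough space, which is what makes the whole
approach “lazy”.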

Should we try a bottom-up deletion pass in a situation where we're
inserting the last tuple that fits into the page (enough space for the
*current* insert, but likely no space for the next one; see the sketch
of this trigger after the list below), then (among others) the following
possibilities exist:

- no new tuple ever comes to this page again (a happy accident), which
means that we've wasted IO cycles

- we undertake a bottom-up deletion pass without success, and later
we're asked to insert a new tuple into this fully packed index page.
This can unfold into:
  - due to the delay, we've managed to find some space, either via
LP_DEAD items or bottom-up cleanup, which means we wasted IO cycles on
the previous iteration (a too-early attempt)
  - we still fail to find any space and are forced to split the page;
in this case we've wasted IO cycles twice
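
For concreteness, the proposal as I understand it amounts to moving the
trigger for the heavy pass one step earlier, to something like the
predicate below. Again, these names are invented for illustration (they
reuse the stand-in types from the sketch above), and avg_tuple_size()
stands for whatever heuristic would predict the size of the next
insert:

    /* Hypothetical early trigger for the bottom-up deletion pass. */
    extern int free_space_after(LeafPage *page, IndexTuple *itup);
    extern int avg_tuple_size(LeafPage *page);

    static bool
    should_try_early_bottom_up(LeafPage *page, IndexTuple *itup)
    {
        /*
         * The current insert fits, but a next tuple of average size
         * likely will not: this is the "last-to-fit" case.
         */
        return free_space_after(page, itup) < avg_tuple_size(page);
    }

Both wasted-IO scenarios above follow from firing this predicate before
a page split is actually unavoidable.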

In my view, these cases where we generate wasted IO cycles (producing
no benefit) should be avoided, and this is the main reason for the
current approach.

Again, this is my understanding and I hope I got it right.

--
Victor Yegorov
