Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Victor Yegorov <vyegorov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2021-01-22 23:57:27
Message-ID: CAH2-Wz=2X=0eJkmfn1DTN6TMYbQOP10GCXCXV7U7=98akyfp5w@mail.gmail.com

On Thu, Jan 21, 2021 at 9:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Slowing down non-HOT updaters in these extreme cases may actually be a
> > good thing, even when bottom-up deletion finally becomes ineffective.
> > It can be thought of as backpressure. I am not worried about slowing
> > down something that is already hopelessly inefficient and
> > unsustainable. I'd go even further than that, in fact -- I now wonder
> > if we should *deliberately* slow them down some more!
> >
>
> Do you have something specific in mind for this?

Maybe something a bit like the VACUUM cost delay stuff could be
applied at the point that we realize that a given bottom-up deletion
pass was entirely ineffective purely due to a long running transaction
-- a delay that gets applied by the nbtree caller once it goes on to
split the page.

This isn't something that I plan to work on anytime soon. My point was
mostly that it really would make sense to deliberately throttle
non-HOT updates at the point that they trigger page splits that are
believed to be more or less caused by a long running transaction.
They're so incredibly harmful to the general responsiveness of the
system that having a last line of defense like that
(backpressure/throttling) really does make sense.
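
For context, the "VACUUM cost delay stuff" is the existing cost-based
delay machinery; the settings below already exist and are shown purely
as the rough model for such a backpressure mechanism -- nothing like
this exists for bottom-up deletion or non-HOT updaters today:

-- Existing cost-based delay settings for (auto)vacuum, shown only as an
-- analogy; no equivalent throttle currently applies to non-HOT updaters
-- whose page splits are forced by a long running transaction.
SHOW vacuum_cost_delay;             -- sleep applied once the cost limit is reached
SHOW vacuum_cost_limit;             -- accumulated cost that triggers the sleep
SHOW autovacuum_vacuum_cost_delay;  -- autovacuum's own, separately tunable delay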

> I have briefly tried that but numbers were not consistent probably
> because at that time autovacuum was also 'on'. So, I tried switching
> off autovacuum and dropping/recreating the tables.

It's not at all surprising that they weren't consistent. Clearly
bottom-up deletion wastes cycles on the first execution (it is wasted
effort in at least one sense) -- you showed that already. Subsequent
executions will actually manage to delete some tuples (probably a
great many tuples), and so will have totally different performance
profiles/characteristics. Isn't that obvious?
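
One simple way to see that difference for yourself (a hedged sketch;
the index name is made up) is to track the size of the affected index
across repeated runs of the same bulk update. On the first run the
index grows as if bottom-up deletion didn't exist, while on later runs
the deletion passes keep it from growing further:

-- Measure index growth between runs of the same bulk update workload.
-- Replace 'some_secondary_idx' with the index under test.
SELECT pg_size_pretty(pg_relation_size('some_secondary_idx')) AS index_size;

-- Optionally, with the pgstattuple extension installed, inspect leaf
-- density and deleted pages directly:
-- CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- SELECT avg_leaf_density, leaf_pages, deleted_pages
--   FROM pgstatindex('some_secondary_idx');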

> > Besides, this test case is just ludicrous.
> >
>
> I think it might be okay to say that in such cases we can expect
> regression especially because we see benefits in many other cases so
> paying some cost in such cases is acceptable or such scenarios are
> less common or probably such cases are already not efficient so it
> doesn't matter much but I am not sure if we can say they are
> completely unreasonable. I think this test case depicts the behavior
> with bulk updates.

Sincere question: What do you want me to do about it?

You're asking me about two separate though very closely related issues
at the same time here (this bulk update regression thing, plus the
question of doing bottom-up passes when the incoming item isn't from a
HOT updater). While your positions on these closely related issues are
not exactly incompatible, it's difficult for me to get at any single
underlying substantive point. In effect, you are pulling the patch in
two separate directions at the same time. In practical terms, it's
very likely that I cannot move the bottom-up deletion code closer to
one of your ideals without simultaneously moving it further away from
the other ideal.

I will say the following about your bulk update example now, just in
case you feel that I gave you the impression of never having taken it
seriously:

1. I accept that the effect that you've described is real. It is a
pretty narrow effect in practice, though, and will be of minimal
concern to real applications (especially relative to the benefits they
receive).

I happen to believe that the kind of bulk update that you showed is
naturally very rare, and will naturally cause horrible problems with
any RDBMS -- and that's why I'm not too worried (about this specific
sequence that you showed, or one like it that somehow artfully avoids
receiving any performance benefit). I just cannot buy the idea that
any real world user will do a bulk update like that exactly once, and
be frustrated by the fact that it's ~12% slower in Postgres 14. If
they do it more than once the story changes, of course (the technique
starts to win). And you have to do something more than once to notice
a change in its performance in the first place -- so the scenario just
doesn't seem plausible to me. It's still possible to imagine a way
that it could happen, but that is a risk I am willing to live with,
given the benefits.
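
For anyone following along without the earlier messages, the shape of
the scenario is roughly as follows (a hedged reconstruction with
made-up table and column names, not the actual test case): a bulk
update that changes an indexed column in every row, so that none of
the updates can be HOT and every row needs a new index entry:

-- Hypothetical stand-in for the kind of one-off bulk update under
-- discussion; every row's indexed column changes, so no update is HOT.
CREATE TABLE t (id int PRIMARY KEY, val int);
CREATE INDEX t_val_idx ON t (val);
INSERT INTO t SELECT g, g FROM generate_series(1, 1000000) g;

-- Running this once is where the first-execution overhead of a failed
-- bottom-up deletion attempt shows up; running it repeatedly is where
-- the deletion passes start to win.
UPDATE t SET val = val + 1;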

2. If there was some simple way of avoiding the relative loss of
performance without hurting other cases I would certainly take it --
in general I prefer to not have to rely on anybody's opinion of what
is or is not a reasonable price to pay for something.

I strongly doubt that I can do anything about your first complaint
(the bulk update case) without causing much greater harm elsewhere,
and so I
won't be pursuing it. I did not deny the existence of cases like this
at any point. In fact, the entire discussion was ~50% me agonizing
over regressions (though never this precise case). Discussion of
possible regressions happened over many months and many dense emails.
So I refuse to be lectured about my supposed indifference to
regressions -- not by you, and not by anyone else.

In general I consistently *bend over backwards* to avoid regressions,
and never assume that I didn't miss something. This came up recently,
in fact:

https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

See the "Updates" section of this recent blog post. No regressions
could be attributed to any of the nbtree projects I was involved with
in the past few years. There was a tiny (IMV quite acceptable)
regression attributed to insert-driven autovacuum in Postgres 13.
Deduplication didn't lead to any appreciable loss of performance, even
though most of the benchmarks were rather unsympathetic towards it (no
low cardinality data).

I specifically asked Mark Callaghan to isolate the small regression
that he thought might be attributable to the deduplication feature
(via deduplicate_items storage param) -- even though at the time I
myself imagined that that would *confirm* his original suspicion that
nbtree deduplication was behind the issue. I don't claim to be
objective when it comes to my own work, but I have at least been very
conscientious.
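
For anyone who wants to run the same kind of isolation experiment, the
deduplicate_items storage parameter really is the switch to use;
building otherwise identical indexes with it on and off is enough to
attribute a regression to the deduplication feature or rule it out
(table and index names below are illustrative):

-- Two otherwise identical indexes, differing only in whether nbtree
-- deduplication is enabled.
CREATE INDEX idx_dedup_on  ON bench_table (val) WITH (deduplicate_items = on);
CREATE INDEX idx_dedup_off ON bench_table (val) WITH (deduplicate_items = off);

-- The parameter can also be changed on an existing index; it affects
-- how future insertions are handled.
ALTER INDEX idx_dedup_on SET (deduplicate_items = off);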

> I am not denying that I could be missing your point but OTOH you are
> also completely refuting the points raised even though I have shown
> them by test and by sharing an example.

Actually, I quite specifically and directly said that I was
*sympathetic* to your second point (the one about doing bottom-up
deletion in more marginal cases not directly involving non-HOT
updaters). I also said that I struggled with it myself for a long
time. I just don't think that it is worth pursuing at this time -- but
that shouldn't stop anyone else that's interested in it.

> 'Note' in the code (or README) suggesting something like we have
> considered the alternative for page-level-flag (to be aggressive in
> triggering this optimization) but not pursued with that for so-and-so
> reasons.

Good news, then: I pretty much *did* document it in the nbtree README!

The following aside concerns this *exact* theoretical limitation, and
appears in parentheses as part of commentary on the wider issue of how
bottom-up passes are triggered:

"(in theory a small amount of version churn could
make a page split occur earlier than strictly necessary, but that's pretty
harmless)"

The "breadcrumbs" *are* there. You'd notice that if you actually
looked for them.

> I think this can help future developers to carefully think
> about it even if they want to attempt something like that. You have
> considered it during the early development phase and then the same
> thing occurred to Victor and me as an interesting optimization to
> explore so the same can occur to someone else as well.

I would not document the idea in the README unless perhaps I had high
confidence that it would work out. I have an open mind about that as a
possibility, but that alone doesn't seem like a good enough reason to
do it. Though I am now prepared to say that this does not seem like an
amazing opportunity to make the feature much better. That's why it's
not a priority for me right now. There actually *are* things that I
would describe that way (Sawada-san's complementary work on VACUUM).
And so that's what I'll be focussing on in the weeks ahead.

--
Peter Geoghegan
