Re: making update/delete of inheritance trees scale better

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making update/delete of inheritance trees scale better
Date: 2021-02-05 21:05:43
Message-ID: CA+TgmobYSpQKo3wJOZM3LUGtOp_OZr4+sB2ehUxMwHp67BDi1Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 5, 2021 at 12:06 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> You do realize that we're just copying Datums from one level to the
> next? For pass-by-ref data, the Datums generally all point at the
> same physical data in some disk buffer ... or if they don't, it's
> because the join method had a good reason to want to copy data.

I am older and dumber than I used to be, but I'm amused at the idea
that I might be old enough and dumb enough not to understand this. To
be honest, given that we are just copying the datums, I find it kind
of surprising that it causes us pain, but it clearly does. If you
think it's not an issue, then what of the email from Amit Langote to
which I was responding, or his earlier message at
http://postgr.es/m/CA+HiwqHUkwcy84uFfUA3qVsyU2pgTwxVkJx1uwPQFSHfPz4rsA@mail.gmail.com
which contains benchmark results?

As to why it causes us pain, I don't have a full picture of that.
Target list construction is one problem: we build all these target
lists for intermediate nodes during planning and they're long enough
-- if the user has a bunch of columns -- and planning is cheap enough
for some queries that the sheer time to construct the list shows up
noticeably in profiles. I've seen that be a problem even for query
planning problems that involve just one table: a test that takes the
"physical tlist" path can be slower just because the time to construct
the longer tlist is significant and the savings from postponing tuple
deforming aren't. It seems impossible to believe that it can't also
hurt us on join queries that actually make use of a lot of columns, so
that they've all got to be included in tlists at every level of the
join tree. I believe that the execution-time overhead isn't entirely
trivial either. Sure, copying an 8-byte quantity is pretty cheap, but
if you have a lot of columns and you copy them a lot of times for each
of a lot of tuples, it adds up. Queries that do enough "real work" --
e.g. calling expensive functions, forcing disk I/O, etc. -- will make
the effect of a bunch of x[i] = y[j] stuff unnoticeable, but there are
plenty of queries that don't really do anything expensive -- they're
doing simple joins of data that's already in memory. Even there,
accessing buffers figures to be more expensive because it's shared
memory with locking and cache line contention; but I don't think that
means we can completely ignore the performance impact of backend-local
computation. b8d7f053c5c2bf2a7e8734fe3327f6a8bc711755 is a good
example of getting a significant gain by refactoring to reduce
seemingly trivial overheads -- in that case, AIUI, the benefits are
around fewer function calls and better CPU branch prediction.

> If we didn't have the intermediate tuple slots, we'd have to have
> some other scheme for identifying which data to examine in intermediate
> join levels' quals. Maybe you can devise a scheme that has less overhead,
> but it's not immediately obvious that any huge win would be available.

I agree. I'm inclined to suspect that some benefit is possible, but
that might be wrong and it sure doesn't look easy.

--
Robert Haas
EDB: http://www.enterprisedb.com
