Re: partition routing layering in nodeModifyTable.c

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: partition routing layering in nodeModifyTable.c
Date: 2020-10-23 09:37:41
Message-ID: CA+HiwqH+5EYSTZzOQ8FXhySYzQgUb5efqeuh0EeyugAe2=132A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 23, 2020 at 4:04 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 22/10/2020 16:49, Amit Langote wrote:
> > On Tue, Oct 20, 2020 at 9:57 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> >> On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >>> It's probably true that there's no performance gain from initializing
> >>> them more lazily. But the reasoning and logic around the initialization
> >>> is complicated. After tracing through various path through the code, I'm
> >>> convinced enough that it's correct, or at least these patches didn't
> >>> break it, but I still think some sort of lazy initialization on first
> >>> use would make it more readable. Or perhaps there's some other
> >>> refactoring we could do.
> >>
> >> So the other patch I have mentioned is about lazy initialization of
> >> the ResultRelInfo itself, not the individual fields, but maybe with
> >> enough refactoring we can get the latter too.
> >
> > So, I tried implementing a lazy-initialization-on-first-access
> > approach for both the ResultRelInfos themselves and some of the
> > individual fields of ResultRelInfo that don't need to be set right
> > away. You can see the end result in the attached 0003 patch. This
> > slims down ExecInitModifyTable() significantly, both in terms of code
> > footprint and the amount of work that it does.
>
> Have you done any performance testing? I'd like to know how much of a
> difference this makes in practice.

I have shown some numbers here:

https://www.postgresql.org/message-id/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com

To reiterate, if you apply the following patch:

> Does this patch become moot if we do the "Overhaul UPDATE/DELETE
> processing"?
> (https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com)?

...and run this benchmark with plan_cache_mode = force_generic_plan

pgbench -i -s 10 --partitions={0, 10, 100, 1000}
pgbench -T120 -f test.sql -M prepared

test.sql:
\set aid random(1, 1000000)
update pgbench_accounts set abalance = abalance + 1 where aid = :aid;

you may see roughly the following results:

HEAD:

0 tps = 13045.485121 (excluding connections establishing)
10 tps = 9358.157433 (excluding connections establishing)
100 tps = 1878.274500 (excluding connections establishing)
1000 tps = 84.684695 (excluding connections establishing)

Patched (overhaul update/delete processing):

0 tps = 12743.487196 (excluding connections establishing)
10 tps = 12644.240748 (excluding connections establishing)
100 tps = 4158.123345 (excluding connections establishing)
1000 tps = 391.248067 (excluding connections establishing)

And if you apply the patch being discussed here, TPS shoots up a bit,
especially for higher partition counts:

Patched (lazy-ResultRelInfo-initialization)

0 tps = 13419.283168 (excluding connections establishing)
10 tps = 12588.016095 (excluding connections establishing)
100 tps = 8560.824225 (excluding connections establishing)
1000 tps = 1926.553901 (excluding connections establishing)

To explain these numbers a bit, "overheaul update/delete processing"
patch improves the performance of that benchmark by allowing the
updates to use run-time pruning when executing generic plans, which
they can't today.

However without "lazy-ResultRelInfo-initialization" patch,
ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can
be seen to be spending time initializing all of those result
relations, whereas only one of those will actually be used.

As mentioned further in that email, it's really the locking of all
relations by AcquireExecutorLocks() that occurs even before we enter
the executor that's a much thornier bottleneck for this benchmark.
But the ResultRelInfo initialization bottleneck sounded like it could
get alleviated in a relatively straightforward manner. The patches
that were developed for attacking the locking bottleneck would require
further reflection on whether they are correct.

(Note: I've just copy pasted the numbers I reported in that email. To
reproduce, I'll have to rebase the "overhaul update/delete processing"
patch on this one, which I haven't yet done.)

> Another alternative is to continue to create the ResultRelInfos in
> ExecInitModify(), but initialize the individual fields in them lazily.

If you consider the above, maybe you can see how that will not really
eliminate the bottleneck I'm aiming to fix here.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2020-10-23 09:54:26 Re: speed up unicode decomposition and recomposition
Previous Message Pavel Stehule 2020-10-23 09:23:04 Re: Change JOIN tutorial to focus more on explicit joins