Re: Skip partition tuple routing with constant partition key

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Skip partition tuple routing with constant partition key
Date: 2022-07-14 05:30:56
Message-ID: CAApHDvqFeW5hvQqprXOLuGMMJSf+1C+Wk4w_L-M03sVduF3oYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've spent some time looking at the v10 patch, and to be honest, I
don't really like the look of it :(

1. I think we should be putting the cache fields in PartitionDescData
rather than PartitionDispatch. Having them in PartitionDescData allows
caching between statements.
2. The function name maybe_cache_partition_bound_offset() fills me
with dread. It's very unconcise. I don't think anyone should ever use
that word in a function or variable name.
3. I'm not really sure why there's a field named n_tups_inserted.
That would lead me to believe that ExecFindPartition is only executed
for INSERTs. UPDATEs need to know the partition too.
4. The fields you're adding to PartitionDispatch are very poorly
documented. I'm not really sure what n_offset_changed means. Why
can't you just keep track by recording the last used partition, the
last index into the datum array, and then just a count of the number
of times we've found the last used partition in a row? When the found
partition does not match the last partition, just reset the counter
and when the counter reaches the cache threshold, use the cache path.

I've taken a go at rewriting this, from scratch, into what I think it
should look like. I then looked at what I came up with and decided
the logic for finding partitions should all be kept in a single
function. That way there's much less chance of someone forgetting to
update the double-checking logic during cache hits when they update
the logic for finding partitions without the cache.

The 0001 patch is my original attempt. I then rewrote it and came up
with 0002 (applies on top of 0001).

After writing a benchmark script, I noticed that the performance of
0002 was quite a bit worse than 0001. I noticed that the benchmark
where the partition changes each time got much worse with 0002. I can
only assume that's due to the increased code size, so I played around
with likely() and unlikely() to see if I could use those to shift the
code layout around in such a way to make 0002 faster. Surprisingly
using likely() for the cache hit path make it faster. I'd have assumed
it would be unlikely() that would work.

cache_partition_bench.png shows the results. I tested with master @
a5f9f1b88. The "Amit" column is your v10 patch.
copybench.sh is the script I used to run the benchmarks. This tries
all 3 partitioning strategies and performs 2 COPY FROMs, one with the
rows arriving in partition order and another where the next row always
goes into a different partition. I'm expecting to see the "ordered"
case get better for LIST and RANGE partitions and the "unordered" case
not to get any slower.

With all of the attached patches applied, it does seem like I've
managed to slightly speed up all of the unordered cases slightly.
This might be noise, but I did manage to remove some redundant code
that needlessly checked if the HASH partitioned table had a DEFAULT
partition, which it cannot. This may account for some of the increase
in performance.

I do need to stare at the patch a bit more before I'm confident that
it's correct. I just wanted to share it before I go and do that.

David

Attachment Content-Type Size
v11-0001-WIP-Cache-last-used-partition-in-PartitionDesc.patch text/plain 9.5 KB
cache_partition_bench.png image/png 98.1 KB
copybench.sh text/x-sh 2.7 KB
v11-0002-Do-partition-caching-another-way.patch text/plain 15.1 KB
v11-0003-likely.patch text/plain 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-07-14 05:41:31 Re: [PATCH] Completed unaccent dictionary with many missing characters
Previous Message Pavel Stehule 2022-07-14 04:54:49 Re: proposal: possibility to read dumped table's name from file