Re: Skip partition tuple routing with constant partition key

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Skip partition tuple routing with constant partition key
Date: 2022-07-26 22:27:53
Message-ID: CAApHDvqZ=ncMg=GV=QTrX7WZrG9sQj+Hzg3eZ7EEdtCoaV=aWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank for looking at this.

On Sat, 23 Jul 2022 at 01:23, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> + /*
> + * The Datum has changed. Zero the number of times we've
> + * found last_found_datum_index in a row.
> + */
> + partdesc->last_found_count = 0;
>
> + /* Zero the "winning streak" on the cache hit count */
> + partdesc->last_found_count = 0;
>
> Might it be better for the two comments to say the same thing? Also,
> I wonder which one do you intend as the resetting of last_found_count:
> setting it to 0 or 1? I can see that the stanza at the end of the
> function sets to 1 to start a new cycle.

I think I've addressed all of your comments. The above one in
particular caused me to make some larger changes.

The reason I was zeroing the last_found_count in LIST partitioned
tables when the Datum was not equal to the previous found Datum was
due to the fact that the code at the end of the function was only
checking the partition indexes matched rather than the bound_offset vs
last_found_datum_index. The reason I wanted to zero this was that if
you had a partition FOR VALUES IN(1,2), and you received rows with
values alternating between 1 and 2 then we'd match to the same
partition each time, however the equality test with the current
'values' and the Datum at last_found_datum_index would have been false
each time. If we didn't zero the last_found_count we'd have kept
using the cache path even though the Datum and last Datum wouldn't
have been equal each time. That would have resulted in always doing
the cache check and failing, then doing the binary search anyway.

I've now changed the code so that instead of checking the last found
partition is the same as the last one, I'm now checking if
bound_offset is the same as last_found_datum_index. This will be
false in the "values alternating between 1 and 2" case from above.
This caused me to have to change how the caching works for LIST
partitions with a NULL partition which is receiving NULL values. I've
coded things now to just skip the cache for that case. Finding the
correct LIST partition for a NULL value is cheap and no need to cache
that. I've also moved all the code which updates the cache fields to
the bottom of get_partition_for_tuple(). I'm only expecting to do that
when bound_offset is set by the lookup code in the switch statement.
Any paths, e.g. HASH partitioning lookup and LIST or RANGE with NULL
values shouldn't reach the code which updates the partition fields.
I've added an Assert(bound_offset >= 0) to ensure that stays true.

There's probably a bit more to optimise here too, but not much. I
don't think the partdesc->last_found_part_index = -1; is needed when
we're in the code block that does return boundinfo->default_index;
However, that only might very slightly speedup the case when we're
inserting continuously into the DEFAULT partition. That code path is
also used when we fail to find any matching partition. That's not one
we need to worry about making go faster.

I also ran the benchmarks again and saw that most of the use of
likely() and unlikely() no longer did what I found them to do earlier.
So the weirdness we saw there most likely was just down to random code
layout changes. In this patch, I just dropped the use of either of
those two macros.

David

Attachment Content-Type Size
v12_cache_last_partition.patch text/plain 12.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-07-26 22:31:18 Re: fairywren hung in pg_basebackup tests
Previous Message Tom Lane 2022-07-26 21:32:07 Re: Expand palloc/pg_malloc API