Record a Bitmapset of non-pruned partitions

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Record a Bitmapset of non-pruned partitions
Date: 2021-06-30 13:59:28
Message-ID: CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Back when working on 959d00e9d, to allow ordered partition scans for
LIST and RANGE partitioned tables, I mentioned [1] that if we had a
field that recorded a Bitmapset of the non-pruned partitions, we could
use that to do a more thorough check to see if ordered scans are
possible.

At the moment these ordered scans are possible if the order by matches
the partition key and, for LIST partitioned tables we must also ensure
that we don't have something like partition1 FOR VALUES IN(1,3) and
partition2 FOR VALUES(2,4). Here we can't scan partition1 first before
we'd get 3s before the 2s that are in partition2. Since we don't
record the list of partitions that survived pruning, I just made that
check to ensure all partitions only allow a single value to be stored.
That means the optimisation is not done in cases where it's possible
to do it.

To make this work, as I mentioned in [1], we really need a live_parts
field to track which partitions survived pruning. If we have that
then we can ensure that just those partitions have no instances where
a lower value could appear in a later partition.

In the attached patch, I've only added the live_parts field and made
use of it in a few locations where the knowledge is useful as an
optimisation. Right now apply_scanjoin_target_to_paths() appears in
profiles when planning a query to a partitioned table with many
partitions and just 1 or a few survive pruning. The reason for this is
simple; looping over a large almost entirely empty array is just slow
and using the Bitmapset as a sort of index into the interesting
elements of that array speeds it up, quite a bit.

Since we've already put quite a bit of effort into making that fast,
then I think it might be worth adding this field even if it was just
for that purpose.

With 10k empty hash partitions and 1 random one surviving partition
pruning, I see:

Master:
18.13% postgres postgres [.] apply_scanjoin_target_to_paths
3.66% postgres postgres [.] AllocSetAlloc
2.05% postgres postgres [.] hash_search_with_hash_value
1.95% postgres libc-2.33.so [.] __memset_avx2_unaligned_erms
1.88% postgres postgres [.] SearchCatCacheInternal
1.55% postgres postgres [.] base_yyparse
0.74% postgres postgres [.] get_relation_info
0.68% postgres [kernel.kallsyms] [k] __d_lookup_rcu
0.61% postgres postgres [.] palloc

Patched:
3.72% postgres postgres [.] AllocSetAlloc
2.30% postgres postgres [.] hash_search_with_hash_value
2.22% postgres postgres [.] SearchCatCacheInternal
2.02% postgres libc-2.33.so [.] __memset_avx2_unaligned_erms
1.88% postgres postgres [.] base_yyparse
1.08% postgres postgres [.] palloc
0.92% postgres [kernel.kallsyms] [k] __d_lookup_rcu

$ cat setup.sql
create table hp (a int primary key, b int not null) partition by hash(a);
select 'create table hp'||x|| ' partition of hp for values with
(modulus 10000, remainder '||x||');' from generate_series(0,9999) x;
\gexec

$ cat select.sql
\set p random(1,2000000)
select * from hp where a = :p

master

$ pgbench -n -f select.sql -c 1 -j 1 -T 60 postgres
tps = 2608.704218 (without initial connection time)
tps = 2607.641247 (without initial connection time)
tps = 2583.017011 (without initial connection time)

patched

$ pgbench -n -f select.sql -c 1 -j 1 -T 60 postgres
tps = 2715.993495 (without initial connection time)
tps = 2701.527640 (without initial connection time)
tps = 2707.343009 (without initial connection time)

Does anyone have any thoughts about if this is worth a new field in RelOptInfo?

David

[1] https://www.postgresql.org/message-id/CAKJS1f9W7sg1sb3SXiTQUovs%3DwDMrHATXv68F5dSbe5fuHH%2BiQ%40mail.gmail.com

Attachment Content-Type Size
add_RelOptInfo_live_parts_field.patch application/octet-stream 6.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-06-30 14:08:04 Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
Previous Message Alvaro Herrera 2021-06-30 13:46:59 Re: Preventing abort() and exit() calls in libpq