Re: [PoC] Reducing planning time when tables have many partitions

From: Yuya Watari <watari(dot)yuya(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PoC] Reducing planning time when tables have many partitions
Date: 2022-07-05 08:57:14
Message-ID: CAJ2pMkZ4Ag+Ogu6RBa5F0sOTFKwnoreGwTNiEBCSetK-G7K1nQ@mail.gmail.com
Lists: pgsql-hackers

Dear Tom,

Thank you for replying to my email.

On Mon, Jul 4, 2022 at 6:28 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm not real thrilled with trying to throw hashtables at the problem,
> though. As David noted, they'd be counterproductive for simple
> queries.

As you said, my approach of utilizing hash tables introduces some
overhead, which can degrade query planning time.

I measured this degradation with a brief experiment using the simple
query shown below.

===
SELECT students.name, gpas.gpa AS gpa, sum(scores.score) AS total_score
FROM students, scores, gpas
WHERE students.id = scores.student_id AND students.id = gpas.student_id
GROUP BY students.id, gpas.student_id;
===

Here, the students, scores, and gpas tables have no partitions, i.e.,
they are regular tables. Therefore, my techniques do not apply to this
query and may instead cause some regression. I issued this query one
million times and measured the planning time of each execution.

The attached figure shows the distribution of the planning times. It
indicates that my patch has no severe negative impact on planning
performance, although there does seem to be a slight degradation.

The mean and median planning times are shown below. With my patch, the
planning time became 0.002-0.004 milliseconds slower (roughly
0.3-0.6%). We have to deal with this problem, but reducing the time
complexity while keeping the degradation at zero is significantly
challenging.

Planning time (ms)
            | Mean  | Median
------------+-------+-------
Master      | 0.682 | 0.674
Patched     | 0.686 | 0.676
------------+-------+-------
Degradation | 0.004 | 0.002

Of course, the result above is just one example. More significant
regression might occur with other types of queries.

> For the bms_equal class of lookups, I wonder if we could get anywhere
> by adding an additional List field to every RelOptInfo that chains
> all EquivalenceMembers that match that RelOptInfo's relids.
> The trick here would be to figure out when to build those lists.
> The simple answer would be to do it lazily on-demand, but that
> would mean a separate scan of all the EquivalenceMembers for each
> RelOptInfo; I wonder if there's a way to do better?
>
> Perhaps the bms_is_subset class could be handled in a similar
> way, ie do a one-time pass to make a List of all EquivalenceMembers
> that use a RelOptInfo.

Thank you for sharing your idea. I will try to polish up my algorithm
based on your suggestion.

--
Best regards,
Yuya Watari

Attachment Content-Type Size
figure.png image/png 83.3 KB
