Re: Proposal: Partitioning Advisor for PostgreSQL

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Yuzuko Hosoya <hosoya(dot)yuzuko(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Partitioning Advisor for PostgreSQL
Date: 2018-06-12 10:28:15
Message-ID: CAOBaU_bwG6hX9xOZ3n7xq43ztEwZR4bS78aFj12vfLDL7rPffw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Jun 12, 2018 at 11:14 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya <hosoya(dot)yuzuko(at)lab(dot)ntt(dot)co(dot)jp>
> wrote:
>>
>> This is
>> why we are working on partitioning advisor. We plan to release the first
>> version
>> of partitioning advisor for PostgreSQL 11, and then, improve it for
>> PostgreSQL 12.
>>
>
> Interesting.

Thanks!

>> - Estimating stats
>> It is complicated because hypothetical partition has no data. Currently,
>> we compute
>> hypothetical partition's size using clauselist_selectivity() according to
>> their partition
>> bound and original table stats. As a result, estimate is done with low
>> accuracy,
>> especially if there is WHERE clause. We will improve during developing,
>> but for now,
>> we don't have good ideas.
>
>
> I haven't yet read the patch but curious to know. Suppose we have table
> which is already loaded with some data. Now, if I create hypothetical
> partitions on that will we create any stat data (mcv, histogram) for
> hypothetical table? because, in this case we already have the data from the
> main table and we also have partition boundary for the hypothetical table.
> I am not sure you are already doing this or its an open item?

For now we're simply using the original table statistics, and
appending the partition bounds as qual on the hypothetical partition.
It'll give good result if the query doesn't have quals for the table,
or for simple cases where selectivity functions understand that
expressions such as

(id BETWEEN 1 AND 1000000) AND (id < 6)

will return only 5 rows, while they can't for expressions like

(id IN (x,y...)) AND (id < z)

In this second case, the estimates are for now therefore quite wrong.
I think that we'd have no other choice than to generate hypothetical
statistics according to the partition bounds, and only compute
selectivity based on the query quals. It's definitely not simple to
do, but it should be doable with the hooks currently available.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2018-06-12 10:28:38 Re: BUG #15237: I got "ERROR: source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression"
Previous Message Dilip Kumar 2018-06-12 09:14:02 Re: Proposal: Partitioning Advisor for PostgreSQL