Re: Ordered Partitioned Table Scans

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Ordered Partitioned Table Scans
Date: 2018-10-26 03:52:27
Message-ID: 6a5caa48-b6be-0c8a-4cdf-ef0fe3342da7@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018/10/26 11:50, David Rowley wrote:
> RANGE partitioning of time-series data is quite a common range to use
> partitioning, and such tables tend to grow fairly large. I thought
> since we always store RANGE partitioned tables in the PartitionDesc in
> ascending range order that it might be useful to make use of this and
> when the required pathkeys match the order of the range, then we could
> make use of an Append node instead of uselessly using a MergeAppend,
> since the MergeAppend will just exhaust each subplan one at a time, in
> order.
>
> It does not seem very hard to implement this and it does not add much
> in the way of additional processing to the planner.
>
> Performance wise it seems to give a good boost to getting sorted
> results from a partitioned table. I performed a quick test just on my
> laptop with:
>
> Setup:
> CREATE TABLE partbench (id BIGINT NOT NULL, i1 INT NOT NULL, i2 INT
> NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL) PARTITION
> BY RANGE (id);
> select 'CREATE TABLE partbench' || x::text || ' PARTITION OF partbench
> FOR VALUES FROM (' || (x*100000)::text || ') TO (' ||
> ((x+1)*100000)::text || ');' from generate_Series(0,299) x;
> \gexec
> \o
> INSERT INTO partbench SELECT x,1,2,3,4,5 from generate_Series(0,29999999) x;
> create index on partbench (id);
> vacuum analyze;
>
> Test:
> select * from partbench order by id limit 1 offset 29999999;
>
> Results Patched:
>
> Time: 4234.807 ms (00:04.235)
> Time: 4237.928 ms (00:04.238)
> Time: 4241.289 ms (00:04.241)
> Time: 4234.030 ms (00:04.234)
> Time: 4244.197 ms (00:04.244)
> Time: 4266.000 ms (00:04.266)
>
> Unpatched:
>
> Time: 5917.288 ms (00:05.917)
> Time: 5937.775 ms (00:05.938)
> Time: 5911.146 ms (00:05.911)
> Time: 5906.881 ms (00:05.907)
> Time: 5918.309 ms (00:05.918)
>
> (about 39% faster)
>
> The implementation is fairly simple. One thing I don't like about is
> I'd rather build_partition_pathkeys() performed all the checks to know
> if the partition should support a natural pathkey, but as of now, I
> have the calling code ensuring that there are no sub-partitioned
> tables. These could cause tuples to be output in the wrong order.
>
> Does this idea seem like something we'd want?

Definitely! Thanks for creating the patch.

I recall Ronan Dunklau and Julien Rouhaud had proposed a patch for this
last year, but the partitioning-related planning code hadn't advanced then
as much as it has today, so they sort of postponed working on it.
Eventually their patch was returned with feedback last November. Here's
the link to their email in case you wanted to read some comments their
proposal and patch got, although some of them might be obsolete.

https://www.postgresql.org/message-id/2401607.SfZhPQhbS4%40ronan_laptop

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2018-10-26 04:25:09 Re: PostgreSQL vs SQL/XML Standards
Previous Message Chapman Flack 2018-10-26 03:40:37 Re: PostgreSQL vs SQL/XML Standards