Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, alvherre(at)2ndquadrant(dot)com, pryzby(at)telsasoft(dot)com, sanyo(dot)moura(at)tatic(dot)net, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, david(dot)rowley(at)2ndquadrant(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
Date: 2019-01-10 01:41:56
Message-ID: 7354b5bf-5932-4c0c-81ee-d9550de18995@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers pgsql-performance

Fujita-san,

On 2019/01/09 20:20, Etsuro Fujita wrote:
> (2019/01/09 9:30), Amit Langote wrote:
>> So, while the patch at [1] can take care of this issue as I also mentioned
>> upthread, I was trying to come up with a solution that can be back-patched
>> to PG 11.  The patch I posted above is one such solution and as Ashutosh
>> points out it's perhaps not the best, because it can result in potentially
>> creating many copies of the same child EC member if we do it in joinrel.c,
>> as the patch proposes.  I will try to respond to the concerns he raised in
>> the next week if possible.
>
> Thanks for working on this!
>
> I like your patch in general.  I think one way to address Ashutosh's
> concerns would be to use the consider_partitionwise_join flag: originally,
> that was introduced for partitioned relations to show that they can be
> partitionwise-joined, but I think that flag could also be used for
> non-partitioned relations to show that they have been set up properly for
> partitionwise-joins, and I think by checking that flag we could avoid
> creating those copies for child dummy rels in try_partitionwise_join.

Ah, that's an interesting idea.

If I understand the original design of it correctly,
consider_partitionwise_join being true for a given relation (simple or
join) means that its RelOptInfo contains properties to consider it to be
joined with another relation (simple or join) using partitionwise join
mechanism. Partitionwise join will occur between the pair if the other
relation also has relevant properties (hence its
consider_partitionwise_join set to true) and properties on the two sides
match.

That's a loaded meaning and abusing it to mean something else can be
challenged, but we can live with that if properly documented. Speaking of
which:

/* used by partitionwise joins: */
bool consider_partitionwise_join; /* consider partitionwise join
* paths? (if partitioned
rel) */

Maybe, mention here how it will be abused in back-branches for
non-partitioned relations?
 
> Please find attached an updated version of the patch.  I modified your
> version so that building tlists for child dummy rels are also postponed
> until after they actually participate in partitionwise-joins, to avoid
> that possibly-useless work as well.  I haven't done any performance tests
> yet though.

Thanks for updating the patch. I tested your patch (test setup described
below) and it has almost the same performance as my previous version:
552ms (vs. 41159ms on HEAD vs. 253ms on PG 10) for the query also
mentioned below.

Thanks,
Amit

[1] Test setup

-- create tables
CREATE TABLE precio(fecha timestamp, pluid int, loccd int, plusalesprice
int) PARTITION BY RANGE (fecha);

SELECT format('CREATE TABLE public.precio_%s PARTITION OF public.precio
(PRIMARY KEY (fecha, pluid, loccd) ) FOR VALUES FROM (''%s'')TO(''%s'')',
i, a, b) FROM (SELECT '1990-01-01'::timestamp +(i||'days')::interval a,
'1990-01-02'::timestamp+(i||'days')::interval b, i FROM
generate_series(1,999) i)x;

\gexec

-- query
SELECT l_variacao.fecha, l_variacao.loccd , l_variacao.pant ,
l_variacao.patual , max_variacao.var_max FROM (SELECT p.fecha, p.loccd,
p.plusalesprice patual, da.plusalesprice pant, abs(p.plusalesprice -
da.plusalesprice) as var from precio p, (SELECT p.fecha, p.plusalesprice,
p.loccd from precio p WHERE p.fecha between '2017-03-01' and '2017-03-02'
and p.pluid = 2) da WHERE p.fecha between '2017-03-01' and '2017-03-02'
and p.pluid = 2 and p.loccd = da.loccd and p.fecha = da.fecha) l_variacao,
(SELECT max(abs(p.plusalesprice - da.plusalesprice)) as var_max from
precio p, (SELECT p.fecha, p.plusalesprice, p.loccd from precio p WHERE
p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2) da WHERE
p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2 and p.loccd
= da.loccd and p.fecha = da.fecha) max_variacao WHERE max_variacao.var_max
= l_variacao.var;

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Hogg 2019-01-10 02:49:00 Re: PostgreSQL Read IOPS limit per connection
Previous Message Merlin Moncure 2019-01-09 22:47:53 Re: PostgreSQL Read IOPS limit per connection

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-10 01:43:08 BTW, have we got a commitfest manager for the January CF?
Previous Message Donald Dong 2019-01-10 01:38:53 Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name