Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Phil Florent <philflorent(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian
Date: 2018-06-20 01:20:21
Message-ID: CAKJS1f_y+kcw7mL7e9Nxf3AOYWdA3wwcYBnFQz0_DHiYbgVaSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20 June 2018 at 02:28, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Jun 17, 2018 at 10:59 PM, David Rowley
>> Robert, do you have any objections to the proposed patch?
>
> I don't have time to study this right now, but I think the main
> possible objection is around performance. If not flattening the
> Append is the best way to make queries run fast, then we should do it
> that way. If making pruning capable of coping with mixed hierarchies
> is going to be faster, then we should do that. If I were to speculate
> in the absence of data, my guess would be that failing to flatten the
> hierarchy is going to lead to a significant per-tuple cost, while the
> cost of making run-time pruning smarter is likely to be incurred once
> per rescan (i.e. a lot less). But that might be wrong, and it might
> be impractical to get this working perfectly in v11 given the time we
> have. But I would suggest that you performance test a query that ends
> up feeding lots of tuples through two Append nodes rather than one and
> see how much it hurts.

I've performed two tests. One to see what the overhead of the
additional append is, and one to see what the saving from pruning away
unneeded partitions is. I tried to make the 2nd test use a realistic
number of partitions. Partition pruning will become more useful with
higher numbers of partitions.

Test 1: Test overhead of pulling tuples through an additional append

create table p (a int) partition by list (a);
create table p1 partition of p for values in(1);
insert into p select 1 from generate_series(1,1000000);
vacuum p1;
set max_parallel_workers_per_gather=0;

select count(*) from (select * from p union all select * from p) p;

Unpatched:
tps = 8.530355 (excluding connections establishing)

Patched:
tps = 7.853939 (excluding connections establishing)

Patched version takes 108.61% of the unpatched time.

Test 2: Tests time saved from run-time partition pruning and not
scanning the index on 23 of the partitions.

create table rp (d date) partition by range (d);
select 'CREATE TABLE rp' || x::text || ' PARTITION OF rp FOR VALUES
FROM (''' || '2017-01-01'::date + (x::text || ' month')::interval ||
''') TO (''' || '2017-01-01'::date + ((x+1)::text || '
month')::interval || ''');'
from generate_Series(0,23) x;
\gexec
insert into rp select d::date from
generate_series('2017-01-01','2018-12-31', interval '10 sec') d;
create index on rp (d);

select count(*) from (select * from rp union all select * from rp) rp
where d = current_date;

Unpatched: (set enable_partition_pruning = 0; to make it work)
tps = 260.969953 (excluding connections establishing)

Patched:
tps = 301.319038 (excluding connections establishing)

Patched version takes 86.61% of the unpatched time.

So, I don't think that really concludes much. I'd say the overhead
shown in test 1 is going to be a bit more predictable as it will
depend on how many tuples are being pulled through the additional
Append, but the savings shown in test 2 will vary. Having run-time
pruning not magically fail to work when the partitioned table is part
of a UNION ALL certainly seems less surprising.

If I drop the index from the "d" column in test 2, the performance gap
increases significantly and is roughly proportional to the number of
partitions.

Unpatched:
tps = 0.523691 (excluding connections establishing)

Patched:
tps = 13.453964 (excluding connections establishing)

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-06-20 01:53:35 pgsql: Clarify use of temporary tables within partition trees
Previous Message Thomas Munro 2018-06-19 23:06:52 Re: Add necessary package list to ldap TAP's README