Re: Performance regression with PostgreSQL 11 and partitioning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance regression with PostgreSQL 11 and partitioning
Date: 2018-06-08 18:52:40
Message-ID: CA+Tgmoat2s2r0bZJZ5fO8NVriHdZXxgK6GJu5u0FagUJsmgtJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 8, 2018 at 2:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I don't understand this complaint. Before, the code took one
>> AppendRelInfo, and according to you, it was clear what was supposed to
>> happen. Now it takes an array of AppendRelInfos and, according to
>> you, it's completely unclear. Yet that seems, to me at least, to be a
>> straightforward generalization. If 1 AppendRelInfo is an adequate
>> specification of one translations, why are N AppendRelInfos not an
>> adequate specification of N translations?
>
> Because the relationships between the transforms are unclear. Are we
> supposed to apply those N transformations to the expression in sequence?
> It doesn't look to me like that's what the code does. I think --- I might
> be wrong --- that the code is relying on the transformations to be
> non-overlapping, that is a change made by any one of them cannot be
> further affected by another one. This is, however, undocumented.

OK, I see. I guess that possible confusion didn't occur to me both
because I was looking at the code, which I knew wouldn't handle
overlapping transformations, and also because the intended use was for
partition-wise join, where the problem can't come up because we're
translating from the toplevel RTIs for the join to the set of RTIs
appropriate for one child-join. But, it's certainly fine to add a
comment.

Regarding your original complaint that find_appinfos_by_relids will
"linearly search the appinfo list to produce ... an array, which also
has to be linearly searched", let me just note that there is a big
difference between the probable lengths of those arrays. Given an
N-way partition-wise join between tables with P partitions each, the
entire appinfo list will contain N * P entries, but the entries
produced by find_appinfos_by_relids() will at most of length P. There
is a lot of user interest in having P grow into the thousands, or at
the very least the hundreds, but the planner is probably going to be
in a tough spot anyway if there are more than a few tens of tables
and, in practice, I suspect that the number of compatibly partitioned
tables in a query is not likely to grow even that large. Moreover,
searching an array should be noticeably more cache-efficient than
searching a list. So the overall efficiency of searching the array is
probably 2 to 4 orders of magnitude better than searching the list.

That being said, I don't mind a bit if you want to look for further
speedups here, but if you do, keep in mind that a lot of queries won't
even use partition-wise join, so all of the arrays will be of length
1. Even when partition-wise join is used, it is quite likely not to
be used for every table in the query, in which case it will still be
of length 1 in some cases. So pessimizing nappinfos = 1 even slightly
is probably a regression overall.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Da Silva 2018-06-08 19:05:12 Re: pl/tcl function to detect when a request has been canceled
Previous Message David Rowley 2018-06-08 18:50:19 Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian