Re: Performance regression with PostgreSQL 11 and partitioning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance regression with PostgreSQL 11 and partitioning
Date: 2018-06-07 20:22:23
Message-ID: 2737.1528402943@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
> On Wed, Jun 6, 2018 at 11:27 AM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> I was trying to be realistic for something we can do to fix v11. It's
>> probably better to minimise the risky surgery on this code while in
>> beta. What I proposed was intended to fix a performance regression new
>> in v11. I'm not sure what you're proposing has the same intentions.

> Agreed that we want a less risky fix at this stage. What I am worried
> is with your implementation there are two ways to get AppendRelInfo
> corresponding to a child, append_rel_list and append_rel_array. Right
> now we will change all the code to use append_rel_array but there is
> no guarantee that some code in future will use append_rel_list. Bugs
> would rise if these two get out of sync esp. considering
> append_rel_list is a list which can be easily modified. I think we
> should avoid that. See what we do to rt_fetch() and
> planner_rt_fetch(), but we still have two ways to get an RTE. That's a
> source of future bugs.

>> I'd prefer to get a committers thoughts on this before doing any further work.

> Yes. I think so too.

As the original author of the append_rel_list stuff, I do have a few
thoughts here.

The reason why append_rel_list is just a list, and not any more
complicated data structure, is alluded to in the comments for
find_childrel_appendrelinfo (which David's patch should have
changed, and did not):

* This search could be eliminated by storing a link in child RelOptInfos,
* but for now it doesn't seem performance-critical. (Also, it might be
* difficult to maintain such a link during mutation of the append_rel_list.)

There are a lot of places in prepjointree.c and planner.c that whack
around the append_rel_list contents more or less extensively. It's
a lot easier in those places if the data is referenced by just one
list pointer. I do not think we want to replace append_rel_list
with something that would make those functions' lives much harder.

However, so far as I can see, the append_rel_list contents don't change
anymore once we enter query_planner(). Therefore, it's safe to build
secondary indexing structure(s) to allow fast access beyond that point.
This is not much different from what we do with, eg, the rangetable
and simple_rte_array[].

So that's basically what David's patch does, and it seems fine as far
as it goes, although I disapprove of shoving the responsibility into
setup_simple_rel_arrays() without so much as a comment change.
I'd make a separate function for that, I think. (BTW, perhaps instead
we should do what the comment quoted above contemplates, and set up a
link in the child's RelOptInfo?)

I'm still of the opinion that find_appinfos_by_relids() needs to be
nuked from orbit. It has nothing to recommend it either from the
standpoint of performance or that of intellectual coherency (or maybe
that problem is just inadequate documentation). The places consuming
its results are no better.

I was also pretty unhappy to discover, as I poked around in the code, that
recent partitioning patches have introduced various assumptions about the
ordering of the append_rel_list. It's bad enough that those exist at all;
it's worse that they're documented, if at all, only beside the code that
will fail (usually silently) if they're violated. I do not find this
acceptable. If we cannot avoid these assumptions, they need to be
documented more centrally, like in the relation.h block comment for struct
AppendRelInfo.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-06-07 20:57:36 Code of Conduct committee: call for volunteers
Previous Message Andres Freund 2018-06-07 19:48:36 Re: computing completion tag is expensive for pgbench -S -M prepared