From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [COMMITTERS] pgsql: Collect and use multi-column dependency stats |
Date: | 2017-04-06 01:05:24 |
Message-ID: | CAKJS1f_gB=gyZn8wMw0v8uCKD1nYeWyNYCXKz=+Oo0yR4RRyiA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
On 6 April 2017 at 11:33, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> On 6 April 2017 at 10:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The buildfarm is unhappy about the fact that this changed the API
>>> for clauselist_selectivity(). I am not convinced that that change
>>> was a good idea, so before telling FDW authors that they need to
>>> change their code, I'd like to hear a defense of the API change.
>
>> Because varReliId is often passed in as 0, and that meant we'd have to
>> write some code to check of the clause was made up of RestrictInfos
>> from a single relation or not, and look for extended stats on that
>> singleton rel.
>
> Generally, if it's passed as zero, that's a good clue that the clause
> *is* a join clause. In any case, this defense fails to address my
> other question, which is what's going to happen to this API when you
> want to use extended stats in join-clause estimates, which I'd expect
> to surely happen before very long.
>
> Also, I find it hard to believe that a bms_get_singleton_member call is
> going to be material in comparison to all the work that will be invoked
> indirectly via whatever selectivity estimation function gets called for
> each clause. Even a single catcache fetch would swamp that.
>
> So no, you have not convinced me that this isn't a broken design.
>
>> FWIW, I found this function being called 72 times in a 5 way join
>> search problem.
>
> And you measured the overhead of doing it the other way to be ... ?
> Premature optimization and all that.
I tested with the attached, and it does not seem to hurt planner
performance executing:
explain select * from ab ab1
inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b
inner join ab ab3 on ab1.a = ab3.a and ab1.b = ab3.b
inner join ab ab4 on ab1.a = ab4.a and ab1.b = ab4.b
inner join ab ab5 on ab1.a = ab5.a and ab1.b = ab5.b
inner join ab ab6 on ab1.a = ab6.a and ab1.b = ab6.b
inner join ab ab7 on ab1.a = ab7.a and ab1.b = ab7.b
inner join ab ab8 on ab1.a = ab8.a and ab1.b = ab8.b;
after having executed:
create table ab (a int, b int);
I get:
find_relation_from_clauses
tps = 48.992918 (excluding connections establishing)
tps = 49.060407 (excluding connections establishing)
tps = 49.075815 (excluding connections establishing)
Master
tps = 48.938027 (excluding connections establishing)
tps = 48.066274 (excluding connections establishing)
tps = 48.727089 (excluding connections establishing)
running pgbench -n -T 60 -f 8wayjoin.sql
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
find_relation_from_clauses.patch | application/octet-stream | 13.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2017-04-06 01:10:48 | Re: [COMMITTERS] pgsql: Collect and use multi-column dependency stats |
Previous Message | Tom Lane | 2017-04-05 23:33:03 | Re: [COMMITTERS] pgsql: Collect and use multi-column dependency stats |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2017-04-06 01:07:11 | Re: Partition-wise join for join between (declaratively) partitioned tables |
Previous Message | Andres Freund | 2017-04-06 01:02:42 | Re: Re: new set of psql patches for loading (saving) data from (to) text, binary files |