Re: PATCH: use foreign keys to improve join estimates v1

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: use foreign keys to improve join estimates v1
Date: 2015-09-27 12:00:03
Message-ID: CAKJS1f-Ad5Fs9Fsay_=LZT_Ep81r+qU3pVLcecz+a1+BJMVp0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26 September 2015 at 01:57, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> Hi,
>
> On 09/25/2015 03:39 AM, David Rowley wrote:
>
>> I looked at this again, and I'm not all that sure it's possible to
>>
> assume that 1.0 / <tuples> is valid when there's more than one
>> relation at either side of the join.
>>
> >
>
>> My reasoning for this is that the whole basis for the patch is that a
>> if we find a foreign key match, then we can be sure enough, as far as
>> row estimations go, that exactly 1 tuple will exist matching that
>> condition. This assumption of the 1 tuple no longer holds when 2
>> relations have already been joined, as this can either cause tuple
>> duplication, or elimination.
>>
>
> I don't see why that would be the case. Of course, you need to take the
> right <tuples>, i.e. the "target" of the foreign key (the table with UNIQUE
> constraint) so that the selectivity matches the fact that exactly 1 tuple
> (on the PK side) matches.
>

hmm, ok. You're right. It appears I was a bit confused, but thanks for
explaining it again. I get it now.

I've been working on this again. I've put back the code that you wrote for
the looping over each combination of relations from either side of the join.

I've also added some code to get around the problem with eclass joins and
the RestrictInfo having some alternative Vars that don't belong to the
foreign key. Basically I'm just checking if the RestrictInfo has a
parent_ec, and if it does just loop over the members to try and find the
Vars that belong to the foreign key. I've tested it with the following, and
it seems to work:

create table a as select i as a_id1, i as a_id2, i as dummy1 from
generate_series(0,999) s(i);
alter table a add unique (a_id1, a_id2);
create table b as select i as b_id1, i as b_id2 from generate_series(0,332)
s(i);

analyze a;
analyze b;

alter table b add foreign key (b_id1, b_id2) references a (a_id1, a_id2);

explain analyze select * from a inner join b on a.dummy1 = b.b_id1 and
a.a_id2 = b.b_id2 where a.a_id1 = a.dummy1;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Hash Join (cost=18.57..26.41 rows=2 width=20) (actual time=0.775..1.046
rows=333 loops=1)
Hash Cond: ((b.b_id1 = a.dummy1) AND (b.b_id2 = a.a_id2))
-> Seq Scan on b (cost=0.00..5.33 rows=333 width=8) (actual
time=0.013..0.046 rows=333 loops=1)
-> Hash (cost=18.50..18.50 rows=5 width=12) (actual time=0.737..0.737
rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 51kB
-> Seq Scan on a (cost=0.00..18.50 rows=5 width=12) (actual
time=0.014..0.389 rows=1000 loops=1)
Filter: (dummy1 = a_id1)

The non-patched version estimates 1 row. The patched estimates 2 rows, but
that's due to the bad estimate on dummy1 = a_id1.

The 2 comes from ceil(5 * 0.333).

Perhaps you have a better test case to for this?

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
estimation-with-fkeys-v2_davidv3.patch application/octet-stream 20.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-09-27 14:50:38 Re: pg_dump LOCK TABLE ONLY question
Previous Message Filip Rembiałkowski 2015-09-27 10:43:14 pg_dump LOCK TABLE ONLY question