Re: PATCH: use foreign keys to improve join estimates v1

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: use foreign keys to improve join estimates v1
Date: 2015-09-29 22:18:20
Message-ID: CAKJS1f9cWiBMUFN6f-zPiYKOBjGEWCxe54vuai+LLMn219cZZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 September 2015 at 01:59, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

>
> CREATE TABLE f (id1 INT, id2 INT, PRIMARY KEY (id1, id2));
>
> CREATE TABLE d1 (id1 INT, id2 INT, FOREIGN KEY (id1, id2) REFERENCES
> f(id1, id2));
> CREATE TABLE d2 (id1 INT, id2 INT, FOREIGN KEY (id1, id2) REFERENCES
> f(id1, id2));
>
> INSERT INTO f SELECT i, i FROM generate_series(1,1000000) s(i);
>
> INSERT INTO d1 SELECT i, i FROM generate_series(1,100000) s(i);
> INSERT INTO d2 SELECT i, i FROM generate_series(1,300000) s(i);
>
> now, both pair-wise joins (f JOIN d1) and (f JOIN d2) are estimated
> perfectly accurately, but as soon as the query involves both of them, this
> happens:
>
> SELECT * FROM f JOIN d1 ON (f.id1 = d1.id1 AND f.id2 = d1.id2)
> JOIN d2 ON (f.id1 = d2.id1 AND f.id2 = d2.id2);
>
>
>
This is a near perfect example of what I was trying to explain about being
unable to have guarantees about there being 1.0 matching tuples in the
referenced relation.

If we run the above with join_collapse_limit = 1, then we'll first join f
to d1, which will give us 100000 tuples. (With IDs ranging from 1 to 100000)

If we now perform estimates to join d2 to (f, d1), we don't have all of the
referenced tuples in (f, d1), so how do we estimate that? Remember that d2
has IDs 100001 to 300000 that won't be found in the "referenced" relation.

What if we had populated d1 with:

INSERT INTO d1 SELECT i, i FROM generate_series(900001,1000000) s(i);

The join will find exactly 0 tuples between the join of (f, d1) -> d2.

Is my logic here wrong somehow?

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-09-29 22:19:49 Re: Idea for improving buildfarm robustness
Previous Message Joshua Elsasser 2015-09-29 22:16:28 [PATCH 6/6] pg_basebackup: add a single-tar output format.