From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Sean Rhea <sean(dot)c(dot)rhea(at)gmail(dot)com> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Merge join vs merge semi join against primary key |
Date: | 2015-10-13 00:43:22 |
Message-ID: | CAKJS1f8_Cf7YBncU-ZoCEdsF3ccEedb0doH4idLZ0DZQaGqkdw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 10 October 2015 at 08:52, Sean Rhea <sean(dot)c(dot)rhea(at)gmail(dot)com> wrote:
>
> 1. When I join two tables with "WHERE id IN (...)" versus with an explicit
> join, and the join column for the inner table is a primary key, I would
> expect
> the same behavior in both cases, but the optimizer is choosing a merge
> join in
> one case and a merge semi join in the other. There's at most one customer
> with a given id. Why not do a semi join?
>
>
Unfortunately the 9.2 planner does not make any checks to verify that
customers.id is unique to perform a semi join. There is a pending patch in
the 9.6 cycle to add this optimisation.
> production=> select count(*) from customers;
> count
> --------
> 473733
> (1 row)
>
> ...
> -> Index Scan using customers_pkey on customers o
> (cost=0.00..63836.61 rows=836 width=8) (actual time=159.840..201915.765
> rows=7672 loops=1)
> Filter: (group_id = 45)
> Rows Removed by Filter: 212699113
>
>
Rows Removed by Filter: 212699113 seems to indicate that your 473733 row
count for "customers" is incorrect.
If you're doing lots of filtering on group_id, then perhaps you should
think about adding an index on customers (group_id,id)
--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2015-10-13 00:47:07 | Re: Understanding "seq scans" |
Previous Message | Israel Brewster | 2015-10-12 23:21:54 | Re: Pattern match against array elements? |