Re: Query plan, nested EXISTS

From: Matt Daw <matt(at)shotgunsoftware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query plan, nested EXISTS
Date: 2012-09-28 22:56:10
Message-ID: CAA2LLOFkL2VepKAUsRUirfgJeT5P=SMUNntTOd194cR_1xooMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Tom, v9.2.1 looks good!

Aggregate (cost=420808.99..420809.00 rows=1 width=0) (actual
time=147.345..147.345 rows=1 loops=1)
-> Nested Loop Semi Join (cost=0.00..420786.71 rows=8914 width=0)
(actual time=13.847..147.219 rows=894 loops=1)
-> Index Scan using notes_retirement_date_project on notes a
(cost=0.00..67959.22 rows=12535 width=4) (actual time=13.811..71.741
rows=12469 loops=1)
Index Cond: (project_id = 114)
-> Nested Loop Semi Join (cost=0.00..28.14 rows=1 width=4)
(actual time=0.006..0.006 rows=0 loops=12469)
-> Index Scan using note_links_note on note_links b
(cost=0.00..12.37 rows=1 width=8) (actual time=0.002..0.002 rows=1
loops=12469)
Index Cond: (note_id = a.id)
Filter: ((retirement_date IS NULL) AND
((entity_type)::text = 'Version'::text))
Rows Removed by Filter: 1
-> Index Scan using versions_pkey on versions c
(cost=0.00..15.76 rows=1 width=4) (actual time=0.003..0.003 rows=0
loops=11794)
Index Cond: (id = b.entity_id)
Filter: ((retirement_date IS NULL) AND ((code)::text
~~* '%comp%'::text))
Rows Removed by Filter: 1
Total runtime: 147.411 ms
(14 rows)

On Fri, Sep 28, 2012 at 2:47 PM, Matt Daw <matt(at)shotgunsoftware(dot)com> wrote:

> Hi Tom, thank you very much. I'll load these tables onto a 9.2 instance
> and report back.
>
> Matt
>
>
> On Fri, Sep 28, 2012 at 2:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Matt Daw <matt(at)shotgunsoftware(dot)com> writes:
>> > Howdy, I've been debugging a client's slow query today and I'm curious
>> > about the query plan. It's picking a plan that hashes lots of rows from
>> the
>> > versions table (on v9.0.10)...
>>
>> > EXPLAIN ANALYZE
>> > SELECT COUNT(*) FROM notes a WHERE
>> > a.project_id = 114 AND
>> > EXISTS (
>> > SELECT 1 FROM note_links b
>> > WHERE
>> > b.note_id = a.id AND
>> > b.entity_type = 'Version' AND
>> > EXISTS (
>> > SELECT 1 FROM versions c
>> > WHERE
>> > c.id = b.entity_id AND
>> > c.code ILIKE '%comp%' AND
>> > c.retirement_date IS NULL
>> > ) AND
>> > b.retirement_date IS NULL
>> > )
>>
>> I think the real problem here is that 9.0 is incapable of avoiding a
>> full table scan on "note_links", which means it doesn't really have any
>> better option than to do the inner EXISTS as a full-table semijoin.
>> This is because it can't push a.id down through two levels of join, and
>> because the semijoins don't commute, there's no way to get a.id into the
>> scan of note_links to pull out only the useful rows. The hack with
>> LIMIT avoids this problem by preventing the inner EXISTS from being
>> treated as a full-fledged semijoin; but of course that hack leaves you
>> vulnerable to very bad plans if the statistics are such that a nestloop
>> join isn't the best bet for the inner EXISTS.
>>
>> The work I did for parameterized paths in 9.2 was intended to address
>> exactly this type of scenario. I would be interested to know if 9.2
>> does this any better for you.
>>
>> regards, tom lane
>>
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Виктор Егоров 2012-09-28 23:11:58 NestedLoops over BitmapScan question
Previous Message Tom Lane 2012-09-28 22:25:57 Re: Possible Performance Regression with Transitive Comparisons vs. Constants