Re: Access to postgresql query optimizer output

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Seref Arikan <serefarikan(at)kurumsalteknoloji(dot)com>, PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Access to postgresql query optimizer output
Date: 2012-10-30 02:10:09
Message-ID: 508F3701.8000508@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10/30/2012 12:33 AM, Peter Geoghegan wrote:
> On 29 October 2012 16:00, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> The re-writing does not occur at the level of SQL, but rather at the
>> level of an internal representation "parse tree".
>
> Right. The post parse-analysis tree (i.e. the Query struct) is
> potentially rewritten by an explicit rewrite stage after
> parse-analysis but immediately before planning. No rewriting of plans
> actually occurs. This is part of the rules subsystem, which is
> generally considered to be a foot-gun. It *is* mostly just as if the
> query had been silently rewritten, and had its SQL magically changed.
>
> Craig mentioned that two distinct queries can produce the same query
> plan. That's true, if a little academic. I guess the best example of
> that is with join syntax. Technically, these could be two distinct
> queries, in that the query trees would be substantively different
> (according to pg_stat_statements own definition of that, which is
> essentially that they're not bitwise identical in respect of their
> nodes' substantive fields):
>
> select * from foo f join bar b on f.bid = b.bid;
>
> select * from foo f, bar b where f.bid = b.bid;
>
> However, after planning, they could indeed have identical plans.

I'm talking about more substantial transformations too. For example,
given sample data:

CREATE TABLE a ( id integer primary key, somedata text not null );
INSERT INTO a(id, somedata) VALUES
(1,'ham'),(2,'eggs'),(3,'spam'),(4,'putty');
CREATE TABLE b ( a_id integer not null references a(id));
INSERT INTO b VALUES (1),(3);

regress=> EXPLAIN SELECT somedata FROM a
WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id);
QUERY PLAN
-----------------------------------------------------------------------
Hash Join (cost=44.50..73.41 rows=615 width=32)
Hash Cond: (a.id = b.a_id)
-> Seq Scan on a (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=42.00..42.00 rows=200 width=4)
-> HashAggregate (cost=40.00..42.00 rows=200 width=4)
-> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4)
(6 rows)

regress=> EXPLAIN SELECT somedata FROM a
WHERE id IN (SELECT a_id FROM b);
QUERY PLAN
-----------------------------------------------------------------------
Hash Join (cost=44.50..73.41 rows=615 width=32)
Hash Cond: (a.id = b.a_id)
-> Seq Scan on a (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=42.00..42.00 rows=200 width=4)
-> HashAggregate (cost=40.00..42.00 rows=200 width=4)
-> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4)
(6 rows)

or:

ALTER TABLE b ADD UNIQUE(a_id);

regress=> EXPLAIN SELECT somedata FROM a INNER JOIN b ON (a.id = b.a_id);
QUERY PLAN
-----------------------------------------------------------------------
Nested Loop (cost=0.00..17.58 rows=2 width=32)
-> Seq Scan on b (cost=0.00..1.02 rows=2 width=4)
-> Index Scan using a_pkey on a (cost=0.00..8.27 rows=1 width=36)
Index Cond: (id = b.a_id)
(4 rows)

regress=> EXPLAIN SELECT somedata FROM a WHERE id IN (SELECT a_id FROM b);
QUERY PLAN
-----------------------------------------------------------------------
Nested Loop (cost=0.00..17.58 rows=2 width=32)
-> Seq Scan on b (cost=0.00..1.02 rows=2 width=4)
-> Index Scan using a_pkey on a (cost=0.00..8.27 rows=1 width=36)
Index Cond: (id = b.a_id)
(4 rows)

regress=> EXPLAIN SELECT somedata FROM a WHERE EXISTS (SELECT 1 FROM b
WHERE b.a_id = a.id);
QUERY PLAN
-----------------------------------------------------------------------
Nested Loop (cost=0.00..17.58 rows=2 width=32)
-> Seq Scan on b (cost=0.00..1.02 rows=2 width=4)
-> Index Scan using a_pkey on a (cost=0.00..8.27 rows=1 width=36)
Index Cond: (id = b.a_id)
(4 rows)

These are three very different ways to write the query, and they all
result in the same plan.

--
Craig Ringer

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Kevin Grittner 2012-10-30 04:10:56 Re: Too much clients connected to the PostgreSQL Database
Previous Message Xiong He 2012-10-30 01:42:34 Re: Too much clients connected to the PostgreSQL Database