Re: tricking EXPLAIN?

From: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
To: Wim Bertels <wim(dot)bertels(at)khleuven(dot)be>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: tricking EXPLAIN?
Date: 2011-11-30 04:32:00
Message-ID: 4ED5B1C0.5080705@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

(2011/11/28 20:55), Wim Bertels wrote:
> If we look at the output of EXPLAIN ANALYZE,
> then according to the COST the second query is best one,
> but according to the ACTUAL TIME the first query is best
> (which seems logical intuitively).
>
> So explain is being tricked,
> and the reason for this seems the number of rows in de nested loop,
> which are reduced to 1 for explain because of the join.
> http://www.postgresql.org/docs/8.4/static/using-explain.html
>
> Suggestions, comments are always welcome.

Interesting. I tried a modified version of second query, and got same
EXPLAIN output as first query.

SELECT amproc, amprocnum - average AS difference
FROM pg_amproc INNER JOIN
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp
ON true; -- semantically same as "amproc = amproc"

So, I think that the point of this issue is somehow PG thinks wrongly
that "amporc = amproc" filters the result to just one row, though such
condition never reduces result. I also tried simplified query, and got
another result which shows that PG estimates that same condition reduces
to half.

postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (amproc = amproc);
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on pg_amproc (cost=0.00..67.52 rows=126 width=18) (actual
time=0.039..1.356 rows=252 loops=1)
Filter: (amproc = amproc)
Total runtime: 1.445 ms
(3 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (true);
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on pg_amproc (cost=0.00..4.52 rows=252 width=18) (actual
time=0.008..0.045 rows=252 loops=1)
Total runtime: 0.089 ms
(2 rows)

IMHO planner should be modified so that it can estimate result rows
accurately in this case.

--
Shigeru Hanada

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Florian Schwendener 2011-11-30 08:09:59 Re: odbc_fdw
Previous Message panam 2011-11-30 04:00:04 Re: Extending the volume size of the data directory volume