On Mon, Feb 25, 2008 at 11:56 AM, Matthew <matthew(at)flymine(dot)org> wrote:
> On Mon, 25 Feb 2008, Kynn Jones wrote:
> > This is just GREAT!!! It fits the problem to a tee.
> It makes the queries quick then?
It is good that you ask. Clearly you know the story: a brilliant-sounding
optimization that in practice has only a small effect at best...
I'm totally puzzled. It makes absolutely no sense to me...
For my analysis, in addition to creating the index on (type, zipk) that you
suggested, I also added an extra column to T containing a random integer in
the range 0..99, and created an index on this, so that I could produce a
totally "shuffled clustering". I compared the performance in going from a
randomly-clustered table to a (type, zipk)-clustered table, and the output
of EXPLAIN was encouraging, but when I ran the actual queries under EXPLAIN
ANALYZE the difference in execution time was negligible.
Live and learn!
Actually, what's driving me absolutely insane is the documentation for
EXPLAIN and for Pg's query planning in general. I've read the docs (in
particular the chapter on performance), but I still can't make any sense of
EXPLAINs results, so I can't begin to understand why optimizations like the
one you suggested turned out to be ineffective. For example, the first
lines of two recent EXPLAIN ANALYZE outputs are
Nested Loop Left Join (cost=58.00..1154.22 rows=626 width=26) (actual time=
1.462..26.494 rows=2240 loops=1)
Merge Left Join (cost=33970.96..34887.69 rows=58739 width=26) (actual time=
106.961..126.589 rows=7042 loops=1)
Actual runtimes are 27ms and 128ms. The ratio 128/27 is much smaller than
one would expect from the relative costs of the two queries. It looks like
there is no proportionality at all between the estimated costs and actual
running time... (BTW, all these runs of EXPLAIN were done after calls to
VACUUM ANALYZE.) This is one of the many things I don't understand about
What I would like to be able to do is to at least make enough sense of query
plans to determine whether they are reasonable or not. This requires
knowing the algorithms behind each type of query tree node, but I have not
found this info...
On the positive side, in the course of all this analysis I must have done
*something* to improve the performance, because now even the unoptimized
queries are running pretty fast (e.g. queries that used to take about
1.5seconds are now taking 130ms). But unfortunately I don't know what
that I did to bring this speed-up about!
Anyway, be that as it may, thank you very much for your suggestion.
In response to
pgsql-performance by date
|Next:||From: Laurent Raufaste||Date: 2008-02-26 17:12:07|
|Subject: Re: PG planning randomly ?|
|Previous:||From: Tom Lane||Date: 2008-02-26 16:31:42|
|Subject: Re: PG planning randomly ? |