Re: Any better plan for this query?..

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Dimitri <dimitrik(dot)fr(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Any better plan for this query?..
Date: 2009-05-11 19:46:15
Message-ID: 20090511194615.GT3305@yugib.highrise.ca
Lists: pgsql-performance

* Dimitri <dimitrik(dot)fr(at)gmail(dot)com> [090511 11:18]:
> Folks, it's completely crazy, but here is what I found:
>
> - if HISTORY table is analyzed with target 1000, my SELECT response
> time jumps to 3ms, and the max throughput is limited to 6.000 TPS
> (that's what happened with 8.3.7)
>
> - if HISTORY table is analyzed with target 5, my SELECT response time
> decreases to 1.2ms (!) and then my max TPS level is ~12.000!
> and CPU is used up to 95% even by 8.3.7 :-) and 8.4 performed better
> just because I left its analyze target at the default value of 100.
>
> Can anyone explain why the analyze target may have such a huge
> negative secondary effect?..

It's actually pretty straightforward.

The PostgreSQL query planner is a "smart planner". It takes into
consideration all the statistics available on the columns/tables,
expected outputs based on inputs, etc., to choose what it thinks will be
the best plan. The more data you have in statistics (the larger your
statistics target), the more CPU time and the longer it's going to
take to "plan" your queries. The tradeoff is hopefully better plans.
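For reference, the statistics target can be set globally or per column and
takes effect at the next ANALYZE; a sketch (the column name below is
illustrative, not from Dimitri's schema — note 8.3's global default is 10,
while 8.4 raised it to 100):

```
-- Global default, per session (or in postgresql.conf):
SET default_statistics_target = 100;

-- Or per column (hypothetical column name), then re-gather stats:
ALTER TABLE history ALTER COLUMN some_col SET STATISTICS 5;
ANALYZE history;
```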

But in your scenario, where you are hitting the database in just about
the worst possible way to use PostgreSQL — small, repeated,
simple queries — you're not getting the advantage of "better" plans. In
your case, you're throwing absolutely simple queries at PG as fast as
you can, and for each query, PostgreSQL has to:

1) Parse the given "query string"
2) Given the statistics available, plan the query and pick the best one
3) Actually run the query.

Step 2 is going to dominate the CPU time in your tests, and the more
statistics it has to evaluate, the worse it gets; unless the data has to
come from disk (i.e. it's not in shared buffers or the OS cache), planning
is thus going to dominate the time before you get your results. More
statistics means more time needed to plan and pick the query.
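You can get a feel for how much statistical data the planner has to chew
through by looking at pg_stats; a sketch, assuming the HISTORY table has
been analyzed (larger targets produce longer most_common_vals and
histogram_bounds arrays here):

```
SELECT attname, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'history';
```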

If you were to use prepared statements, the cost of #1 and #2 is paid
once, and then every time you throw a new execution of the query at
PostgreSQL, you only do #3 — the easy, quick part, especially for
small simple queries where all the data is in shared buffers or the cache.
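Concretely, with a server-side prepared statement the parse/plan cost is
paid at PREPARE time only; the statement name, query, and parameter below
are illustrative:

```
-- Parse + plan once:
PREPARE get_hist (integer) AS
    SELECT * FROM history WHERE id = $1;

-- Each execution then skips straight to step 3:
EXECUTE get_hist(42);
```

Most client drivers (e.g. JDBC, libpq's PQprepare) expose the same
mechanism without writing PREPARE/EXECUTE by hand.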

a.

--
Aidan Van Dyk                                          Create like a god,
aidan(at)highrise(dot)ca                              command like a king,
http://www.highrise.ca/                                 work like a slave.
