Re: Abnormal performance difference between Postgres and MySQL

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Farhan Husain <russoue(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Abnormal performance difference between Postgres and MySQL
Date: 2009-02-26 00:07:42
Message-ID: C5CB1D4E.2BF9%scott@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I will second Kevin's suggestion. Unless you think you will have more than a few dozen concurrent queries, start with work_mem around 32MB.
For the query here, a very large work_mem might help it hash join depending on the data... But that's not the real problem here.

The real problem is that it does a huge scan of all of the a1 table, and sorts it. Its pretty clear that this table has incorrect statistics. It thinks that it will get about 1 million rows back in the scan, but it is actually 3 million in the scan.

Crank up the statistics target on that table from the default to at least 100, perhaps even 1000. This is a large table, the default statistics target of 10 is not good for large tables with skewed column data. Those to try increasing the target on are the columns filtered in the explain: graphid, prop, and obj. Then run vacuum analzye on that table (a1). The planner should then have better stats and will likely be able to use a better plan for the join.

The other tables involved in the join also seem to have bad statistics. You might just take the easiest solution and change the global statistics target and vacuum analyze the tables involved:

set default_statistics_target = 50;
vacuum analyze jena_g1t1_stmt ;

(test the query)

Repeat for several values of the default statistics target. You can run "explain" before running the actual query, to see if the plan changed. If it has not, the time will not likely change.
The max value for the statistics target is 1000, which makes analyzing and query planning slower, but more accurate. In most cases, dramatic differences can happen between the default of 10 and values of 25 or 50. Sometimes, you have to go into the hundreds, and it is safer to do this on a per-column basis once you get to larger values.

For larger database, I recommend increasing the default to 20 to 40 and re-analyzing all the tables.

On 2/25/09 3:11 PM, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

>>> Farhan Husain <russoue(at)gmail(dot)com> wrote:
> Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov wrote:
>> >>> Farhan Husain <russoue(at)gmail(dot)com> wrote:
>> > The machine postgres is running on has 4 GB of RAM.
>>
>> In addition to the other suggestions, you should be sure that
>> effective_cache_size is set to a reasonable value, which would
>> probably be somewhere in the neighborhood of '3GB'.

> The execution time has not improved. I am going to increase the
> shared_buffers now keeping the work_mem same.

Increasing shared_buffers is good, but it has been mentioned that this
will not change the plan, which currently scans and sorts the whole
table for a1. Nobody will be surprised when you report minimal
change, if any. If you have not changed effective_cache_size (be sure
not to confuse this with any of the other configuration values) it
will think you only have 128MB of cache, which will be off by a factor
of about 24 from reality.

Also, I'm going to respectfully differ with some of the other posts on
the best setting for work_mem. Most benchmarks I've run and can
remember seeing posted found best performance for this at somewhere
between 16MB and 32MB. You do have to be careful if you have a large
number of concurrent queries, however, and keep it lower. In most
such cases, though, you're better off using a connection pool to limit
concurrent queries instead.

-Kevin

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2009-02-26 00:59:55 Re: Abnormal performance difference between Postgres and MySQL
Previous Message Kevin Grittner 2009-02-25 23:11:52 Re: Abnormal performance difference between Postgres and MySQL