Re: Abnormal performance difference between Postgres and MySQL

From: Farhan Husain <russoue(at)gmail(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Abnormal performance difference between Postgres and MySQL
Date: 2009-02-26 17:45:26
Message-ID: 3df32b6d0902260945l57f761fdteedb29b2a5f28070@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Feb 25, 2009 at 6:07 PM, Scott Carey <scott(at)richrelevance(dot)com>wrote:

> I will second Kevin’s suggestion. Unless you think you will have more
> than a few dozen concurrent queries, start with work_mem around 32MB.
> For the query here, a very large work_mem might help it hash join depending
> on the data... But that’s not the real problem here.
>
> The real problem is that it does a huge scan of all of the a1 table, and
> sorts it. Its pretty clear that this table has incorrect statistics. It
> thinks that it will get about 1 million rows back in the scan, but it is
> actually 3 million in the scan.
>
> Crank up the statistics target on that table from the default to at least
> 100, perhaps even 1000. This is a large table, the default statistics
> target of 10 is not good for large tables with skewed column data. Those to
> try increasing the target on are the columns filtered in the explain:
> graphid, prop, and obj. Then run vacuum analzye on that table (a1). The
> planner should then have better stats and will likely be able to use a
> better plan for the join.
>
> The other tables involved in the join also seem to have bad statistics.
> You might just take the easiest solution and change the global statistics
> target and vacuum analyze the tables involved:
>
> set default_statistics_target = 50;
> vacuum analyze jena_g1t1_stmt ;
>
> (test the query)
>
> Repeat for several values of the default statistics target. You can run
> “explain” before running the actual query, to see if the plan changed. If
> it has not, the time will not likely change.
> The max value for the statistics target is 1000, which makes analyzing and
> query planning slower, but more accurate. In most cases, dramatic
> differences can happen between the default of 10 and values of 25 or 50.
> Sometimes, you have to go into the hundreds, and it is safer to do this on
> a per-column basis once you get to larger values.
>
> For larger database, I recommend increasing the default to 20 to 40 and
> re-analyzing all the tables.
>
>
>
>
>
>
> On 2/25/09 3:11 PM, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
> >>> Farhan Husain <russoue(at)gmail(dot)com> wrote:
> > Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov wrote:
> >> >>> Farhan Husain <russoue(at)gmail(dot)com> wrote:
> >> > The machine postgres is running on has 4 GB of RAM.
> >>
> >> In addition to the other suggestions, you should be sure that
> >> effective_cache_size is set to a reasonable value, which would
> >> probably be somewhere in the neighborhood of '3GB'.
>
> > The execution time has not improved. I am going to increase the
> > shared_buffers now keeping the work_mem same.
>
> Increasing shared_buffers is good, but it has been mentioned that this
> will not change the plan, which currently scans and sorts the whole
> table for a1. Nobody will be surprised when you report minimal
> change, if any. If you have not changed effective_cache_size (be sure
> not to confuse this with any of the other configuration values) it
> will think you only have 128MB of cache, which will be off by a factor
> of about 24 from reality.
>
> Also, I'm going to respectfully differ with some of the other posts on
> the best setting for work_mem. Most benchmarks I've run and can
> remember seeing posted found best performance for this at somewhere
> between 16MB and 32MB. You do have to be careful if you have a large
> number of concurrent queries, however, and keep it lower. In most
> such cases, though, you're better off using a connection pool to limit
> concurrent queries instead.
>
> -Kevin
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
> Thanks a lot Scott! I think that was the problem. I just changed the
default statistics target to 50 and ran explain. The plan changed and I ran
explain analyze. Now it takes a fraction of a second!

Thanks to all of you who wanted to help me. I would be happy if someone does
me one last favor. I want to know how these query plans are generated and
how the parameters you suggested to change affects it. If there is any
article, paper or book on it please give me the name or url.

Here is the output of my latest tasks:

ingentadb=# set default_statistics_target=50;
SET
ingentadb=# show default_statistics_target;
default_statistics_target
---------------------------
50
(1 row)

ingentadb=# vacuum analyze jena_g1t1_stmt;
VACUUM
ingentadb=# EXPLAIN select A0.Subj, A2.Obj From jena_g1t1_stmt A0,
jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::
http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::
http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND
A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::
http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::
http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND
A0.Subj=A2.Subj AND A2.Prop='Uv::
http://prismstandard.org/namespaces/1.2/basic/startingPage' AND
A2.GraphID=1;

QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..37838.46 rows=7568 width=134)
-> Nested Loop (cost=0.00..7485.09 rows=495 width=148)
-> Index Scan using jena_g1t1_stmt_ixo on jena_g1t1_stmt a0
(cost=0.00..1160.62 rows=97 width=74)
Index Cond: ((obj)::text = 'Uv::
http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'::text)
Filter: (((prop)::text = 'Uv::
http://prismstandard.org/namespaces/1.2/basic/isPartOf'::text) AND (graphid
= 1))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a1
(cost=0.00..65.15 rows=4 width=74)
Index Cond: (((a1.subj)::text = (a0.subj)::text) AND
((a1.prop)::text = 'Uv::
http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text))
Filter: ((a1.graphid = 1) AND ((a1.obj)::text = 'Uv::
http://metastore.ingenta.com/ns/structure/Article'::text))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a2
(cost=0.00..61.17 rows=12 width=134)
Index Cond: (((a2.subj)::text = (a0.subj)::text) AND
((a2.prop)::text = 'Uv::
http://prismstandard.org/namespaces/1.2/basic/startingPage'::text))
Filter: (a2.graphid = 1)
(11 rows)

ingentadb=# EXPLAIN ANALYZE select A0.Subj, A2.Obj From jena_g1t1_stmt A0,
jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::
http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::
http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND
A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::
http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::
http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND
A0.Subj=A2.Subj AND A2.Prop='Uv::
http://prismstandard.org/namespaces/1.2/basic/startingPage' AND
A2.GraphID=1;

QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..37838.46 rows=7568 width=134) (actual
time=6.535..126.791 rows=30 loops=1)
-> Nested Loop (cost=0.00..7485.09 rows=495 width=148) (actual
time=4.404..64.078 rows=30 loops=1)
-> Index Scan using jena_g1t1_stmt_ixo on jena_g1t1_stmt a0
(cost=0.00..1160.62 rows=97 width=74) (actual time=2.127..2.270 rows=30
loops=1)
Index Cond: ((obj)::text = 'Uv::
http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'::text)
Filter: (((prop)::text = 'Uv::
http://prismstandard.org/namespaces/1.2/basic/isPartOf'::text) AND (graphid
= 1))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a1
(cost=0.00..65.15 rows=4 width=74) (actual time=2.054..2.056 rows=1
loops=30)
Index Cond: (((a1.subj)::text = (a0.subj)::text) AND
((a1.prop)::text = 'Uv::
http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text))
Filter: ((a1.graphid = 1) AND ((a1.obj)::text = 'Uv::
http://metastore.ingenta.com/ns/structure/Article'::text))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a2
(cost=0.00..61.17 rows=12 width=134) (actual time=2.083..2.086 rows=1
loops=30)
Index Cond: (((a2.subj)::text = (a0.subj)::text) AND
((a2.prop)::text = 'Uv::
http://prismstandard.org/namespaces/1.2/basic/startingPage'::text))
Filter: (a2.graphid = 1)
Total runtime: 127.065 ms
(12 rows)

Thanks and regards,

--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2009-02-26 18:09:54 Re: Abnormal performance difference between Postgres and MySQL
Previous Message Farhan Husain 2009-02-26 17:17:43 Re: Abnormal performance difference between Postgres and MySQL