
Re: Trouble with plan statistics for behaviour for query.

From: Trevor Campbell <tcampbell(at)atlassian(dot)com>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Trouble with plan statistics for behaviour for query.
Date: 2012-05-31 23:01:24
Message-ID: 4FC7F844.8070205@atlassian.com
Lists: pgsql-general, pgsql-performance
On 01/06/12 08:55, Craig James wrote:
>
>
> On Thu, May 31, 2012 at 3:29 PM, Trevor Campbell <tcampbell(at)atlassian(dot)com> wrote:
>
>     We are having trouble with a particular query being slow in a strange manner.
>
>     The query is a join over two large tables that are suitably indexed.
>
>     select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD,
>     CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
>        from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where
>     CG.ISSUEID=? order by CG.CREATED asc, CI.ID asc
>
>
> This has an unbound variable '?' in it.
These queries are run from a Java application via JDBC, and at run time the variable is bound to a long integer
value.  While investigating the problem I have simply been hard-coding a value into the statement.
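
For what it's worth, the bound-parameter case can be reproduced outside the application with a server-side prepared
statement.  A sketch only (81001 is simply the issueid value I have been hard coding; on 9.1 the prepared form is
planned without seeing the actual value, so its plan can be compared with the hard-coded one further down):

  PREPARE changes (numeric) AS
    select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED,
           CI.ID, CI.FIELDTYPE, CI.FIELD,
           CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
      from PUBLIC.CHANGEGROUP CG
      inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID
     where CG.ISSUEID = $1
     order by CG.CREATED asc, CI.ID asc;

  -- Plan and timing for the parameterised form, for comparison with
  -- the explain output below that uses the literal value.
  EXPLAIN ANALYZE EXECUTE changes (81001);
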
>
>
>     For some tasks we run this particular query a very large number of times and it has a significant performance
>     impact when it runs slowly.
>
>     If we run ANALYSE over the CHANGEITEM table then the performance picks up by a factor of 5 or more.  The problem
>     is that a day later the performance will have dropped back to its previously slow state.
>
>     The reason this is so hard to understand is that the activity on this table is very low, with no updates and only
>     a relatively small number of inserts each day, < 0.1% of the table size.
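
One way to confirm whether the optimiser statistics really are the thing going stale is to watch when the tables were
last analysed and compare timings just before and after a manual ANALYZE.  A minimal sketch using the standard
pg_stat_user_tables view:

  -- When were these tables last analysed, manually or by autovacuum?
  SELECT relname, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
   WHERE relname IN ('changegroup', 'changeitem');

  -- Re-gather the statistics by hand, which is what has been restoring the performance.
  ANALYZE changeitem;
  ANALYZE changegroup;
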
>
>     Explain output:
>     Sort  (cost=86.90..86.93 rows=11 width=118) (actual time=0.086..0.087 rows=14 loops=1)
>       Sort Key: cg.created, ci.id
>       Sort Method: quicksort  Memory: 26kB
>       ->  Nested Loop  (cost=0.00..86.71 rows=11 width=118) (actual time=0.022..0.061 rows=14 loops=1)
>             ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..17.91 rows=8 width=33) (actual time=0.012..0.015 rows=7 loops=1)
>                   Index Cond: (issueid = 81001::numeric)
>             ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=2 width=91) (actual time=0.005..0.005 rows=2 loops=7)
>                   Index Cond: (groupid = cg.id)
>     Total runtime: 0.116 ms
>
>
> What's the exact SQL you used to get this ... did you use a specific CG.ISSUEID to run your test?  If that's the case, 
> this EXPLAIN ANALYZE won't be the same as the one generated for your actual application.
>
> Craig
>
>
>     The explain output always seems the same even when the performance is poor, but I can't be sure of that.
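
One way to take the guesswork out of that is to have the server log the plan it actually used whenever the query runs
slowly.  A sketch using the auto_explain contrib module (the 100ms threshold is only illustrative):

  -- Needs superuser for LOAD; alternatively put auto_explain in
  -- shared_preload_libraries in postgresql.conf so the application's
  -- own connections are covered.
  LOAD 'auto_explain';
  SET auto_explain.log_min_duration = '100ms';
  SET auto_explain.log_analyze = on;
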
>
>     Overall it seems like PostgreSQL just forgets about the statistics it has gathered after a short while.
>
>     Schema details:
>     CREATE TABLE changegroup
>     (
>       id numeric(18,0) NOT NULL,
>       issueid numeric(18,0),
>       author character varying(255),
>       created timestamp with time zone,
>       CONSTRAINT pk_changegroup PRIMARY KEY (id )
>     )
>     WITH (
>       OIDS=FALSE
>     );
>     CREATE INDEX chggroup_issue
>       ON changegroup
>       USING btree
>       (issueid );
>
>     CREATE TABLE changeitem
>     (
>       id numeric(18,0) NOT NULL,
>       groupid numeric(18,0),
>       fieldtype character varying(255),
>       field character varying(255),
>       oldvalue text,
>       oldstring text,
>       newvalue text,
>       newstring text,
>       CONSTRAINT pk_changeitem PRIMARY KEY (id )
>     )
>     WITH (
>       OIDS=FALSE
>     );
>
>     CREATE INDEX chgitem_chggrp
>       ON changeitem
>       USING btree
>       (groupid );
>
>     CREATE INDEX chgitem_field
>       ON changeitem
>       USING btree
>       (field COLLATE pg_catalog."default" );
>
>     Table sizes
>     changegroup : 2,000,000 rows
>     changeitem  : 2,500,000 rows
>
>     The changegroup table has on average about 4 rows per issueid value, which is the query parameter.
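
That per-issueid estimate comes straight from the gathered statistics, so what the planner currently believes can be
checked directly before and after a slowdown.  A sketch using the standard pg_stats view:

  -- Planner's view of issueid on changegroup (row estimates derive from these).
  SELECT tablename, attname, n_distinct, null_frac
    FROM pg_stats
   WHERE tablename = 'changegroup' AND attname = 'issueid';
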
>
>     We run autovacuum and autoanalyse, but as the activity in the table is low these are rarely if ever invoked on
>     these tables.
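
If the low write volume is what keeps autoanalyse from ever firing on them, the analyse trigger can be lowered for
just these tables.  A sketch only; the numbers here are illustrative rather than tuned:

  -- Make autovacuum analyse these tables after far fewer row changes.
  ALTER TABLE changeitem  SET (autovacuum_analyze_scale_factor = 0.001,
                               autovacuum_analyze_threshold = 500);
  ALTER TABLE changegroup SET (autovacuum_analyze_scale_factor = 0.001,
                               autovacuum_analyze_threshold = 500);
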
>
>     Environment.
>     Testing was done using PostgreSQL 9.1.3 on x86_64-redhat-linux-gnu, although this is a problem across a variety
>     of PostgreSQL versions.
>
>
