Skip site navigation (1) Skip section navigation (2)

Re: Trouble with plan statistics for behaviour for query.

From: Craig James <cjames(at)emolecules(dot)com>
To: Trevor Campbell <tcampbell(at)atlassian(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Trouble with plan statistics for behaviour for query.
Date: 2012-05-31 23:08:37
Message-ID: CAFwQ8rf89H+BCvua_EnjtPPK1onrxUA_mMDA8nr9hd5VgQvtcA@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-performance
On Thu, May 31, 2012 at 4:01 PM, Trevor Campbell <tcampbell(at)atlassian(dot)com>wrote:

>  On 01/06/12 08:55, Craig James wrote:
>
>
>
> On Thu, May 31, 2012 at 3:29 PM, Trevor Campbell <tcampbell(at)atlassian(dot)com>wrote:
>
>>  We are having trouble with a particular query being slow in a strange
>> manner.
>>
>> The query is a join over two large tables that are suitably indexed.
>>
>> select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE,
>> CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
>>    from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID =
>> CI.GROUPID where CG.ISSUEID=? order by CG.CREATED asc, CI.ID asc
>>
>
> This has an unbound variable '?' in it.
>
> These queries are being run from a java application using JDBC and when
> run the variable is bound to an long integer value.  While trying to
> investigate the problem, I have been just hard coding a value in the
> statement.
>

I use Perl, not JDBC, but this thread may be relevant to your problem.

http://postgresql.1045698.n5.nabble.com/Slow-statement-when-using-JDBC-td3368379.html

Craig

>
>
>> For some tasks we run this particular query a very large number of times
>> and it has a significant performance impact when it runs slowly.
>>
>> If we run ANALYSE over the CHANGEITEM table then the performance picks
>> up by a factor of 5 or more.  The problem is that a day later the
>> performance will have dropped back to its previously slow state.
>>
>> The reason this is so hard to understand is that the activity on this
>> table is very low, with no updates and only a relatively small number of
>> inserts each day, < 0.1% of the table size.
>>
>> Explain output:
>> Sort  (cost=86.90..86.93 rows=11 width=118) (actual time=0.086..0.087
>> rows=14 loops=1)
>>   Sort Key: cg.created, ci.id
>>   Sort Method: quicksort  Memory: 26kB
>>   ->  Nested Loop  (cost=0.00..86.71 rows=11 width=118) (actual
>> time=0.022..0.061 rows=14 loops=1)
>>         ->  Index Scan using chggroup_issue on changegroup cg
>> (cost=0.00..17.91 rows=8 width=33) (actual time=0.012..0.015 rows=7 loops=1)
>>               Index Cond: (issueid = 81001::numeric)
>>         ->  Index Scan using chgitem_chggrp on changeitem ci
>> (cost=0.00..8.58 rows=2 width=91) (actual time=0.005..0.005 rows=2 loops=7)
>>               Index Cond: (groupid = cg.id)
>> Total runtime: 0.116 ms
>>
>
> What's the exact SQL you used to get this ... did you use a specific
> CG.ISSUEID to run your test?  If that's the case, this EXPLAIN ANALYZE
> won't be the same as the one generated for your actual application.
>
> Craig
>
>
>
>>
>> The explain output always seems the same even when the performance is
>> poor, but I can't be sure of that.
>>
>> Overall it seems like PostgreSQL just forgets about the statistics it has
>> gathered after a short while.
>>
>> Schema details:
>> CREATE TABLE changegroup
>> (
>>   id numeric(18,0) NOT NULL,
>>   issueid numeric(18,0),
>>   author character varying(255),
>>   created timestamp with time zone,
>>   CONSTRAINT pk_changegroup PRIMARY KEY (id )
>> )
>> WITH (
>>   OIDS=FALSE
>> );
>> CREATE INDEX chggroup_issue
>>   ON changegroup
>>   USING btree
>>   (issueid );
>>
>> CREATE TABLE changeitem
>> (
>>   id numeric(18,0) NOT NULL,
>>   groupid numeric(18,0),
>>   fieldtype character varying(255),
>>   field character varying(255),
>>   oldvalue text,
>>   oldstring text,
>>   newvalue text,
>>   newstring text,
>>   CONSTRAINT pk_changeitem PRIMARY KEY (id )
>> )
>> WITH (
>>   OIDS=FALSE
>> );
>>
>> CREATE INDEX chgitem_chggrp
>>   ON changeitem
>>   USING btree
>>   (groupid );
>>
>> CREATE INDEX chgitem_field
>>   ON changeitem
>>   USING btree
>>   (field COLLATE pg_catalog."default" );
>>
>> Table sizes
>> changegroup  :  2,000,000 rows
>> changeitem    :  2,500,000  rows
>>
>> The changegroup table has on average about 4 rows per issueid value,
>> which is the query parameter.
>>
>> We run autovacuum and autoanalyse, but as the activity in the table is
>> low these are rarely if ever invoked on these tables.
>>
>> Environment.
>> Testing using PostgreSQL 9.1.3 on x86_64-redhat-linux-gnu,  although this
>> is a problem across a variety of postgres versions.
>>
>>
>

In response to

Responses

pgsql-performance by date

Next:From: Trevor CampbellDate: 2012-05-31 23:34:24
Subject: Re: Trouble with plan statistics for behaviour for query.
Previous:From: Trevor CampbellDate: 2012-05-31 23:01:24
Subject: Re: Trouble with plan statistics for behaviour for query.

pgsql-general by date

Next:From: Lonni J FriedmanDate: 2012-05-31 23:13:28
Subject: autovacuum running for a long time on a new table with 1 row
Previous:From: Trevor CampbellDate: 2012-05-31 23:01:24
Subject: Re: Trouble with plan statistics for behaviour for query.

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group