Skip site navigation (1) Skip section navigation (2)

Re: optimizing Postgres queries

From: David Teran <david(dot)teran(at)cluster9(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: optimizing Postgres queries
Date: 2004-01-05 11:28:32
Message-ID: 4B0023AC-3F72-11D8-B3EB-000A95A6F0DC@cluster9.com (view raw)
Hi,

we are new to Postgres and we are evaluating Postgres 7.4 on MacOS X as 
an alternative to FrontBase 3.6.27.

 From the available features Postgres is the choice #1.

We have some tests to check the performance and FrontBase is about 10 
times faster than Postgres. We already played around with explain 
analyse select. It seems that for large tables Postgres does not use an 
index. We often see the scan message in the query plan. Were can we 
find more hints about tuning the performance? The database is about 350 
MB large, without BLOB's. We tried to define every important index for 
the selects but it seems that something still goes wrong: FrontBase 
needs about 23 seconds for about 4300 selects and Postgres needs 4 
minutes, 34 seconds.

Any clues?

regards David


From: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 11:35:55
Message-ID: 200401051705.55120.shridhar_daithankar@myrealbox.com (view raw)
On Monday 05 January 2004 16:58, David Teran wrote:
> We have some tests to check the performance and FrontBase is about 10
> times faster than Postgres. We already played around with explain
> analyse select. It seems that for large tables Postgres does not use an
> index. We often see the scan message in the query plan. Were can we
> find more hints about tuning the performance? The database is about 350
> MB large, without BLOB's. We tried to define every important index for
> the selects but it seems that something still goes wrong: FrontBase
> needs about 23 seconds for about 4300 selects and Postgres needs 4
> minutes, 34 seconds.

Check 
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Are you sure you are using correct data types on indexes?

e.g. if field1 is an int2 field, then following query would not use an index.

select * from table where field1=2;

However following will

select * from table where field1=2::int2;

It is called as typecasting and postgresql is rather strict about it when it 
comes to making a decision of index usage.

I am sure above two tips could take care of some of the problems. 

Such kind of query needs more specific information. Can you post explain 
analyze output for queries and database schema.

 HTH

 Shridhar


From: David Teran <david(dot)teran(at)cluster9(dot)com>
To: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 12:05:03
Message-ID: 64C8DBBA-3F77-11D8-A528-000A95A6F0DC@cluster9.com (view raw)
Hi Shridhar,

> Are you sure you are using correct data types on indexes?
>
Did not know about this...

> e.g. if field1 is an int2 field, then following query would not use an 
> index.
>
our fk have the type bigint, when i try one simple select like this:

explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE 
t0.ID_FOREIGN_TABLE = 21110;

i see that no index is being used whereas when i use

explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE 
t0.ID_FOREIGN_TABLE = 21110::bigint;

an index is used. Very fine, the performance is about 10 to 100 times 
faster for the single select.

I am using WebObjects with JDBC. I will now create a DB with integer 
instead of bigint and see how this performs.

regards David


From: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 12:10:05
Message-ID: 200401051740.05006.shridhar_daithankar@myrealbox.com (view raw)
On Monday 05 January 2004 17:35, David Teran wrote:
> explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE
> t0.ID_FOREIGN_TABLE = 21110;
>
> i see that no index is being used whereas when i use
>
> explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE
> t0.ID_FOREIGN_TABLE = 21110::bigint;
>
> an index is used. Very fine, the performance is about 10 to 100 times
> faster for the single select.
>
> I am using WebObjects with JDBC. I will now create a DB with integer
> instead of bigint and see how this performs.

The performance will likely to be the same. Its just that integer happens to 
be default integer type and hence it does not need an explicit typecast. ( I 
don't remember exactly which integer is default but it is either of int2,int4 
and int8...:-))

The performance diffference is likely due to use of index, which is in turn 
due to typecasting. If you need bigint, you should use them. Just remember to 
typecast whenever required.

 Shridhar


From: David Teran <david(dot)teran(at)cluster9(dot)com>
To: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 12:18:06
Message-ID: 3796212E-3F79-11D8-A528-000A95A6F0DC@cluster9.com (view raw)
Hi,

> The performance will likely to be the same. Its just that integer 
> happens to
> be default integer type and hence it does not need an explicit 
> typecast. ( I
> don't remember exactly which integer is default but it is either of 
> int2,int4
> and int8...:-))
>
The docs say int4 is much faster than int8, but i will check this.

> The performance diffference is likely due to use of index, which is in 
> turn
> due to typecasting. If you need bigint, you should use them. Just 
> remember to
> typecast whenever required.

This is my bigger problem: i am using EOF (OR mapping tool) which frees 
me more or less form writing a lot of SQL. If i need to typecast to use 
an index then i have to see how to do this with this framework.

Regards David


From: Shridhar Daithankar <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 12:37:53
Message-ID: 200401051806.27496.shridhar_daithankar@myrealbox.com (view raw)
On Monday 05 January 2004 17:48, David Teran wrote:
> Hi,
>
> > The performance will likely to be the same. Its just that integer
> > happens to
> > be default integer type and hence it does not need an explicit
> > typecast. ( I
> > don't remember exactly which integer is default but it is either of
> > int2,int4
> > and int8...:-))
>
> The docs say int4 is much faster than int8, but i will check this.

Well yes. That is correct as well. 

What I (really) meant to say that an index scan to pick few in4 tuples 
wouldn't be hell much faster than an index scan to pick same number of tuples 
with int8 definition. 

The initial boost you got from converting to index scan, would be probably 
best you can beat out of it..

Of course if you are scanning a few million of them sequentially, then it is 
different story.

> This is my bigger problem: i am using EOF (OR mapping tool) which frees
> me more or less form writing a lot of SQL. If i need to typecast to use
> an index then i have to see how to do this with this framework.

Well, you can direct your queries to a function rather than table, that would 
cast the argument appropriately and select. Postgresql support function 
overloading as well, in case you need different types of arguments with same 
name.

Or you can write an instead rule on server side which will perform casting 
before touching the table.

I am not sure of exact details it would take to make it work, but it should 
work, at least in theory. That way you can preserve the efforts invested in 
the mapping tool. 

Of course, converting everything to integer might be a simpler option after 
all..:-)


 Shridhar


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: David Teran <david(dot)teran(at)cluster9(dot)com>
Cc: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 15:15:27
Message-ID: 3FF97F8F.6040305@familyhealth.com.au (view raw)
> explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE 
> t0.ID_FOREIGN_TABLE = 21110::bigint;
> 
> an index is used. Very fine, the performance is about 10 to 100 times 
> faster for the single select.

An alternative technique is to do this:

... t0.ID_FOREIGN_TABLE = '21110';

Chris

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Teran <david(dot)teran(at)cluster9(dot)com>
Cc: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 15:22:32
Message-ID: 9966.1073316152@sss.pgh.pa.us (view raw)
David Teran <david(dot)teran(at)cluster9(dot)com> writes:
> This is my bigger problem: i am using EOF (OR mapping tool) which frees 
> me more or less form writing a lot of SQL. If i need to typecast to use 
> an index then i have to see how to do this with this framework.

It's worth pointing out that this problem is fixed (at long last) in
CVS tip.  Ypu probably shouldn't expend large amounts of effort on
working around a problem that will go away in 7.5.

If you don't anticipate going to production for six months or so, you
could adopt CVS tip as your development platform, with the expectation
that 7.5 will be released by the time you need a production system.
I wouldn't recommend running CVS tip as a production database but it
should be plenty stable enough for devel purposes.

Another plan would be to use int4 columns for the time being with the
intention of widening them to int8 when you move to 7.5.  This would
depend on how soon you anticipate needing values > 32 bits, of course.

			regards, tom lane

From: David Teran <david(dot)teran(at)cluster9(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 18:47:28
Message-ID: 9CC6A792-3FAF-11D8-A528-000A95A6F0DC@cluster9.com (view raw)
Hi Tom,

> It's worth pointing out that this problem is fixed (at long last) in
> CVS tip.  Ypu probably shouldn't expend large amounts of effort on
> working around a problem that will go away in 7.5.
>
We have now changed the definition to integer, this will work for some 
time. We are currently evaluating and have several production database 
we might switch in some time.

What we found out now is that a query with a single 'where' works fine, 
the query planer uses the index but when we have 'two' where clauses it 
does not use the index anymore:

EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE 
(t0."ID_VALUE" = 14542); performs fine, less than one millisecond.

EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE 
(t0."ID_VALUE" = 14542 OR t0."ID_VALUE" = 14550); performs bad: about 
235 milliseconds.

I tried to change the second one to use IN but this did not help at 
all. Am i doing something wrong? I have an index defined like this:

CREATE INDEX key_value_meta_data__id_value__fk_index ON 
"KEY_VALUE_META_DATA" USING btree ("ID_VALUE");

Regards David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Teran <david(dot)teran(at)cluster9(dot)com>
Cc: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 18:52:40
Message-ID: 16230.1073328760@sss.pgh.pa.us (view raw)
David Teran <david(dot)teran(at)cluster9(dot)com> writes:
> What we found out now is that a query with a single 'where' works fine, 
> the query planer uses the index but when we have 'two' where clauses it 
> does not use the index anymore:

> EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE 
> (t0."ID_VALUE" = 14542); performs fine, less than one millisecond.

> EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE 
> (t0."ID_VALUE" = 14542 OR t0."ID_VALUE" = 14550); performs bad: about 
> 235 milliseconds.

Please, when you ask this sort of question, show the EXPLAIN ANALYZE
output.  It is not a virtue to provide minimal information and see if
anyone can guess what's happening.

			regards, tom lane

From: David Teran <david(dot)teran(at)cluster9(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 19:02:01
Message-ID: A5055773-3FB1-11D8-A528-000A95A6F0DC@cluster9.com (view raw)
Hi Tom,


> David Teran <david(dot)teran(at)cluster9(dot)com> writes:
>> What we found out now is that a query with a single 'where' works  
>> fine,
>> the query planer uses the index but when we have 'two' where clauses  
>> it
>> does not use the index anymore:
>
>> EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE
>> (t0."ID_VALUE" = 14542); performs fine, less than one millisecond.
>
>> EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE
>> (t0."ID_VALUE" = 14542 OR t0."ID_VALUE" = 14550); performs bad: about
>> 235 milliseconds.
>
> Please, when you ask this sort of question, show the EXPLAIN ANALYZE
> output.  It is not a virtue to provide minimal information and see if
> anyone can guess what's happening.
>
Sorry for that, i thought this is such a trivial question that the  
answer is easy.

explain result from first query:

Index Scan using key_value_meta_data__id_value__fk_index on "KEY_VALUE_M 
ETA_DATA" t0  (cost=0.00..1585.52 rows=467 width=1068) (actual time=0.42 
4..0.493 rows=13 loops=1)

  Index Cond: ("ID_VALUE" = 21094)

Total runtime: 0.608 ms



explain result from second query:

Seq Scan on "KEY_VALUE_META_DATA" t0  (cost=0.00..2671.16 rows=931 width 
=1068) (actual time=122.669..172.179 rows=25 loops=1)

  Filter: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))

Total runtime: 172.354 ms



I found out that its possible to disable seq scans with set  
enable_seqscan to off; then the second query result looks like this:

Index Scan using key_value_meta_data__id_value__fk_index, key_value_meta 
_data__id_value__fk_index on "KEY_VALUE_META_DATA" t0  (cost=0.00..3173. 
35 rows=931 width=1068) (actual time=0.116..0.578 rows=25 loops=1)

  Index Cond: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))

Total runtime: 0.716 ms


But i read in the docs that its not OK to turn this off by default. I  
really wonder if this is my fault or not, from my point of view this is  
such a simple select that the query plan should not result in a table  
scan.

Regards David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Teran <david(dot)teran(at)cluster9(dot)com>
Cc: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 19:05:48
Message-ID: 20540.1073329548@sss.pgh.pa.us (view raw)
David Teran <david(dot)teran(at)cluster9(dot)com> writes:
> explain result from second query:

> Seq Scan on "KEY_VALUE_META_DATA" t0  (cost=0.00..2671.16 rows=931 width 
> =1068) (actual time=122.669..172.179 rows=25 loops=1)
>   Filter: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))

The problem is evidently that the row estimate is so far off (931
estimate vs 25 actual).  Have you done ANALYZE or VACUUM ANALYZE
on this table recently?  If you have, I'd be interested to see the
pg_stats row for ID_VALUE.  It might be that you need to increase
the statistics target for this table.

			regards, tom lane

From: David Teran <david(dot)teran(at)cluster9(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 19:20:49
Message-ID: 4530AF9E-3FB4-11D8-A528-000A95A6F0DC@cluster9.com (view raw)
Hi Tom,

first of all thanks for your help! I really appreciate your fast  
response and if you ever have a question about WebObjects, just drop me  
line ;-)

>> Seq Scan on "KEY_VALUE_META_DATA" t0  (cost=0.00..2671.16 rows=931  
>> width
>> =1068) (actual time=122.669..172.179 rows=25 loops=1)
>>   Filter: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))
>
> The problem is evidently that the row estimate is so far off (931
> estimate vs 25 actual).  Have you done ANALYZE or VACUUM ANALYZE
> on this table recently?  If you have, I'd be interested to see the
> pg_stats row for ID_VALUE.  It might be that you need to increase
> the statistics target for this table.
>
I am absolutely new to PostgreSQL. OK, after VACUUM ANALYZE i get:

Index Scan using key_value_meta_data__id_value__fk_index, key_value_meta 
_data__id_value__fk_index on "KEY_VALUE_META_DATA" t0  (cost=0.00..19.94 
 rows=14 width=75) (actual time=0.615..1.017 rows=25 loops=1)
  Index Cond: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))
Total runtime: 2.565 ms

and the second time i invoke this i get


Index Scan using key_value_meta_data__id_value__fk_index, key_value_meta 
_data__id_value__fk_index on "KEY_VALUE_META_DATA" t0  (cost=0.00..19.94 
 rows=14 width=75) (actual time=0.112..0.296 rows=25 loops=1)
  Index Cond: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))
Total runtime: 0.429 ms

Much better. So i think i will first read more about this optimization  
stuff and regular maintenance things. This is something i like very  
much from FrontBase: no need for such things, simply start and run. But  
other things were not so fine ;-).

Is there any hint where to start to understand more about this  
optimization problem?

regards David

		

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Teran <david(dot)teran(at)cluster9(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 19:23:45
Message-ID: 20759.1073330625@sss.pgh.pa.us (view raw)
David Teran <david(dot)teran(at)cluster9(dot)com> writes:
> Much better. So i think i will first read more about this optimization  
> stuff and regular maintenance things.

See http://www.postgresql.org/docs/7.4/static/maintenance.html

> Is there any hint where to start to understand more about this  
> optimization problem?

http://www.postgresql.org/docs/7.4/static/performance-tips.html

			regards, tom lane

From: David Teran <david(dot)teran(at)cluster9(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-05 19:57:47
Message-ID: 6F2414E8-3FB9-11D8-A528-000A95A6F0DC@cluster9.com (view raw)
... wow:

executing a batch file with about 4250 selects, including lots of joins 
other things PostgreSQL 7.4 is about 2 times faster than FrontBase 
3.6.27. OK, we will start to make larger tests but this is quite 
interesting already: we did not optimize a lot, just invoked VACUUM 
ANALYZE and then the selects ;-)

Thanks to all who answered to this thread.

cheers David


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: David Teran <david(dot)teran(at)cluster9(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing Postgres queries
Date: 2004-01-06 05:25:32
Message-ID: 200401060525.i065PW915257@candle.pha.pa.us (view raw)
David Teran wrote:
> Index?Scan?using?key_value_meta_data__id_value__fk_index,?key_value_meta 
> _data__id_value__fk_index?on?"KEY_VALUE_META_DATA"?t0??(cost=0.00..19.94 
> ?rows=14?width=75)?(actual?time=0.112..0.296?rows=25?loops=1)
> ??Index?Cond:?(("ID_VALUE"?=?21094)?OR?("ID_VALUE"?=?21103))
> Total runtime: 0.429 ms
> 
> Much better. So i think i will first read more about this optimization  
> stuff and regular maintenance things. This is something i like very  
> much from FrontBase: no need for such things, simply start and run. But  
> other things were not so fine ;-).
> 
> Is there any hint where to start to understand more about this  
> optimization problem?

Read the FAQ.  There is an item about slow queries and indexes.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

From: Rod Taylor <pg(at)rbt(dot)ca>
To: David Teran <david(dot)teran(at)cluster9(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: optimizing Postgres queries
Date: 2004-01-09 03:16:49
Message-ID: 1073618208.322.18.camel@jester (view raw)
On Mon, 2004-01-05 at 14:57, David Teran wrote:
> ... wow:
> 
> executing a batch file with about 4250 selects, including lots of joins 
> other things PostgreSQL 7.4 is about 2 times faster than FrontBase 
> 3.6.27. OK, we will start to make larger tests but this is quite 
> interesting already: we did not optimize a lot, just invoked VACUUM 
> ANALYZE and then the selects ;-)
> 
> Thanks to all who answered to this thread.

I presume that batch file was executed linearly -- no parallelism?
You're actually testing one of PostgreSQL's shortcomings.

PostgreSQL (in my experience) does much better in such comparisons with
a parallel load -- multiple connections executing varied work (short
selects, complex selects, inserts, updates, deletes).

Anyway, just a tip that you will want to test your actual load. If you
do batch work with a single thread, what you have is fine. But if you
have a website with tens or hundreds of simultaneous connections then
your non-parallel testing will not reflect that work load.



Privacy Policy | About PostgreSQL
Copyright © 1996-2013 The PostgreSQL Global Development Group