Skip site navigation (1) Skip section navigation (2)

Re: optimizing query

From: Chantal Ackermann <chantal(dot)ackermann(at)biomax(dot)de>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing query
Date: 2003-01-23 09:16:01
Message-ID: 3E2FB2D1.9020904@biomax.de (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-performance
hi Stephan,

thank you for your reply.

I ran vacuum analyze before calling explain. As this is a newly built 
database where no rows have been deleted, yet, I thought vacuum full 
would have no effect. In fact, BEFORE running vacuum full, the cost of 
the query is estimates by explain analyze as 33 secs, and AFTER running 
it, the cost is estimate to be 43 secs??? (Hey, I want at least the 10 
secs back ;-) )

I have just installed this database on a "bigger" (see the system info 
further down) machine, and I expected the queries would run _really_ 
fast. especially, as there is a lot more data to be inserted in the 
occurrences tables.

This is the row count of the tables and the output of explain analyze 
before and after running vacuum full (after that, I listed some system 
and postgresql information):

relate=# select count(*) from gene;
  count
--------
  218085
(1 row)

relate=# select count(*) from disease;
  count
--------
  164597
(1 row)

relate=# select count(*) from disease_occurrences_puid;
  count
--------
  471915
(1 row)

relate=# select count(*) from gene_occurrences_puid;
  count
--------
  339347
(1 row)

relate=# explain analyze SELECT DISTINCT gene.gene_name, 
gene_occurrences_puid.puid FROM gene, disease_occurrences_puid, 
gene_occurrences_puid WHERE
disease_occurrences_puid.puid=gene_occurrences_puid.puid AND 
gene.gene_id=gene_occurrences_puid.gene_id;
 
               QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique  (cost=342175.89..352511.77 rows=137812 width=33) (actual 
time=32112.66..33139.23 rows=219435 loops=1)
    ->  Sort  (cost=342175.89..345621.18 rows=1378118 width=33) (actual 
time=32112.65..32616.14 rows=695158 loops=1)
          Sort Key: gene.gene_name, gene_occurrences_puid.puid
          ->  Merge Join  (cost=63671.50..98237.36 rows=1378118 
width=33) (actual time=10061.83..17940.02 rows=695158 loops=1)
                Merge Cond: ("outer".puid = "inner".puid)
                ->  Index Scan using disease_occpd_puid_i on 
disease_occurrences_puid  (cost=0.00..14538.05 rows=471915 width=4) 
(actual time=0.03..3917.99 rows=471915 loops=1)
                ->  Sort  (cost=63671.50..64519.87 rows=339347 width=29) 
(actual time=10061.69..10973.64 rows=815068 loops=1)
                      Sort Key: gene_occurrences_puid.puid
                      ->  Merge Join  (cost=0.00..22828.18 rows=339347 
width=29) (actual time=0.21..3760.59 rows=339347 loops=1)
                            Merge Cond: ("outer".gene_id = "inner".gene_id)
                            ->  Index Scan using gene_pkey on gene 
(cost=0.00..7668.59 rows=218085 width=21) (actual time=0.02..955.19 
rows=218073 loops=1)
                            ->  Index Scan using gene_id_puid_uni on 
gene_occurrences_puid  (cost=0.00..9525.57 rows=339347 width=8) (actual 
time=0.02..1523.81 rows=339347 loops=1)
  Total runtime: 33244.81 msec
(13 rows)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
AFTER

relate=# vacuum full verbose analyze;

relate=# explain analyze SELECT DISTINCT gene.gene_name, 
gene_occurrences_puid.puid FROM gene, disease_occurrences_puid, 
gene_occurrences_puid WHERE
disease_occurrences_puid.puid=gene_occurrences_puid.puid AND 
gene.gene_id=gene_occurrences_puid.gene_id;
 
               QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique  (cost=359069.64..369948.41 rows=145050 width=33) (actual 
time=42195.60..43229.04 rows=219435 loops=1)
    ->  Sort  (cost=359069.64..362695.90 rows=1450503 width=33) (actual 
time=42195.59..42694.70 rows=695158 loops=1)
          Sort Key: gene.gene_name, gene_occurrences_puid.puid
          ->  Merge Join  (cost=63732.51..99264.24 rows=1450503 
width=33) (actual time=13172.40..27973.79 rows=695158 loops=1)
                Merge Cond: ("outer".puid = "inner".puid)
                ->  Index Scan using disease_occpd_puid_i on 
disease_occurrences_puid  (cost=0.00..14543.06 rows=471915 width=4) 
(actual time=36.50..10916.29 rows=471915 loops=1)
                ->  Sort  (cost=63732.51..64580.88 rows=339347 width=29) 
(actual time=13126.56..14048.38 rows=815068 loops=1)
                      Sort Key: gene_occurrences_puid.puid
                      ->  Merge Join  (cost=0.00..22889.19 rows=339347 
width=29) (actual time=58.00..6775.55 rows=339347 loops=1)
                            Merge Cond: ("outer".gene_id = "inner".gene_id)
                            ->  Index Scan using gene_pkey on gene 
(cost=0.00..7739.91 rows=218085 width=21) (actual time=29.00..3416.01 
rows=218073
loops=1)
                            ->  Index Scan using gene_id_puid_uni on 
gene_occurrences_puid  (cost=0.00..9525.57 rows=339347 width=8) (actual 
time=28.69..1936.83 rows=339347 loops=1)
  Total runtime: 43338.94 msec

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Postgres Version: 7.3.1
CPU: 1666.767 MHz
RAM: 2070492 kB
shmmax/shmall: 1048576000

postgresql.conf:
shared_buffers: 121600
max_connections: 64
max_fsm_relations = 200
max_fsm_pages = 40000
effective_cache_size = 8000

********************************************************************

Thank you again for your interest and help!

Chantal


Stephan Szabo wrote:
> (Replying to general and performance in a hope to move this
> to performance after a couple of replies).
> 
> On Wed, 22 Jan 2003, Chantal Ackermann wrote:
> 
> 
>>I am getting the following output from EXPLAIN, concerning a query with
>>joins. The merge uses index scans but takes too long, in my opinion. The
>>query is in fact only a part (subquery) of another one, but it is the
>>bottle neck.
>>
>>As I am quite ignorant in optimizing queries, and I have no idea where
>>to find documentation on the net on how to learn optimizing my queries,
>>I am posting this here in hope someone will give me either tips how to
>>optimize, or where to find some tutorial that could help me get along on
>>my own.
>>
>>dropping the "DISTINCT" has some effect, but I can't really do without.
> 
> 
> The first thing is, have you done ANALYZE recently to make sure that the
> statistics are correct and what does EXPLAIN ANALYZE give you (that will
> run the query and give the estimate and actual).  Also, if you haven't
> vacuumed recently, you may want to vacuum full.
> 
> How many rows are there on gene, disease and both occurrances tables?
> I'd wonder if perhaps using explicit sql join syntax (which postgres uses
> to constrain order) to join disease and disease_occurrences_puid before
> joining it to the other two would be better or worse in practice.
> 
> 


In response to

Responses

pgsql-performance by date

Next:From: Hannu KrosingDate: 2003-01-23 10:11:08
Subject: Re: Terrible performance on wide selects
Previous:From: Tomasz MyrtaDate: 2003-01-23 08:20:09
Subject: Re: Same query, same performance

pgsql-general by date

Next:From: Bjorn MetzdorfDate: 2003-01-23 10:42:18
Subject: Re: tsearch comments
Previous:From: Rick GiggerDate: 2003-01-23 08:57:39
Subject: embedded postgres

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group