Re: optimizing query

From: Chantal Ackermann <chantal(dot)ackermann(at)biomax(dot)de>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing query
Date: 2003-01-23 09:16:01
Message-ID: 3E2FB2D1.9020904@biomax.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

hi Stephan,

thank you for your reply.

I ran vacuum analyze before calling explain. As this is a newly built
database where no rows have been deleted, yet, I thought vacuum full
would have no effect. In fact, BEFORE running vacuum full, the cost of
the query is estimates by explain analyze as 33 secs, and AFTER running
it, the cost is estimate to be 43 secs??? (Hey, I want at least the 10
secs back ;-) )

I have just installed this database on a "bigger" (see the system info
further down) machine, and I expected the queries would run _really_
fast. especially, as there is a lot more data to be inserted in the
occurrences tables.

This is the row count of the tables and the output of explain analyze
before and after running vacuum full (after that, I listed some system
and postgresql information):

relate=# select count(*) from gene;
count
--------
218085
(1 row)

relate=# select count(*) from disease;
count
--------
164597
(1 row)

relate=# select count(*) from disease_occurrences_puid;
count
--------
471915
(1 row)

relate=# select count(*) from gene_occurrences_puid;
count
--------
339347
(1 row)

relate=# explain analyze SELECT DISTINCT gene.gene_name,
gene_occurrences_puid.puid FROM gene, disease_occurrences_puid,
gene_occurrences_puid WHERE
disease_occurrences_puid.puid=gene_occurrences_puid.puid AND
gene.gene_id=gene_occurrences_puid.gene_id;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=342175.89..352511.77 rows=137812 width=33) (actual
time=32112.66..33139.23 rows=219435 loops=1)
-> Sort (cost=342175.89..345621.18 rows=1378118 width=33) (actual
time=32112.65..32616.14 rows=695158 loops=1)
Sort Key: gene.gene_name, gene_occurrences_puid.puid
-> Merge Join (cost=63671.50..98237.36 rows=1378118
width=33) (actual time=10061.83..17940.02 rows=695158 loops=1)
Merge Cond: ("outer".puid = "inner".puid)
-> Index Scan using disease_occpd_puid_i on
disease_occurrences_puid (cost=0.00..14538.05 rows=471915 width=4)
(actual time=0.03..3917.99 rows=471915 loops=1)
-> Sort (cost=63671.50..64519.87 rows=339347 width=29)
(actual time=10061.69..10973.64 rows=815068 loops=1)
Sort Key: gene_occurrences_puid.puid
-> Merge Join (cost=0.00..22828.18 rows=339347
width=29) (actual time=0.21..3760.59 rows=339347 loops=1)
Merge Cond: ("outer".gene_id = "inner".gene_id)
-> Index Scan using gene_pkey on gene
(cost=0.00..7668.59 rows=218085 width=21) (actual time=0.02..955.19
rows=218073 loops=1)
-> Index Scan using gene_id_puid_uni on
gene_occurrences_puid (cost=0.00..9525.57 rows=339347 width=8) (actual
time=0.02..1523.81 rows=339347 loops=1)
Total runtime: 33244.81 msec
(13 rows)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
AFTER

relate=# vacuum full verbose analyze;

relate=# explain analyze SELECT DISTINCT gene.gene_name,
gene_occurrences_puid.puid FROM gene, disease_occurrences_puid,
gene_occurrences_puid WHERE
disease_occurrences_puid.puid=gene_occurrences_puid.puid AND
gene.gene_id=gene_occurrences_puid.gene_id;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=359069.64..369948.41 rows=145050 width=33) (actual
time=42195.60..43229.04 rows=219435 loops=1)
-> Sort (cost=359069.64..362695.90 rows=1450503 width=33) (actual
time=42195.59..42694.70 rows=695158 loops=1)
Sort Key: gene.gene_name, gene_occurrences_puid.puid
-> Merge Join (cost=63732.51..99264.24 rows=1450503
width=33) (actual time=13172.40..27973.79 rows=695158 loops=1)
Merge Cond: ("outer".puid = "inner".puid)
-> Index Scan using disease_occpd_puid_i on
disease_occurrences_puid (cost=0.00..14543.06 rows=471915 width=4)
(actual time=36.50..10916.29 rows=471915 loops=1)
-> Sort (cost=63732.51..64580.88 rows=339347 width=29)
(actual time=13126.56..14048.38 rows=815068 loops=1)
Sort Key: gene_occurrences_puid.puid
-> Merge Join (cost=0.00..22889.19 rows=339347
width=29) (actual time=58.00..6775.55 rows=339347 loops=1)
Merge Cond: ("outer".gene_id = "inner".gene_id)
-> Index Scan using gene_pkey on gene
(cost=0.00..7739.91 rows=218085 width=21) (actual time=29.00..3416.01
rows=218073
loops=1)
-> Index Scan using gene_id_puid_uni on
gene_occurrences_puid (cost=0.00..9525.57 rows=339347 width=8) (actual
time=28.69..1936.83 rows=339347 loops=1)
Total runtime: 43338.94 msec

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Postgres Version: 7.3.1
CPU: 1666.767 MHz
RAM: 2070492 kB
shmmax/shmall: 1048576000

postgresql.conf:
shared_buffers: 121600
max_connections: 64
max_fsm_relations = 200
max_fsm_pages = 40000
effective_cache_size = 8000

********************************************************************

Thank you again for your interest and help!

Chantal

Stephan Szabo wrote:
> (Replying to general and performance in a hope to move this
> to performance after a couple of replies).
>
> On Wed, 22 Jan 2003, Chantal Ackermann wrote:
>
>
>>I am getting the following output from EXPLAIN, concerning a query with
>>joins. The merge uses index scans but takes too long, in my opinion. The
>>query is in fact only a part (subquery) of another one, but it is the
>>bottle neck.
>>
>>As I am quite ignorant in optimizing queries, and I have no idea where
>>to find documentation on the net on how to learn optimizing my queries,
>>I am posting this here in hope someone will give me either tips how to
>>optimize, or where to find some tutorial that could help me get along on
>>my own.
>>
>>dropping the "DISTINCT" has some effect, but I can't really do without.
>
>
> The first thing is, have you done ANALYZE recently to make sure that the
> statistics are correct and what does EXPLAIN ANALYZE give you (that will
> run the query and give the estimate and actual). Also, if you haven't
> vacuumed recently, you may want to vacuum full.
>
> How many rows are there on gene, disease and both occurrances tables?
> I'd wonder if perhaps using explicit sql join syntax (which postgres uses
> to constrain order) to join disease and disease_occurrences_puid before
> joining it to the other two would be better or worse in practice.
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bjorn Metzdorf 2003-01-23 10:42:18 Re: tsearch comments
Previous Message Rick Gigger 2003-01-23 08:57:39 embedded postgres

Browse pgsql-performance by date

  From Date Subject
Next Message Hannu Krosing 2003-01-23 10:11:08 Re: Terrible performance on wide selects
Previous Message Tomasz Myrta 2003-01-23 08:20:09 Re: Same query, same performance