Skip site navigation (1) Skip section navigation (2)

Re: cost and actual time

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chantal Ackermann <chantal(dot)ackermann(at)biomax(dot)de>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: cost and actual time
Date: 2003-02-17 16:21:56
Message-ID: 9093.1045498916@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-performance
Chantal Ackermann <chantal(dot)ackermann(at)biomax(dot)de> writes:
> the gene_id for 'igg' occurres 110637 times in gene_occurrences, it is 
> the most frequent.

I think the problem here is that the planner doesn't know that (and
probably can't without some kind of cross-table statistics apparatus).
It's generating a plan based on the average frequency of gene_ids, which
is a loser for this outlier.

Probably the most convenient way to do better is to structure things so
that the reduction from gene name to gene_id is done before the planner
starts to develop a plan.  Instead of joining to gene, consider this:

create function get_gene_id (text) returns int as -- adjust types as needed
'select gene_id from gene where gene_name = $1' language sql
immutable strict;  -- in 7.2, instead say "with (isCachable, isStrict)"

EXPLAIN ANALYZE
SELECT tmp.disease_name, count(tmp.disease_name) AS cnt
	FROM (SELECT DISTINCT disease.disease_name, 
disease_occurrences.sentence_id FROM disease, gene_occurrences, 
disease_occurrences
		WHERE
		gene_occurrences.sentence_id=disease_occurrences.sentence_id
		AND get_gene_id('igg')=gene_occurrences.gene_id
		AND disease.disease_id=disease_occurrences.disease_id) AS tmp
GROUP BY tmp.disease_name
ORDER BY cnt DESC;

Now get_gene_id() isn't really immutable (unless you never change the
gene table) but you have to lie and pretend that it is, so that the
function call will be constant-folded during planner startup.  The
planner will then see something like gene_occurrences.gene_id = 42
and it will have a much better shot at determining the number of rows
this matches.

			regards, tom lane

In response to

pgsql-performance by date

Next:From: Manfred KoizarDate: 2003-02-17 16:49:16
Subject: Re: cost and actual time
Previous:From: Rafal KedziorskiDate: 2003-02-17 12:43:37
Subject: Re: [PERFORM] Good performance?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group