Quick Links

Seq Scan vs. Index Scan

From:	Nassib Nassar <nassar(at)renci(dot)org>
To:	<pgsql-performance(at)postgresql(dot)org>
Subject:	Seq Scan vs. Index Scan
Date:	2011-08-04 13:40:08
Message-ID:	BD2648C0-836B-4F4C-834A-4B26CF092B20@renci.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

In this example it looks to me like the planner is choosing a Seq Scan resulting in 18x running time compared to running it with enable_seqscan = 'off'. Adding more indexes to public.gene (please see below) seemed to make things worse. I definitely have run VACUUM ANALYZE on everything, manually. What am I missing? Thank you for any feedback.

Query:
SELECT *
FROM gene_af_polyphen
WHERE dataset_id = '001-1' AND
(vartype = 'snp' OR
vartype = 'ins' OR
vartype = 'del' OR
vartype = 'sub');

Query plan:
http://explain.depesz.com/s/qnZ

Query plan after SET enable_seqscan TO 'off':
http://explain.depesz.com/s/N5q

Hardware:
24GB memory / 8 core running Linux 2.6.32 x86_64

Table row counts:
public.gene ~1 billion (may have lots of NULLs in several columns)
public.af 38878319
public.polyphen 25821

Database:

Table "public.polyphen"
Column | Type | Modifiers
-------------------------+------------------------+-----------
mrnaacc | character varying(255) | not null
proteinpos | character varying(255) | not null
annotationrefsequence | character varying(255) | not null
samplesequence | character varying(255) | not null
prediction | character varying(255) |
probability_deleterious | numeric |
Indexes:
"polyphen_annotationrefsequence_idx1" btree (annotationrefsequence), tablespace "indexspace"
"polyphen_mrnaacc_idx1" btree (mrnaacc), tablespace "indexspace"
"polyphen_proteinpos_idx1" btree (proteinpos), tablespace "indexspace"
"polyphen_samplesequence_idx1" btree (samplesequence), tablespace "indexspace"

CREATE VIEW gene_af_polyphen AS
SELECT gene.dataset_id dataset_id,
gene.referencename referencename,
gene.index "index",
gene.locus locus,
gene.haplotype haplotype,
gene.chromosome chromosome,
gene.begincoord begincoord,
gene.endcoord endcoord,
gene.vartype vartype,
gene.reference reference,
gene.call call,
gene.xref xref,
gene.geneid geneid,
gene.mrnaacc mrnaacc,
gene.proteinacc proteinacc,
gene.symbol symbol,
gene.orientation orientation,
gene.exoncategory exoncategory,
gene.exon exon,
gene.codingregionknown codingregionknown,
gene.aacategory aacategory,
gene.nucleotidepos nucleotidepos,
gene.proteinpos proteinpos,
gene.aaannot aaannot,
gene.aacall aacall,
gene.aaref aaref,
gene.allele allele,
gene.component component,
gene.componentindex componentindex,
gene.impact impact,
gene.annotationrefsequence annotationrefsequence,
gene.samplesequence samplesequence,
gene.genomerefsequence genomerefsequence,
gene.pfam pfam,
gene.unknown1 unknown1,
af.rs_id rs_id,
af.allele_freq allele_freq,
polyphen.prediction prediction,
polyphen.probability_deleterious probability_deleterious
FROM gene
LEFT JOIN af
ON gene.chromosome = af.chromosome AND
gene.endcoord = af.endcoord AND
gene.reference = af.reference AND
gene.call = af.call
LEFT JOIN polyphen
ON gene.mrnaacc = polyphen.mrnaacc AND
gene.proteinpos = polyphen.proteinpos AND
gene.annotationrefsequence = polyphen.annotationrefsequence AND
gene.samplesequence = polyphen.samplesequence;

Responses

Re: Seq Scan vs. Index Scan at 2011-08-04 15:07:42 from Kevin Grittner

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Craig Ringer	2011-08-04 13:49:54	Re: Need to tune for Heavy Write
Previous Message	Nicholson, Brad (Toronto, ON, CA)	2011-08-04 12:41:47	Re: Need to tune for Heavy Write