Re: Inheritance & Indexes

From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Alan Williams <alan_williams(at)affymetrix(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Inheritance & Indexes
Date: 2003-06-24 18:12:50
Message-ID: 20030624110450.O8003-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 24 Jun 2003, Alan Williams wrote:

> hs.exon.2=> \d ga_psr_transcript_1
> Table "public.ga_psr_transcript_1"
> Column | Type | Modifiers
> --------------+------------------------+-----------
> id | integer | not null
> parent | integer |
> seqname | character varying(100) | not null
> source_type | smallint | not null
> feature_type | smallint | not null
> start | integer | not null
> stop | integer | not null
> strand | character(1) | not null
> annot_name | character varying(100) | not null
> depth | integer | not null
> Indexes: ga_psr_transcript_1_pkey primary key btree (id),
> ga_psr_transcript_1_start_stop btree ("start", stop),
> ga_psr_transcript_1_stop btree (stop)
> Check constraints: "aw_psr_transcript_1_strand" (((strand = '+'::bpchar) OR (strand = '-'::bpchar)) OR (strand = '.'::bpchar))
> Triggers: RI_ConstraintTrigger_1412526244,
> RI_ConstraintTrigger_1412526245
>
> hs.exon.2=> \d ga_psr_exon_1
> Table "public.ga_psr_exon_1"
> Column | Type | Modifiers
> -----------------------+------------------------+-----------
> id | integer | not null
> parent | integer |
> seqname | character varying(100) | not null
> source_type | smallint | not null
> feature_type | smallint | not null
> start | integer | not null
> stop | integer | not null
> strand | character(1) | not null
> annot_name | character varying(100) | not null
> transcript_cluster_id | integer | not null
> depth | integer | not null
> Indexes: ga_psr_exon_1_pkey primary key btree (id),
> ga_psr_exon_1_parent btree (parent),
> ga_psr_exon_1_start_stop btree ("start", stop),
> ga_psr_exon_1_stop btree (stop)
> Check constraints: "aw_psr_exon_1_strand" (((strand = '+'::bpchar) OR (strand = '-'::bpchar)) OR (strand = '.'::bpchar))
> Triggers: RI_ConstraintTrigger_1412526088,
> RI_ConstraintTrigger_1412526089
>
> hs.exon.2=> select count(*) from ga_psr_transcript_1;
> count
> -------
> 43398
> (1 row)
>
> hs.exon.2=> select count(*) from ga_psr_exon_1;
> count
> --------
> 176908
> (1 row)
>
> Now if I do a join on the "leaf" tables everything looks good:
>
> hs.exon.2=> explain select * from ga_psr_transcript_1 t, ga_psr_exon_1e where e.parent = t.id;
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
> Merge Join (cost=0.00..9087.71 rows=176908 width=98)
> Merge Cond: ("outer".id = "inner".parent)
> -> Index Scan using ga_psr_transcript_1_pkey on ga_psr_transcript_1 t (cost=0.00..1066.17 rows=43398 width=47)
> -> Index Scan using ga_psr_exon_1_parent on ga_psr_exon_1 e (cost=0.00..5259.52 rows=176908 width=51)
> (4 rows)
>
> If I do a join on the parent table, the optimizer refuses to use the
> indicies:
>
> hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e where e.parent = t.id;

In this case, you can't use a single index scan to get the rows in order
so the part that makes the above a nice plan doesn't really apply. If
you're getting all the rows and sorting them, index scans are probably a
waste of time unless you have alot of dead space. If we supported
multi-table indexes, that'd potentially let you get a plan like
the above.

> -----------------------------------------------------------------------------------------------
> Merge Join (cost=1239155.37..70188119.40 rows=5514877218 width=334)
> Merge Cond: ("outer".id = "inner".parent)
> -> Sort (cost=243481.37..244816.14 rows=533908 width=165)
> Sort Key: t.id
> -> Append (cost=0.00..10980.08 rows=533908 width=165)
[lots of seqscans snipped]
> -> Sort (cost=995674.00..1000838.64 rows=2065853 width=169)
> Sort Key: e.parent
> -> Append (cost=0.00..43563.52 rows=2065853 width=169)
[more seqscans snipped]

> Same thing even if I'm querying for a specific tuple:
>
> hs.exon.2=> explain select * from ga_psr_transcript t, ga_psr_exon e
> where e.parent = t.id and t.id = 123;

ISTM it's willing to use an index scan on at least some of t's subtables.
Does explicitly saying e.parent=123 help?

> QUERY PLAN
> ------------------------------------------------------------------------
> ---------------------------------------------------
> Hash Join (cost=99.06..73488.33 rows=320207 width=334)
> Hash Cond: ("outer".parent = "inner".id)
> -> Append (cost=0.00..43563.52 rows=2065853 width=169)
[lots of seqscans snipped again]
> -> Hash (cost=98.98..98.98 rows=31 width=165)
> -> Append (cost=0.00..98.98 rows=31 width=165)
> -> Seq Scan on ga_psr_transcript t (cost=0.00..22.50 rows=5 width=165)
> Filter: (id = 123)
[snipping a bunch of index scans]
> -> Seq Scan on ga_psr_transcript_m t (cost=0.00..1.12 rows=1 width=47)
> Filter: (id = 123)
> (86 rows)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Daniel E. Fisher 2003-06-24 18:24:49 Failure to install 7.3.3
Previous Message Ernest E Vogelsinger 2003-06-24 18:07:37 Re: Database design problem: multilingual strings