Re: TimeOf(Subselects|Joins)FromLargeTables?

From: Duane Lee - EGOVX <DLee(at)mail(dot)maricopa(dot)gov>
To: "'Dann Corbit'" <DCorbit(at)connx(dot)com>, "Hegedus, Tamas (dot)" <Hegedus(dot)Tamas(at)mayo(dot)edu>, pgsql-general(at)postgresql(dot)org
Subject: Re: TimeOf(Subselects|Joins)FromLargeTables?
Date: 2004-06-04 16:13:51
Message-ID: 64EDC403A1417B4299488BAE87CA7CBF01CD0E57@maricopa_xcng0
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Or this correlated subquery:

SELECT a.name, a.seq FROM prots as a
WHERE EXISTS (
SELECT '1'FROM kwx as b
WHERE b.kw_acc=812
and b.fid = a.fid
);

-----Original Message-----
From: Dann Corbit [mailto:DCorbit(at)connx(dot)com]
Sent: Thursday, June 03, 2004 6:59 PM
To: Hegedus, Tamas .; pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] TimeOf(Subselects|Joins)FromLargeTables?

How does this query perform:

SELECT p.name, p.seq
FROM prots p, kwx k
WHERE
p.fid=k.fid
AND
k.kw_acc=812
;

> -----Original Message-----
> From: Hegedus, Tamas . [mailto:Hegedus(dot)Tamas(at)mayo(dot)edu]
> Sent: Thursday, June 03, 2004 6:48 PM
> To: 'pgsql-general(at)postgresql(dot)org'
> Subject: [GENERAL] TimeOf(Subselects|Joins)FromLargeTables?
>
>
> Dear All,
>
> I am a biologist and I do not know what to expect from an RDB
> (PgSQL). I have large tables: 1215607 rows in prots, 2184596
> rows in kwx (see table details below). I would like to do
> something like that:
>
> SELECT name, seq FROM prots WHERE fid in (SELECT fid FROM kwx
> WHERE kw_acc=812);
>
> After executing this (either as a subquery or joins) the
> best/fastest result what I had (SET enable_seqscan=off):
> 83643.482 ms (see EXPLAIN ANALYZE below).
>
> The two (similar) parts of this query are executed much
> faster: SELECT fid FROM kwx WHERE kw_acc=812 -- takes 302ms,
> n(rows)=78050 SELECT name, seq FROM prots WHERE fid < 80000
> -- takes 1969.231 ms
>
> Is this realistic? OK?
> If not: how can I increase the speed by fine tuning of the
> RDB (indexes, run-time parameters) or my SQL query? (It came
> now into my mind: if I decrease the number of columns in the
> prots table (to have only 3 fields (fid, name, seq) instead
> of 20 columns), than the prots table will have smaller file
> size on disk, than this table may need less disk page
> fetches, queries may be faster. Is this true?)
>
> Thanks for your help!
> Tamas
>
> ===============================================
> Table "public.prots"
> Column | Type | Modifiers
> -----------+----------------------+----------
> fid | integer | not null
> name | character varying(10) | not null
> [...other 17 columns...]
> seq | text |
> Indexes:
> "prots_pkey" primary key, btree (fid)
> "ix_prots_acc" unique, btree (acc)
> "ix_prots_name" unique, btree (name)
> "ix_prots_class" btree ("class")
> ===============================================
> Table "public.kwx"
> Column | Type | Modifiers
> --------+--------+----------
> fid | integer |
> kw_acc | integer |
> Indexes:
> "ix_kwx_acc" btree (kw_acc)
> "ix_kwx_fid" btree (fid)
> Foreign-key constraints:
> "fk_kws_acc" FOREIGN KEY (kw_acc) REFERENCES kw_ref(kw_acc)
> "fk_kws_fid" FOREIGN KEY (fid) REFERENCES prots(fid)
> ===============================================
>
> EXPLAIN ANALYZE SELECT name, seq from prots inner join kwx on
> (prots.fid=kwx.fid) where kwx.kw_acc = 812;
>
> QUERY PLAN
>
> --------------------------------------------------------------
> --------------------------------------------------------------
> ------------------
> Merge Join (cost=0.00..160429.66 rows=84473 width=349)
> (actual time=29.039..83505.629 rows=78050 loops=1)
> Merge Cond: ("outer".fid = "inner".fid)
> -> Index Scan using ix_kwx_fid on kwx
> (cost=0.00..44987.55 rows=84473 width=4) (actual
> time=18.893..5730.468 rows=78050 loops=1)
> Filter: (kw_acc = 812)
> -> Index Scan using prots_pkey on prots
> (cost=0.00..112005.24 rows=981127 width=353) (actual
> time=0.083..76059.235 rows=1210377 loops=1) Total runtime:
> 83643.482 ms (6 rows)
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo(at)postgresql(dot)org
>

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

Browse pgsql-general by date

  From Date Subject
Next Message Chris Travers 2004-06-04 16:15:09 Re: PostgreSQL certifications?
Previous Message terry 2004-06-04 15:56:13 Re: Disappointing news