Joining 2 tables with 300 million rows

From: Amit V Shah <ashah(at)tagaudit(dot)com>
To: "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org>
Subject: Joining 2 tables with 300 million rows
Date: 2005-12-08 16:59:24
Message-ID: 0C072E7CC947D511AC9600A0CC7341200256D143@xeon400.tagaudit.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

First of all, please pardon if the question is dumb! Is it even feasible or
normal to do such a thing ! This query is needed by a webpage so needs to be
lightning fast. Anything beyond 2-3 seconds is unacceptable performance.

I have two tables

CREATE TABLE runresult
(
id_runresult int8 NOT NULL,
rundefinition_id_rundefinition int4 NOT NULL,
measure_id_measure int4 NOT NULL,
value float4 NOT NULL,
"sequence" varchar(20) NOT NULL,
CONSTRAINT pk_runresult_ars PRIMARY KEY (id_runresult),
)

CREATE TABLE runresult_has_catalogtable
(
runresult_id_runresult int8 NOT NULL,
catalogtable_id_catalogtable int4 NOT NULL,
value int4 NOT NULL,
CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY
(runresult_id_runresult, catalogtable_id_catalogtable, value)
CONSTRAINT fk_temp FOREIGN KEY (runresult_id_runresult) REFERENCES
runresult(id_runresult) ON UPDATE RESTRICT ON DELETE RESTRICT
)

Each table has around 300 million records (will grow to probably billions).
Below is the query and the explain analyze --

explain analyze SELECT measure.description, runresult.value
FROM ((((rundefinition INNER JOIN runresult ON
rundefinition.id_rundefinition = runresult.rundefinition_id_rundefinition)
INNER JOIN runresult_has_catalogtable ON runresult.id_runresult =
runresult_has_catalogtable.runresult_id_runresult)
INNER JOIN runresult_has_catalogtable AS runresult_has_catalogtable_1 ON
runresult.id_runresult =
runresult_has_catalogtable_1.runresult_id_runresult)
INNER JOIN runresult_has_catalogtable AS runresult_has_catalogtable_2 ON
runresult.id_runresult =
runresult_has_catalogtable_2.runresult_id_runresult)
INNER JOIN measure ON runresult.measure_id_measure = measure.id_measure
WHERE (((runresult_has_catalogtable.catalogtable_id_catalogtable)=52)
AND ((runresult_has_catalogtable_1.catalogtable_id_catalogtable)=54)
AND ((runresult_has_catalogtable_2.catalogtable_id_catalogtable)=55)
AND ((runresult_has_catalogtable.value)=15806)
AND ((runresult_has_catalogtable_1.value)=1)
AND ((runresult_has_catalogtable_2.value) In (21,22,23,24))
AND ((rundefinition.id_rundefinition)=10106));

'Nested Loop (cost=0.00..622582.70 rows=1 width=28) (actual
time=25.221..150.563 rows=22 loops=1)'
' -> Nested Loop (cost=0.00..622422.24 rows=2 width=52) (actual
time=25.201..150.177 rows=22 loops=1)'
' -> Nested Loop (cost=0.00..622415.97 rows=2 width=32) (actual
time=25.106..149.768 rows=22 loops=1)'
' -> Nested Loop (cost=0.00..621258.54 rows=15 width=24)
(actual time=24.582..149.061 rows=30 loops=1)'
' -> Index Scan using pk_rundefinition on rundefinition
(cost=0.00..3.86 rows=1 width=4) (actual time=0.125..0.147 rows=1 loops=1)'
' Index Cond: (id_rundefinition = 10106)'
' -> Nested Loop (cost=0.00..621254.54 rows=15
width=28) (actual time=24.443..148.784 rows=30 loops=1)'
' -> Index Scan using
runresult_has_catalogtable_value on runresult_has_catalogtable
(cost=0.00..575069.35 rows=14437 width=8) (actual time=0.791..33.036
rows=10402 loops=1)'
' Index Cond: (value = 15806)'
' Filter: (catalogtable_id_catalogtable =
52)'
' -> Index Scan using pk_runresult_ars on
runresult (cost=0.00..3.19 rows=1 width=20) (actual time=0.007..0.007
rows=0 loops=10402)'
' Index Cond: (runresult.id_runresult =
"outer".runresult_id_runresult)'
' Filter: (10106 =
rundefinition_id_rundefinition)'
' -> Index Scan using runresult_has_catalogtable_id_runresult
on runresult_has_catalogtable runresult_has_catalogtable_1
(cost=0.00..76.65 rows=41 width=8) (actual time=0.015..0.017 rows=1
loops=30)'
' Index Cond:
(runresult_has_catalogtable_1.runresult_id_runresult =
"outer".runresult_id_runresult)'
' Filter: ((catalogtable_id_catalogtable = 54) AND (value
= 1))'
' -> Index Scan using pk_measure on measure (cost=0.00..3.12 rows=1
width=28) (actual time=0.008..0.010 rows=1 loops=22)'
' Index Cond: ("outer".measure_id_measure =
measure.id_measure)'
' -> Index Scan using runresult_has_catalogtable_id_runresult on
runresult_has_catalogtable runresult_has_catalogtable_2 (cost=0.00..79.42
rows=65 width=8) (actual time=0.007..0.010 rows=1 loops=22)'
' Index Cond: (runresult_has_catalogtable_2.runresult_id_runresult =
"outer".runresult_id_runresult)'
' Filter: ((catalogtable_id_catalogtable = 55) AND ((value = 21) OR
(value = 22) OR (value = 23) OR (value = 24)))'
'Total runtime: 150.863 ms'

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Page 2005-12-08 17:03:27 Re: opinion on disk speed
Previous Message Vivek Khera 2005-12-08 16:52:17 opinion on disk speed