using LIMIT only on primary table

From: "Dan Langille" <dan(at)langille(dot)org>
To: pgsql-sql(at)postgresql(dot)org
Subject: using LIMIT only on primary table
Date: 2002-07-27 21:26:02
Message-ID: 3D42D7AA.27447.3EE190A0@localhost
Lists: pgsql-sql

This is an extension of the problem solved by
http://archives.postgresql.org/pgsql-sql/2002-03/msg00020.php but
with a slightly different complication.

I want to get the last 100 port commits from the database. Commits
are stored in commit_log, and commit_log_ports relates commits to
ports. A given commit may affect more than one port (i.e. there is a
1-N relationship between commit_log and commit_log_ports).

This gives me the last 100 commits:

SELECT commit_log.*
FROM commit_log
ORDER BY commit_date DESC, id
LIMIT 100;

So a starting point for the last 100 port commits is:

explain analyze
SELECT distinct commit_log.*
FROM commit_log_ports, commit_log
WHERE commit_log.id = commit_log_ports.commit_log_id
ORDER BY commit_log.commit_date DESC, commit_log_ports.commit_log_id
LIMIT 100;

But it has a very high cost:

Limit  (cost=11275.92..11283.42 rows=100 width=55) (actual time=5769.07..5771.92 rows=100 loops=1)
  ->  Unique  (cost=11275.92..11643.73 rows=4904 width=55) (actual time=5769.05..5770.93 rows=101 loops=1)
        ->  Sort  (cost=11275.92..11275.92 rows=49042 width=55) (actual time=5769.04..5769.68 rows=112 loops=1)
              ->  Hash Join  (cost=3478.15..6387.22 rows=49042 width=55) (actual time=1263.69..4319.53 rows=49042 loops=1)
                    ->  Seq Scan on commit_log_ports  (cost=0.00..825.42 rows=49042 width=4) (actual time=0.12..346.32 rows=49042 loops=1)
                    ->  Hash  (cost=1749.51..1749.51 rows=42951 width=51) (actual time=1074.15..1074.15 rows=0 loops=1)
                          ->  Seq Scan on commit_log  (cost=0.00..1749.51 rows=42951 width=51) (actual time=0.14..396.99 rows=42953 loops=1)
Total runtime: 6158.10 msec

I'd like to get that time down. I suspect the cost is high because
the entire table is joined first, and only then is the LIMIT applied.
When I add "commit_log.id = 1" to the WHERE clause, the query plan
changes to:

Limit  (cost=30.42..30.43 rows=1 width=55) (actual time=0.96..0.99 rows=1 loops=1)
  ->  Unique  (cost=30.42..30.43 rows=1 width=55) (actual time=0.95..0.97 rows=1 loops=1)
        ->  Sort  (cost=30.42..30.42 rows=1 width=55) (actual time=0.94..0.94 rows=1 loops=1)
              ->  Nested Loop  (cost=0.00..30.41 rows=1 width=55) (actual time=0.61..0.64 rows=1 loops=1)
                    ->  Index Scan using commit_log_pkey on commit_log  (cost=0.00..5.93 rows=1 width=51) (actual time=0.31..0.32 rows=1 loops=1)
                    ->  Index Scan using commit_log_ports_cli on commit_log_ports  (cost=0.00..24.41 rows=6 width=4) (actual time=0.25..0.27 rows=1 loops=1)

I think that confirms that my indexes and PK/FK are set correctly.
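
One idea I haven't tested: apply the LIMIT to commit_log first in a
subselect, and only then join against commit_log_ports. The over-fetch
of 500 rows is a guess; if fewer than 100 of those commits touch a
port, this would return too few rows:

SELECT DISTINCT cl.*
FROM (SELECT *
      FROM commit_log
      ORDER BY commit_date DESC, id
      LIMIT 500) AS cl, commit_log_ports clp
WHERE cl.id = clp.commit_log_id
ORDER BY cl.commit_date DESC, cl.id
LIMIT 100;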

I could do something like:

SELECT distinct commit_log_id
FROM commit_log_ports
ORDER BY commit_log_id desc
LIMIT 100;

But that doesn't take into account the fact that commits can be added
out of order and are not necessarily inserted in commit_date order.
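
Another untested idea is an EXISTS test instead of the join, so the
DISTINCT goes away entirely and the ordering comes straight from
commit_log:

SELECT cl.*
FROM commit_log cl
WHERE EXISTS (SELECT 1
              FROM commit_log_ports clp
              WHERE clp.commit_log_id = cl.id)
ORDER BY cl.commit_date DESC, cl.id
LIMIT 100;

Whether the planner can stop early here rather than scanning
everything, I don't know.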

Any clues please?

thanks

--
Dan Langille
And yes, I'm looking for a computer job:
http://www.freebsddiary.org/dan_langille.php
