Re: Slow query with backwards index scan

From: Tilmann Singer <tils-pgsql(at)tils(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow query with backwards index scan
Date: 2007-07-28 20:54:50
Message-ID: 20070728205450.GI19392@tils.net
Lists: pgsql-performance

* Craig James <craig_james(at)emolecules(dot)com> [20070728 22:00]:
> >>SELECT * FROM (
> >> (SELECT * FROM large_table lt
> >> WHERE lt.user_id = 12345
> >> ORDER BY created_at DESC LIMIT 10) AS q1
> >> UNION
> >> (SELECT * FROM large_table lt
> >> WHERE user_id IN (SELECT contact_id FROM relationships WHERE
> >> user_id=12345)
> >> ORDER BY created_at DESC LIMIT 10) AS q2
> >>ORDER BY created_at DESC LIMIT 10;
> >
> >It's not possible to use ORDER BY or LIMIT within unioned queries.
> >
> >http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION
>
> If I'm reading this documentation correctly, it *is* possible, as long as
> they're inside of a sub-select, as in this case.

I completely overlooked that note in the documentation, sorry. I had
only tried it with aliases on the UNION arms, which fooled me into
thinking it doesn't work at all:

testdb=# (select 1 limit 1) as q1 union (select 2) as q2;
ERROR:  syntax error at or near "as"
LINE 1: (select 1 limit 1) as q1 union (select 2) as q2;
                           ^
but this works:

testdb=# (select 1 limit 1) union (select 2);
 ?column?
----------
        1
        2

Great - that works!
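
So the combined query quoted at the top should presumably parse once the
aliases on the UNION arms are dropped - roughly like this (untested sketch;
the outer derived table needs an alias since PostgreSQL requires one, and
the name q is just for illustration):

SELECT * FROM (
  (SELECT * FROM large_table lt
     WHERE lt.user_id = 12345
     ORDER BY created_at DESC LIMIT 10)
  UNION
  (SELECT * FROM large_table lt
     WHERE user_id IN (SELECT contact_id FROM relationships
                       WHERE user_id=12345)
     ORDER BY created_at DESC LIMIT 10)
) AS q  -- alias for the outer derived table is required; name is arbitrary
ORDER BY created_at DESC LIMIT 10;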

What I didn't realize in the original post is that the real problem
seems to be how to retrieve the rows from large_table for the related
contacts efficiently - the second of the two queries that would be
UNIONed.

Using a subselect is efficient for a user with few relationships whose
matching rows are near the end of the sorted large_table:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE
user_id=12345)
ORDER BY created_at DESC LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6963.94..6963.96 rows=10 width=621) (actual time=94.598..94.629 rows=4 loops=1)
   ->  Sort  (cost=6963.94..6966.96 rows=1211 width=621) (actual time=94.592..94.602 rows=4 loops=1)
         Sort Key: lt.created_at
         ->  Nested Loop  (cost=39.52..6901.92 rows=1211 width=621) (actual time=85.670..94.547 rows=4 loops=1)
               ->  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual time=23.549..23.552 rows=1 loops=1)
                     ->  Bitmap Heap Scan on relationships  (cost=4.33..39.49 rows=10 width=4) (actual time=23.526..23.530 rows=1 loops=1)
                           Recheck Cond: (user_id = 12345)
                           ->  Bitmap Index Scan on relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10 width=0) (actual time=0.027..0.027 rows=1 loops=1)
                                 Index Cond: (user_id = 12345)
               ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..6834.04 rows=2268 width=621) (actual time=62.108..70.952 rows=4 loops=1)
                     Index Cond: (lt.user_id = relationships.contact_id)
 Total runtime: 94.875 ms

But the subselect is not fast for a user with many relationships whose
matching rows are near the beginning of the sorted large_table - all
candidate rows (69018 here) are fetched and sorted before the LIMIT is
applied:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE
user_id=55555)
ORDER BY created_at DESC LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6963.94..6963.96 rows=10 width=621) (actual time=53187.349..53187.424 rows=10 loops=1)
   ->  Sort  (cost=6963.94..6966.96 rows=1211 width=621) (actual time=53187.341..53187.360 rows=10 loops=1)
         Sort Key: lt.created_at
         ->  Nested Loop  (cost=39.52..6901.92 rows=1211 width=621) (actual time=201.728..52673.800 rows=69018 loops=1)
               ->  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual time=178.777..178.966 rows=40 loops=1)
                     ->  Bitmap Heap Scan on relationships  (cost=4.33..39.49 rows=10 width=4) (actual time=47.049..178.560 rows=40 loops=1)
                           Recheck Cond: (user_id = 55555)
                           ->  Bitmap Index Scan on relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10 width=0) (actual time=28.721..28.721 rows=40 loops=1)
                                 Index Cond: (user_id = 55555)
               ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..6834.04 rows=2268 width=621) (actual time=21.994..1301.375 rows=1725 loops=40)
                     Index Cond: (lt.user_id = relationships.contact_id)
 Total runtime: 53188.584 ms

Using a join instead, the query for the user with little data is slow -
the backward index scan on created_at walks through almost the entire
table before the nested loop finds the few matching rows:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
JOIN relationships r ON lt.user_id=r.contact_id WHERE
r.user_id=12345
ORDER BY lt.created_at DESC LIMIT 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..24116.65 rows=10 width=645) (actual time=100348.436..145552.633 rows=4 loops=1)
   ->  Nested Loop  (cost=0.00..28751864.52 rows=11922 width=645) (actual time=100348.429..145552.602 rows=4 loops=1)
         ->  Index Scan Backward using large_created_at_index on large_table lt  (cost=0.00..824833.09 rows=4384343 width=621) (actual time=28.961..82448.167 rows=4384064 loops=1)
         ->  Index Scan using relationships_user_id_contact_id_index on relationships r  (cost=0.00..6.36 rows=1 width=24) (actual time=0.009..0.009 rows=0 loops=4384064)
               Index Cond: ((r.user_id = 12345) AND (lt.user_id = r.contact_id))
 Total runtime: 145552.809 ms

And for the user with much data it's fast, since the matching rows turn
up right at the start of the backward scan:

testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
JOIN relationships r ON lt.user_id=r.contact_id WHERE
r.user_id=55555
ORDER BY lt.created_at DESC LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..24116.65 rows=10 width=645) (actual time=0.068..0.428 rows=10 loops=1)
   ->  Nested Loop  (cost=0.00..28751864.52 rows=11922 width=645) (actual time=0.063..0.376 rows=10 loops=1)
         ->  Index Scan Backward using large_created_at_index on large_table lt  (cost=0.00..824833.09 rows=4384343 width=621) (actual time=0.028..0.064 rows=13 loops=1)
         ->  Index Scan using relationships_user_id_contact_id_index on relationships r  (cost=0.00..6.36 rows=1 width=24) (actual time=0.010..0.013 rows=1 loops=13)
               Index Cond: ((r.user_id = 55555) AND (lt.user_id = r.contact_id))
 Total runtime: 0.609 ms

Any ideas?

tia, Til
