Re: Slow query: table iteration (8.3)

From: Glenn Maynard <glenn(at)zewt(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow query: table iteration (8.3)
Date: 2010-02-05 03:04:41
Message-ID: bd36f99e1002041904w37cac39jeb9d60156a06c151@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Feb 4, 2010 at 4:09 AM, Glenn Maynard <glenn(at)zewt(dot)org> wrote:
> But I'll be dropping this db into 8.4 soon to see if it helps
> anything, and I'll check again (and if it's still slow I'll post more
> details).  It's been a while and I might just have been doing
> something wrong.

Windowing doesn't want to scale for this case. I'll simplify to give
an actual test case.

create table test_users (id serial primary key);
insert into test_users (id) select generate_series(1, 1000);
create table test (id serial primary key, score integer, user_id integer);
insert into test (user_id, score) select s.id, random() * 1000000 from
(select generate_series(1, 1000) as id) as s, generate_series(1,
1000);
create index test_1 on test (score);
create index test_2 on test (user_id, score desc);
analyze;

This generates a thousand users, with a thousand scores each. This
finds the top score for each user (ignoring the detail of duplicate
scores; easy to deal with):

SELECT sub.id FROM (
SELECT t.id, rank() OVER (PARTITION BY t.user_id ORDER BY score
DESC) AS rank
FROM test t
) AS sub WHERE rank <= 1;

This does use the test_2 index (as intended), but it's still very
slow: 11 seconds on my system.

It seems like it's doing a *complete* scan of the index, generating
ranks for every row, and then filters out all but the first of each
rank. That means it scales linearly with the total number of rows.
All it really needs to do is jump to each user in the index and pull
out the first entry (according to the "score desc" part of the test_2
index), which would make it scale linearly with the number of users.

The function version:

CREATE FUNCTION high_score_for_user(user_id int) RETURNS SETOF INTEGER
LANGUAGE SQL AS $$
SELECT t.id FROM test t
WHERE t.user_id = $1
ORDER BY t.score DESC LIMIT 1
$$;
SELECT high_score_for_user(u.id) FROM test_users u;

runs in 100ms.

I think I'm stuck with either creating temporary functions with the
constants already replaced, or creating an SQL function that evaluates
a brand new query as a string as Yeb suggested.

--
Glenn Maynard

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Albe Laurenz 2010-02-05 09:00:09 Re: foreign key constraint lock behavour in postgresql
Previous Message Robert Haas 2010-02-05 02:11:17 Re: foreign key constraint lock behavour in postgresql