generate_series woes

From: Harald Fuchs <hari(dot)fuchs(at)googlemail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: generate_series woes
Date: 2008-04-14 09:21:58
Message-ID: puhce43i6h.fsf@srv.protecting.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I think there's something sub-optimal with generate_series.
In the following, "documents" is a table with more than 120000 rows,
vacuumed and analyzed before the queries.

EXPLAIN ANALYZE
SELECT count (d.id), floor (s.val / 5000)
FROM generate_series (1::INT, 5009) AS s (val)
LEFT JOIN documents d ON d.id = s.val
GROUP BY 2
ORDER BY 2;

This returns:

Sort (cost=4231.52..4232.02 rows=200 width=8) (actual time=41.886..41.887 rows=2 loops=1)
Sort Key: (floor(((s.val / 5000))::double precision))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=4219.88..4223.88 rows=200 width=8) (actual time=41.843..41.846 rows=2 loops=1)
-> Nested Loop Left Join (cost=0.00..4214.88 rows=1000 width=8) (actual time=1.274..38.193 rows=5009 loops=1)
-> Function Scan on generate_series s (cost=0.00..12.50 rows=1000 width=4) (actual time=1.209..3.102 rows=5009 loops=1)
-> Index Scan using documents_pkey on documents d (cost=0.00..4.18 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=5009)
Index Cond: (d.id = s.val)
Total runtime: 42.218 ms

Now let's wrap generate_series into an SQL function:

CREATE FUNCTION genser (int, int) RETURNS SETOF int AS $$
SELECT * FROM generate_series ($1, $2) AS g(x);
$$ LANGUAGE sql;

EXPLAIN ANALYZE
SELECT count (d.id), floor (s.val / 5000)
FROM genser (1::INT, 5009) AS s (val)
LEFT JOIN documents d ON d.id = s.val
GROUP BY 2
ORDER BY 2;

Not surprisingly, this returns the same plan:

Sort (cost=4479.02..4479.52 rows=200 width=8) (actual time=43.606..43.607 rows=2 loops=1)
Sort Key: (floor(((s.val / 5000))::double precision))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=4467.38..4471.38 rows=200 width=8) (actual time=43.559..43.561 rows=2 loops=1)
-> Nested Loop Left Join (cost=0.00..4462.38 rows=1000 width=8) (actual time=3.564..39.740 rows=5009 loops=1)
-> Function Scan on genser s (cost=0.00..260.00 rows=1000 width=4) (actual time=3.503..5.435 rows=5009 loops=1)
-> Index Scan using documents_pkey on documents d (cost=0.00..4.18 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=5009)
Index Cond: (d.id = s.val)
Total runtime: 44.047 ms
(9 rows)

But look what happens if we tell PostgreSQL how many rows "genser"
will return:

CREATE FUNCTION genser (int, int) RETURNS SETOF int AS $$
SELECT * FROM generate_series ($1, $2) AS g(x);
$$ LANGUAGE sql ROWS 5009;

EXPLAIN ANALYZE
SELECT count (d.id), floor (s.val / 5000)
FROM genser (1::INT, 5009) AS s (val)
LEFT JOIN documents d ON d.id = s.val
GROUP BY 2
ORDER BY 2;

Now we get a better plan:

Sort (cost=15545.54..15546.04 rows=200 width=8) (actual time=27.857..27.859 rows=2 loops=1)
Sort Key: (floor(((s.val / 5000))::double precision))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=15533.89..15537.89 rows=200 width=8) (actual time=27.817..27.819 rows=2 loops=1)
-> Merge Right Join (cost=1610.15..15508.85 rows=5009 width=8) (actual time=7.714..24.133 rows=5009 loops=1)
Merge Cond: (d.id = s.val)
-> Index Scan using documents_pkey on documents d (cost=0.00..13472.20 rows=125518 width=4) (actual time=0.045..6.112 rows=5010 loops=1)
-> Sort (cost=1610.15..1622.67 rows=5009 width=4) (actual time=7.651..9.501 rows=5009 loops=1)
Sort Key: s.val
Sort Method: quicksort Memory: 427kB
-> Function Scan on genser s (cost=0.00..1302.34 rows=5009 width=4) (actual time=3.559..5.262 rows=5009 loops=1)
Total runtime: 28.445 ms
(12 rows)

Since generate_series is a builtin function, can't it tell how many
rows it will return?

Responses

Browse pgsql-general by date

  From Date Subject
Next Message imageguy 2008-04-14 12:09:52 Re: Number or parameters for functions - limited to 32 ?
Previous Message Richard Huxton 2008-04-14 08:37:28 Re: The default text search configuration will be set to "simple" ?