Quick Links

Re: Performance on large, append-only tables

From:	David Yeu <david(dot)yeu(at)skype(dot)net>
To:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Performance on large, append-only tables
Date:	2012-02-10 16:19:57
Message-ID:	4205A216-5AD0-4F23-826F-542537B27E62@skype.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Yeah, Reply-All...

Begin forwarded message:

> From: David Yeu <david(dot)yeu(at)skype(dot)net>
> Subject: Re: [PERFORM] Performance on large, append-only tables
> Date: February 10, 2012 10:59:04 AM EST
> To: Merlin Moncure <mmoncure(at)gmail(dot)com>
>
> On Feb 10, 2012, at 10:19 AM, Merlin Moncure wrote:
>
>> You can probably significantly optimize this. But first, can we see
>> some explain analyze for the affected queries?
>
> Sorry, we should have included these in the original post. Here's the EXPLAIN output for a "id < ?" query:
>
>
> => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id < ?) ORDER BY id DESC LIMIT 20 OFFSET 0;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=9267.44..9267.45 rows=20 width=1321) (actual time=348.844..348.877 rows=20 loops=1)
> -> Sort (cost=9267.44..9269.76 rows=4643 width=1321) (actual time=348.840..348.852 rows=20 loops=1)
> Sort Key: id
> Sort Method: top-N heapsort Memory: 29kB
> -> Index Scan using index_lines_on_group_id on lines (cost=0.00..9242.73 rows=4643 width=1321) (actual time=6.131..319.835 rows=23038 loops=1)
> Index Cond: (group_id = ?)
> Filter: ((deleted_at IS NULL) AND (id < ?))
> Total runtime: 348.987 ms
>
>
> A quick suggestion from Heroku yesterday was a new index on (group_id, id). After adding it to a database fork, we ended up with:
>
>
> => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id < ?) ORDER BY id DESC LIMIT 20 OFFSET 0;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..28.88 rows=20 width=1321) (actual time=17.216..109.905 rows=20 loops=1)
> -> Index Scan Backward using index_lines_on_group_id_and_id on lines (cost=0.00..6416.04 rows=4443 width=1321) (actual time=17.207..109.867 rows=20 loops=1)
> Index Cond: ((group_id = ?) AND (id < ?))
> Filter: (deleted_at IS NULL)
> Total runtime: 110.039 ms
>
>
> The result has been pretty dramatic for the "id <> ?" queries, which make up the bulk of the queries. Running a whole bunch of EXPLAIN ANAYLZE queries also showed that some queries were actually choosing to use the index on `id' instead of `group_id', and that performed about as poorly as expected. Thankfully, the new index on (group_id, id) seems to be preferable nearly always.
>
> And for reference, here's the EXPLAIN for the LIMIT, OFFSET query:
>
>
> => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) ORDER BY id DESC LIMIT 20 OFFSET 60;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=9274.45..9274.46 rows=20 width=1321) (actual time=109.674..109.708 rows=20 loops=1)
> -> Sort (cost=9274.42..9276.75 rows=4646 width=1321) (actual time=109.606..109.657 rows=80 loops=1)
> Sort Key: id
> Sort Method: top-N heapsort Memory: 43kB
> -> Index Scan using index_lines_on_group_id on lines (cost=0.00..9240.40 rows=4646 width=1321) (actual time=0.117..98.905 rows=7999 loops=1)
> Index Cond: (group_id = ?)
> Filter: (deleted_at IS NULL)
> Total runtime: 109.753 ms
>
>
> - Dave
>

Responses

Re: Performance on large, append-only tables at 2012-02-10 16:26:54 from Claudio Freire

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Claudio Freire	2012-02-10 16:26:54	Re: Performance on large, append-only tables
Previous Message	Kevin Grittner	2012-02-10 15:44:02	Re: Performance on large, append-only tables