Re: Query slows after offset of 100K

From: Matthew <matthew(at)flymine(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query slows after offset of 100K
Date: 2008-02-15 14:47:06
Message-ID: Pine.LNX.4.64.0802151438540.20402@aragorn.flymine.org
Lists: pgsql-performance

On Thu, 14 Feb 2008, Michael Lorenz wrote:
> When offsetting up to about 90K records, the EXPLAIN ANALYZE is similar to the following:
> Limit  (cost=15357.06..15387.77 rows=20 width=35) (actual time=19.235..19.276 rows=20 loops=1)
>   ->  Index Scan using account_objectname on "object" o  (cost=0.00..1151102.10 rows=749559 width=35) (actual time=0.086..14.981 rows=10020 loops=1)
>         Index Cond: (accountid = 354)
>         Filter: ((NOT deleted) OR (deleted IS NULL))
> Total runtime: 19.315 ms

Since this is scanning through 10,000 random rows in 19 milliseconds, I'd
say all this data is already in the cache. If it wasn't, you'd be looking
at 10,000 random seeks on disc, at about 7 ms each, which is 70 seconds.
Try dropping the OS caches (on Linux, as root: echo 1 > /proc/sys/vm/drop_caches)
and see if the performance is worse.

> If I move the offset up to 100K records or higher, I get:
> Limit  (cost=145636.26..145636.31 rows=20 width=35) (actual time=13524.327..13524.355 rows=20 loops=1)
>   ->  Sort  (cost=145386.26..147260.16 rows=749559 width=35) (actual time=13409.216..13481.793 rows=100020 loops=1)
>         Sort Key: objectname
>         ->  Seq Scan on "object" o  (cost=0.00..16685.49 rows=749559 width=35) (actual time=0.011..1600.683 rows=749549 loops=1)
>               Filter: (((NOT deleted) OR (deleted IS NULL)) AND (accountid = 354))
> Total runtime: 14452.374 ms

And here, it only takes 1.5 seconds to fetch the entire table from disc
(or it's already in the cache or something), but 14 seconds to sort the
whole lot in memory.

In any case, Postgres is making a good choice - it's just that in the first
case you have the unexpected benefit that the data is already in cache.
Setting effective_cache_size correctly will help the planner there.
Setting work_mem higher will improve the performance of the sort in the
second case.
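For example, at the session level (the values here are purely illustrative -
what's sensible depends on your RAM and connection count):

```sql
-- Tell the planner roughly how much of the database the OS is likely
-- to have cached; this makes index scans look cheaper when data is hot.
SET effective_cache_size = '2GB';

-- Give the sort more memory so it can complete in RAM instead of
-- spilling to disc.  Note this is per sort, per backend, so be
-- conservative if you set it globally in postgresql.conf.
SET work_mem = '64MB';
```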

Of course, what others have said about avoiding large offsets is good
advice: remember where the previous page ended and seek to it, rather than
counting past all the earlier rows. You don't actually need a unique index
for that, but it makes things simpler if you have one.
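To sketch what that looks like (assuming a primary key column "id" as the
tie-breaker - the column name is hypothetical, and an index on
(accountid, objectname, id) would be needed for it to seek efficiently):

```sql
-- First page, ordered with a tie-breaker so the ordering is total:
SELECT id, objectname
FROM "object"
WHERE accountid = 354
  AND deleted IS NOT TRUE        -- same as ((NOT deleted) OR (deleted IS NULL))
ORDER BY objectname, id
LIMIT 20;

-- Subsequent pages: seek past the last row of the previous page using a
-- row-value comparison, instead of OFFSETting over all the earlier rows.
-- 'last_seen_name' and 12345 stand in for values saved from the last row.
SELECT id, objectname
FROM "object"
WHERE accountid = 354
  AND deleted IS NOT TRUE
  AND (objectname, id) > ('last_seen_name', 12345)
ORDER BY objectname, id
LIMIT 20;
```

This way every page costs roughly the same, no matter how deep you go.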

Matthew

--
The early bird gets the worm. If you want something else for breakfast, get
up later.
