Re: Performance on large, append-only tables

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: David Yeu <david(dot)yeu(at)skype(dot)net>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "ops(at)groupme(dot)com" <ops(at)groupme(dot)com>
Subject: Re: Performance on large, append-only tables
Date: 2012-02-10 15:34:39
Message-ID: CABRT9RBDXQDwjHSA1GGeAKd97UXRaSTK1B58V6rRmZvit8G2CQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Feb 8, 2012 at 20:03, David Yeu <david(dot)yeu(at)skype(dot)net> wrote:
>  * Š WHERE group_id = ? ORDER BY created_at DESC LIMIT 20 OFFSET ?;
>  * Pages of twenty rows.

A good improvement for this sort of queries is the "scalable paging"
trick. Instead of increasing the OFFSET argument -- which means that
Postgres has to scan more and more rows -- you should remember an
index key where the last page ended.

In other words, you get the first page using:
WHERE group_id = ? ORDER BY created_at DESC LIMIT 20

Say, this page returns created_at values between 2012-01-01 and
2012-01-10. If the user clicks "next page", you run a query like this
instead:
WHERE group_id = ? AND created_at>'2012-01-10' ORDER BY created_at DESC LIMIT 20

Thus, every "next page" fetch always takes a constant time. Of course
there's a small problem when two rows have equal times. Then, you can
add primary key to the sort key to disambiguate those rows:

WHERE group_id = ? AND (created_at, pkey_col) > ('2012-01-10', 712)
ORDER BY created_at, pkey_col DESC LIMIT 20

Of course an index on (group_id, created_at) or (group_id, created_at,
pkey_col) is necessary for these to work well

Regards,
Marti

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2012-02-10 15:35:52 Re: Performance on large, append-only tables
Previous Message Claudio Freire 2012-02-10 15:33:33 Re: Performance on large, append-only tables