Re: Performance on large, append-only tables

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "David Yeu" <david(dot)yeu(at)skype(dot)net>
Cc: "ops(at)groupme(dot)com" <ops(at)groupme(dot)com>
Subject: Re: Performance on large, append-only tables
Date: 2012-02-10 15:44:02
Message-ID: 4F34E6E20200002500045251@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

David Yeu <david(dot)yeu(at)skype(dot)net> wrote:

> We have indices against the primary key and the group_id.
> Our queries essentially fall into the following cases:
>
> * Š WHERE group_id = ? ORDER BY created_at DESC LIMIT 20;
> * Š WHERE group_id = ? AND id > ? ORDER BY created_at DESC;
> * Š WHERE group_id = ? AND id < ? ORDER BY created_at DESC LIMIT
> 20;
> * Š WHERE group_id = ? ORDER BY created_at DESC LIMIT 20 OFFSET
> ?;
>
> In human words, we're looking for:
>
> * The most recent (20) rows.
> * The most recent rows after a given `id'.
> * Twenty rows before a given `id'.
> * Pages of twenty rows.

The first thing I would try is building an index (perhaps
CONCURRENTLY to avoid disrupting production) on (group_id,
created_at). It might also be worth creating an index on (group_id,
id, created_at), but that's a less-sure win.

> Originally, this table was part of our primary database, but
> recently we saw queries take upwards of thirty seconds or more to
> complete. Since we're serving web requests, that's basically
> unacceptable, and caused a lot of requests to backup.

With only the indexes you mention, it had to be doing either
complete table scans for each request, or a lot of random access to
rows it didn't need.

> Our interim solution has been to simply carve out a new database
> that hosts only this table, and that has worked to some degree. We
> aren't seeing thirty seconds plus database response times anymore,
> but some queries still take many seconds and the cost of spinning
> up a new master-slave configuration hasn't been cheap.

Well, throwing hardware at something doesn't generally hurt, but
it's not the first solution I would try, especially when the product
you're using has ways to tune performance.

> In the meantime, we're hoping to investigate other ways to
> optimize this table and the queries against it. Heroku's data team
> has suggested balling up these rows into arrays, where a single
> row would represent a group_id, and the data would occupy a single
> column as an array.

Ugh. You're a long way from needing to give up on the relational
model here.

> And finally, we're also trying out alternative stores, since it
> seems like this data and its retrieval could be well suited to
> document-oriented backends. Redis and DynamoDB are currently the
> best contenders.

Your current use of PostgreSQL is more or less equivalent to driving
a car around in first gear. You might consider a tuned PostgreSQL
as another alternative store. :-)

-Kevin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message David Yeu 2012-02-10 16:19:57 Re: Performance on large, append-only tables
Previous Message Tom Lane 2012-02-10 15:35:52 Re: Performance on large, append-only tables