Re: Advice requested on structuring aggregation queries

From: Joe Conway <mail(at)joeconway(dot)com>
To: Dave Crooke <dcrooke(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Advice requested on structuring aggregation queries
Date: 2010-02-23 05:23:25
Message-ID: 4B83664D.3030401@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 02/22/2010 07:01 PM, Dave Crooke wrote:
> The original data is keyed by a 3-ary tuple of strings .... to keep the
> row size down, in the new data model I'm actually storing 32-bit int's
> in Postgres. The new schema for each table looks like this:
>
> (a integer,
> b integer,
> c integer,
> ts timestamp without timezone,
> value double precision)
>
> with two indexes: (a, b, ts) and (b, ts)

[...snip...]

> There are about 60 different values of b, and for each such value there
> is a exactly one type of rollup. The old code is doing the rollups in
> Postgres with 60 bulk "insert into .... select" statements, hence the
> need for the second index.

[...snip...]

> For better scalability, I am partitioning these tables by time .... I am
> not using PG's table inheritance and triggers to do the partitioning,
> but instead dynamically generating the SQL and table names in the
> application code (Java). In most cases, the rollups will still happen
> from a single source "data_table" and I plan to continue using the
> existing SQL, but I have a few cases where the source "data_table" rows
> may actually come from two adjacent tables.

Without going through your very long set of questions in detail, it
strikes me that you might be better off if you:

1) use PostgreSQL partitioning (constraint exclusion)
2) partition by ts range
3) consider also including b in your partitioning scheme
4) create one index as (ts, a)
5) use dynamically generated SQL and table names in the application
code to create (conditionally) and load the tables

But of course test both this and your proposed method and compare ;-)

Also you might consider PL/R for some of your analysis (e.g. mode would
be simple, but perhaps not as fast):
http://www.joeconway.com/web/guest/pl/r

HTH,

Joe

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message david 2010-02-23 08:23:25 Re: SSD + RAID
Previous Message Dave Crooke 2010-02-23 03:01:33 Advice requested on structuring aggregation queries