Re: Calculating statistic via function rather than with query is slowing my query

From: Anish Kejariwal <anishkej(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Calculating statistic via function rather than with query is slowing my query
Date: 2011-08-18 18:46:06
Message-ID: CAOpcnr8K0Rw6_=vZh4-cR9=6CXhBQvEiK_8as5NcC8O4WW4fqw@mail.gmail.com
Lists: pgsql-performance

Hi Craig,

Fair point. For now, I mean "just fast": the query currently takes 5-15 seconds, and I'd
like to get it down to the 1-2 second range.

From the query I provided, I have approximately 30,000 unique keys (what I
called primary_id) that I'm grouping by, and each key has a series of
numerical values for each of the type_ids. I'm looking at averages, stddev
and other statistics across a few hundred type_ids (where agg.type_id in
....). The part of the query that varies is the user-specified set of type_ids,
which makes it impossible to precalculate my statistics.
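
To make that concrete, the query has roughly this shape (a simplified sketch; the
value column and the literal ids are placeholders, not my actual schema):

    SELECT agg.primary_id,
           avg(agg.value)    AS avg_value,
           stddev(agg.value) AS stddev_value,
           count(*)          AS n
    FROM   agg
    WHERE  agg.type_id IN (101, 207, 342)  -- user-specified list, a few hundred ids
    GROUP  BY agg.primary_id;              -- roughly 30,000 unique keys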

I'd like this to eventually scale to a million unique keys, and a thousand
type_ids.

For now, Postgres has been great for modeling the data, understanding where I hit
performance bottlenecks, and providing a fast enough user interface. But
I'm definitely starting to think about whether I can cache my data (with
millions of keys and thousands of type_ids, the data might be too large),
and whether to look into distributed databases (even though I can't
precompute the stats, my queries are easily distributable across multiple
processors since each processor could take a batch of keys). I might even
want to consider a column-oriented database - since my keys don't change
often, I could potentially add new columns when there are new type_ids.
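
As a rough illustration of what I mean by distributable (the key range and names
here are made up), each worker could run the same aggregate over its own slice of
the keys:

    -- one worker's batch of keys; another worker takes the next range
    SELECT agg.primary_id,
           avg(agg.value)    AS avg_value,
           stddev(agg.value) AS stddev_value
    FROM   agg
    WHERE  agg.primary_id BETWEEN 1 AND 100000
    AND    agg.type_id IN (101, 207, 342)
    GROUP  BY agg.primary_id;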

I've been thinking of looking into memcached or HBase. If you have any
suggestions on which options I should explore, I'd greatly appreciate it.

Sorry for veering off topic a bit from Postgres.

thanks,
Anish

On Wed, Aug 17, 2011 at 10:32 PM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:

> On 18/08/2011 9:03 AM, Anish Kejariwal wrote:
>
>> Thanks for the help Pavel and Craig. I really appreciate it. I'm going
>> to try a couple of these different options (write a c function, use a sql
>> function with case statements, and use plperl), so I can see which gives me
>> the realtime performance that I need, and works best for clean code in my
>> particular case.
>>
> Do you really mean "realtime"? Or just "fast"?
>
> If you have strongly bounded latency requirements, any SQL-based,
> disk-based system is probably not for you. Especially not one that relies on
> a statistics-based query planner, caching, and periodic checkpoints. I'd be
> looking into in-memory databases designed for realtime environments where
> latency is critical.
>
> Hard realtime: If this system fails to respond within <x> milliseconds, all
> the time, every time, then something will go "smash" or "boom" expensively
> and unrecoverably.
>
> Soft realtime: If this system responds late, the late response is expensive
> or less useful. Frequent late responses are unacceptable but the occasional
> one might be endurable.
>
> Just needs to be fast: If it responds late, the user gets irritated because
> they're sitting and waiting for a response. Regular long stalls are
> unacceptable, but otherwise the user can put up with it. You're more
> concerned with average latency than maximum latency.
>
> --
> Craig Ringer
>
