Re: Parallel Scaling of a pgplsql problem

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Venki Ramachandran <venki_ramachandran(at)yahoo(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Parallel Scaling of a pgplsql problem
Date: 2012-04-25 19:36:25
Message-ID: CAEV0TzAVQXf42n9vXu26JAe0-eQ9AgwwAErYFn0HMm8ScQaodA@mail.gmail.com
Lists: pgsql-performance

On Wed, Apr 25, 2012 at 11:52 AM, Venki Ramachandran <
venki_ramachandran(at)yahoo(dot)com> wrote:

> Hi all:
> Can someone please guide me on how to solve this problem? If this is
> the wrong forum, please let me know which one to post it in. I am new
> to Postgres (about 3 months into it).
>
> I have a Postgres 9.0 database on an AWS server (x-large) and a PL/pgSQL
> function that does some computation. It takes in a date range and, for one
> pair of personnel (two employees in a company), calculates some values
> over the time period. It takes about 40 ms (milliseconds) to complete and
> give me the answer. All good so far.
>
> Now I have to run the same PL/pgSQL function on all possible combinations
> of employees, and with 542 employees that is roughly 300,000 unique pairs.
>
> So (300000 * 40)/(1000 * 60 * 60) = 3.33 hours, and I have to rank them
> and show the results on a screen. No user wants to wait for 3 hours; they
> can probably wait 10 minutes (even that is too long for a UI application).
> How do I solve this scaling problem? Can I have multiple parallel
> sessions, each with multiple processes that handle a pair each at 40 ms,
> and then collate the results? Does Postgres or PL/pgSQL have any parallel
> computing capability?
>

The question is: how much of that 40 ms is spent performing the calculation,
how much is spent querying, how much is function call overhead, and how much
is the round trip between client and server for the query and its results?
Depending upon the breakdown, it is entirely possible that the actual
per-pair cost can be kept down to a couple of milliseconds if you
restructure things to query the data in bulk and call a single function to
do the work. At 4 ms per pair, that's a 20-minute query; get it down to
1 ms and you're looking at only 5 minutes for what would appear to be a
fairly compute-intensive report over a relatively large dataset.
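As a rough sketch of the bulk approach (the table and function names below
are assumptions for illustration, not from the original post): generate all
employee pairs with a self-join and invoke the per-pair function once per
row inside a single query, so each pair pays only the calculation cost
rather than a separate client round trip and statement parse:

```sql
-- Hypothetical schema: employees(id), plus a function
-- pair_score(emp_a int, emp_b int, from_date date, to_date date)
-- standing in for the poster's per-pair computation.
SELECT e1.id AS emp_a,
       e2.id AS emp_b,
       pair_score(e1.id, e2.id,
                  DATE '2012-01-01', DATE '2012-03-31') AS score
FROM employees e1
JOIN employees e2 ON e2.id <> e1.id   -- ~542 * 541 ≈ 293k ordered pairs
ORDER BY score DESC;                  -- ranking happens server-side too
```

If the pairs are unordered, `ON e2.id > e1.id` halves the row count. The
ranking (`ORDER BY score`) then comes back in one result set instead of
being collated client-side.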
