Skip site navigation (1) Skip section navigation (2)

Re: Parallel Scaling of a pgplsql problem

From: Jan Nielsen <jan(dot)sture(dot)nielsen(at)gmail(dot)com>
To: Venki Ramachandran <venki_ramachandran(at)yahoo(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Parallel Scaling of a pgplsql problem
Date: 2012-04-26 03:41:13
Message-ID: CANxH4hFe=VDS4WzFAnOFArhVUUfarG4Vubku2CBxjbgz1TWmCQ@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Wed, Apr 25, 2012 at 12:52 PM, Venki Ramachandran <
venki_ramachandran(at)yahoo(dot)com> wrote:

> Hi all:
> Can someone please guide me as to how to solve this problem? If this is
> the wrong forum, please let me know which one to post this one in. I am new
> to Postgres (about 3 months into it)
>
> I have PostGres 9.0 database in a AWS server (x-large) and a pgplsql
> program that does some computation. It takes in a date range and for one
> pair of personnel (two employees in a company) it calculates some values
> over the time period. It takes about 40ms (milli seconds) to complete and
> give me the answer. All good so far.
>
> Now I have to run the same pgplsql on all possible combinations of
> employees and with 542 employees that is about say 300,000 unique pairs.
>
> So (300000 * 40)/(1000 * 60 * 60) = 3.33 hours and I have to rank them and
> show it on a screen. No user wants to wait for 3 hours,  they can probably
> wait for 10 minutes (even that is too much for a UI application). How do I
> solve this scaling problem? Can I have multiple parellel sessions and each
> session have multiple/processes that do a pair each at 40 ms and then
> collate the results. Does PostGres or pgplsql have any parallel computing
> capability.
>

Setting aside the database concurrency question, have you considered
application-level solutions?

How often does a user expect their rank to change? If a daily rank change
is fine, trigger the (lengthy) ranking calculation nightly and cache the
results in a materialized view for all users; you could continuously
rebuild the view to improve freshness to within 4 hours. To go faster with
an application-level solution, you will have to reduce your calculation to
*what's* likely to be most important to the individual which, again, you
can cache; or, if you can predict *who's* most likely to request a ranking,
calculate these first; or, both.

These are likely good things to consider regardless of any improvements you
make to the back-end ranking calculation, though at you will hit a point of
diminishing returns if your ranking calculation drops below some
"tolerable" wait. In the web world "tolerable" is about 3 seconds for the
general public and about 30 seconds for a captured audience, e.g.,
employees. YMMV.


Cheers,

Jan

In response to

pgsql-performance by date

Next:From: Yeb HavingaDate: 2012-04-26 06:49:12
Subject: Re: Parallel Scaling of a pgplsql problem
Previous:From: Venki RamachandranDate: 2012-04-26 02:40:18
Subject: Re: Parallel Scaling of a pgplsql problem

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group