Re: is parallel union all possible over dblink?

From: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
To: Marinos Yannikos <mjy(at)geizhals(dot)at>
Cc: pgsql-performance(at)postgresql(dot)org, Svetlin Manavski <svetlin(dot)manavski(at)gmail(dot)com>
Subject: Re: is parallel union all possible over dblink?
Date: 2011-06-29 19:14:32
Message-ID: BANLkTin_CFVj24_XX=+o_gZKtTCuqgENQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Jun 29, 2011 at 12:37 PM, Marinos Yannikos <mjy(at)geizhals(dot)at> wrote:

> On Wed, 29 Jun 2011 13:55:58 +0200, Svetlin Manavski <
> svetlin(dot)manavski(at)gmail(dot)com> wrote:
>
> Question: Is there a way to get the same result from within a PL/pgSQL
>> function but running all the sub-queries in parallel? In case it is not
>> directly available, which one would be the simplest way to implement it in
>> my application? (I am very keen to avoid the obvious solution of an
>> additional multi-threaded layer which would do it out of the RDBMS)
>>
>
> Have you tried dblink_send_query() + dblink_get_results() yet?
>
> http://www.postgresql.org/**docs/current/static/contrib-**
> dblink-send-query.html<http://www.postgresql.org/docs/current/static/contrib-dblink-send-query.html>
>
> You'd have to do something like this to your queries [untested]:
>
> select dblink_send_query('remote1','**select * from
>
> appqosfe.F_total_utilization(**1306918800000000000::INT8, NULL,
> 60000000000::INT8, NULL)');
>
> (select * from appqosfe.F_total_utilization(**1306918800000000000::INT8,
> NULL,
> 60000000000::INT8, NULL))
> UNION ALL
> (SELECT * from dblink_get_result('remote1') as T1(detectroid numeric,
> timegroup numeric,
> numbytes numeric, numpackets numeric))
> order by timegroup asc;
>
> i.e. start your remote query/-ies asynchronously, then collect the results
> in the UNION query. At least in theory it should work...
>
>
This does work however you'll need to add a little more to it to ensure your
UNION succeeds. In pseudo...

connection #1:
CREATE TABLE target_1 ...
BEGIN;
LOCK TABLE target_1 IN ACCESS EXCLUSIVE MODE;
INSERT INTO target_1 SELECT ...
COMMIT;

connection #2:
CREATE TABLE target_2 ...
BEGIN;
LOCK TABLE target_2 IN ACCESS EXCLUSIVE MODE;
INSERT INTO target_2 SELECT ...
COMMIT;

connection #3:
SELECT * FROM target_1 UNION SELECT * FROM target_2;

Connections 1 and 2 can be done in simultaneously and after both have
reached the LOCK statement then the SELECT on connection 3 can be executed.
Same fundamentals if all three connections are to different databases and
connection 3 uses dblink to pull the data.

Another alternative is to use GridSQL. I haven't used it myself but seen it
in action on a large install with 4 backend databases. Pretty slick.

Greg

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Shaun Thomas 2011-06-29 19:38:24 Re: is parallel union all possible over dblink?
Previous Message Marinos Yannikos 2011-06-29 18:37:18 Re: is parallel union all possible over dblink?