Re: Parallel queries in single transaction

From: Paul Muntyanu <pmuntyanu(at)gmail(dot)com>
To: tomas(dot)vondra(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parallel queries in single transaction
Date: 2018-07-16 10:03:08
Message-ID: CACnYr+h7yzXtScWPcwHCCkXF0zLmtbqQjcOb_kyLP9JJ5awe+A@mail.gmail.com
Lists: pgsql-hackers

Hi Tomas, thanks for looking into this. I am talking more about queries which
cannot be optimized any further, e.g.:
* a full scan of one table combined with heavy calculations over another one;
* both queries going through an FDW (e.g. one fetches data from Kafka and the
other from a remote Postgres; neither query is bound by anything except local
CPU, the network, and the remote machine).
Two statements of that kind are sketched below.

Being I/O bound is not a problem if you have multiple tablespaces, and being
CPU bound may not be an issue either when you have 32 cores and at most 6
workers per query. During the nightly ETL I do not have anything running
except a single query, so only 6 cores are occupied. If I could run two
queries in parallel, I would occupy two I/O stacks (two tablespaces) plus 12
cores, instead of 6 cores and then 6 cores again, sequentially.
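For reference, the settings I have in mind look roughly like this (values are
illustrative for such a 32-core box):

max_worker_processes            = 32   # total background workers the server may start
max_parallel_workers            = 32   # how many of those may serve parallel queries
max_parallel_workers_per_gather = 6    # cap per Gather node, i.e. effectively per query

So a single query tops out at about 6 busy cores, and two independent queries
running at the same time would use about 12.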

Hope that makes sense

-P

On 16 Jul 2018, at 11:44, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

Hi,

On 07/16/2018 09:45 AM, Paul Muntyanu wrote:

Hello,
I am working with a data warehouse based on PostgreSQL and would like to
propose a feature. The idea is to give the developer the ability to execute
queries in parallel within a single transaction. The usual flow is:
START_TRANSACTION -> QUERY1 -> QUERY2 -> QUERY3 -> END_TRANSACTION. However,
sometimes QUERY1 and QUERY2 are independent and could be executed
concurrently, e.g.: START_TRANSACTION -> DEFINE_QUERY1 (no execution) ->
DEFINE_QUERY2 (no execution) -> EXECUTE_QUERY1_AND_QUERY2 (in parallel) ->
QUERY3 -> END_TRANSACTION.
Of course QUERY1 and QUERY2 may be dependent, in which case this would not
work, but sometimes it is useful, especially when a query is bound by a
single resource (e.g. CPU) and the rest of the machine sits idle.
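The closest approximation I know of today is to open extra connections with
dblink and use its asynchronous functions. A minimal sketch (query text and
table names made up); the obvious drawback is that the work happens in
separate connections and transactions, not within the single
transaction/snapshot this proposal is about:

CREATE EXTENSION IF NOT EXISTS dblink;

SELECT dblink_connect('q1', 'dbname=dwh');
SELECT dblink_connect('q2', 'dbname=dwh');

-- fire both queries without waiting for their results
SELECT dblink_send_query('q1', 'SELECT count(*) FROM big_local_table');
SELECT dblink_send_query('q2', 'SELECT count(*) FROM remote_orders');

-- dblink_get_result blocks until the corresponding query has finished
SELECT * FROM dblink_get_result('q1') AS r(cnt bigint);
SELECT * FROM dblink_get_result('q2') AS r(cnt bigint);

-- (each connection would need one more dblink_get_result call before being
--  reused; here we simply disconnect)
SELECT dblink_disconnect('q1');
SELECT dblink_disconnect('q2');

-- QUERY3 then runs in the local session as before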

I'm struggling to understand what the possible benefits would be. Either the
queries are CPU-bound or they are stuck (e.g. waiting for I/O); they can't be
both at the same time. If a query is stuck, running it concurrently with
another is pointless. If they are CPU-bound, we can run them in parallel mode
(which should produce the results faster).

I'd even dare to say that running the queries concurrently can easily
hinder performance, because the queries will compete for parallel workers,
preventing some of them from running in parallel mode.
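For instance (numbers and table name purely illustrative): with
max_parallel_workers = 8 and max_parallel_workers_per_gather = 6, the first
query to start can grab 6 workers, leaving at most 2 in the pool for the
second one. EXPLAIN ANALYZE makes the shortfall visible:

SET max_parallel_workers_per_gather = 6;
EXPLAIN (ANALYZE) SELECT count(*) FROM big_table;
-- compare "Workers Planned: 6" with "Workers Launched" in the output; when a
-- concurrent query has exhausted the pool, fewer (or no) workers are
-- launched and the leader ends up doing the work itself.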

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
