Re: Parallel threads in query

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Parallel threads in query
Date: 2018-11-01 07:19:45
Message-ID: c9575f44-4211-78f8-e561-e1ed1baa724f@postgrespro.ru
Lists: pgsql-hackers

On 31.10.2018 22:07, Darafei "Komяpa" Praliaskouski wrote:
> Hi,
>
> I've tried porting some of PostGIS algorithms to utilize multiple
> cores via OpenMP to return faster.
>
> Question is, what's the best policy to allocate cores so we can play
> nice with rest of postgres?
>
> What I'd like to see is some function that I can call and get a number
> of threads I'm allowed to run, that will also advise rest of postgres
> to not use them, and a function to return the cores back (or do it
> automatically at the end of query). Is there an infrastructure for that?

I do not completely understand which PostGIS algorithms you are going
to make parallel, so maybe you should clarify that first.
There are currently three options for parallel execution of a single
query in Postgres:

1. Use the existing Postgres parallel capabilities. For example, if
there is some expensive function f() which you want to execute
concurrently, then you do not need to do anything: parallel seq scan
will do it for you. You can configure an arbitrary number of parallel
workers and so control the level of concurrency.
The restrictions of the current Postgres parallel query processing
implementation are:
- parallel workers are started for each query
- it is necessary to serialize a lot of state and pass it from the
coordinator to the parallel workers
- in case of seqscan, workers will compete for pages to scan, so the
effective number of workers should be < 10, while the most powerful
modern servers have hundreds of CPU cores.

2. Implement your own parallel plan nodes using the existing Postgres
parallel infrastructure. Such an approach has the best chance of being
committed to Postgres core.
But the disadvantages are mostly the same as in 1): exchange of data
between different processes is much more complex and expensive than
access to common memory in the case of threads. Most likely you will
have to use the shared message queue and dynamic shared memory,
implemented in Postgres specifically for interaction between parallel
workers.

3. Use multithreading to provide concurrent execution of your particular
algorithm (spawn threads within the backend). You should be very careful
with this approach, because Postgres code is not thread safe. So you
should not try to execute any subplan in a thread or call any Postgres
functions from it (unless you are 100% sure that they are thread safe).
This approach may be easy to implement and may provide better
performance than 1). But please note its limitations. I have used this
approach in my IMCS extension (In-Memory-Columnar-Store).
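
A minimal sketch of what option 3 can look like, assuming POSIX threads
and a computation that works only on plain C memory. It is not taken
from IMCS or PostGIS; the names SliceTask, sum_slice, parallel_sum and
MAX_THREADS are hypothetical. The worker threads call no Postgres
functions at all (no palloc, no elog), and the calling thread joins
every worker before it returns, so no thread outlives the call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64          /* hypothetical upper bound for this sketch */

/* Hypothetical work item: one contiguous slice of a plain C array. */
typedef struct
{
    const double *data;
    size_t        begin;
    size_t        end;
    double        partial;      /* written only by the thread owning the slice */
} SliceTask;

/* Thread body: touches only its own task and read-only input memory. */
static void *
sum_slice(void *arg)
{
    SliceTask *task = (SliceTask *) arg;
    double     acc = 0.0;

    for (size_t i = task->begin; i < task->end; i++)
        acc += task->data[i];
    task->partial = acc;
    return NULL;
}

/* Sum data[0..n) using up to nthreads worker threads. */
double
parallel_sum(const double *data, size_t n, int nthreads)
{
    pthread_t  tids[MAX_THREADS];
    SliceTask  tasks[MAX_THREADS];
    int        started = 0;
    double     total = 0.0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    /* Split the input into nthreads nearly equal slices. */
    for (int i = 0; i < nthreads; i++)
    {
        tasks[i].data = data;
        tasks[i].begin = n * (size_t) i / (size_t) nthreads;
        tasks[i].end = n * (size_t) (i + 1) / (size_t) nthreads;
        tasks[i].partial = 0.0;
    }

    /* Start the workers; stop at the first failure to create a thread. */
    for (int i = 0; i < nthreads; i++)
    {
        if (pthread_create(&tids[i], NULL, sum_slice, &tasks[i]) != 0)
            break;
        started++;
    }

    /* Slices that did not get a thread are computed by the caller. */
    for (int i = started; i < nthreads; i++)
        sum_slice(&tasks[i]);

    /* Wait for every started worker before reading its result. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    for (int i = 0; i < nthreads; i++)
        total += tasks[i].partial;
    return total;
}
```

In a real extension the input would have to be copied or detoasted into
ordinary memory by the backend thread before the workers start, and any
error reporting would have to happen after the join, again in the
backend thread.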

You can look at the pg_strom extension as an example of providing
parallel query execution (in this case using the parallel capabilities
of video cards).

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
