Re: How to estimate the shared memory size required for parallel scan?

From: Masayuki Takahashi <masayuki038(at)gmail(dot)com>
To: thomas(dot)munro(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: How to estimate the shared memory size required for parallel scan?
Date: 2018-08-19 04:28:34
Message-ID: CA+z6ocQ69eWcVqoib2sDR+A3HFWwqerbBWwUe0sRieoFE+c=FA@mail.gmail.com
Lists: pgsql-hackers

(Sorry, I first sent this to Thomas only. This is a repost.)

Hi Thomas,

Thank you for the excellent explanation of shared memory in parallel
scans and of 'foreign path'.
Those are exactly the points I wanted to know. Thanks.

> If you just supply an IsForeignScanParallelSafe function that returns
> true, that would allow your FDW to be used inside parallel workers and
> wouldn't need any extra shared memory, but it wouldn't be a "parallel
> scan". It would just be "parallel safe". Each process that does a
> scan of your FDW would expect a full normal scan (presumably returning
> the same tuples in each process).

I think the parallel scan mechanism applies each worker's full
normal scan to the partitioned records, right?
For example, I made IsForeignScanParallelSafe return true in cstore_fdw
and compared partitioned and non-partitioned scans.

https://gist.github.com/masayuki038/daa63a21f8c16ffa8138b50db9129ced

This shows that rows are counted per partition and 'Gather Merge' merges the results.
As a result, the parallel scan and aggregation show the correct count.

So, in the case of cstore_fdw, it may not be necessary to reserve
any shared memory in EstimateDSMForeignScan.
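
The difference between "parallel safe" and a true parallel scan can be
sketched in plain C. This is only a toy model, not PostgreSQL code:
SharedScanState, estimate_shared_size, scan_parallel_safe,
scan_parallel_aware and TUPLES_PER_BLOCK are all hypothetical names, and
the workers are simulated one after another in a single thread. It assumes
a table made of fixed-size blocks and a shared cursor that hands out block
numbers.

```c
/* Toy model only: none of these types or functions are PostgreSQL APIs. */
#include <stdatomic.h>
#include <stddef.h>

#define TUPLES_PER_BLOCK 100    /* arbitrary value for the sketch */

/* Hypothetical state a parallel-aware scan would keep in shared memory. */
typedef struct SharedScanState
{
    atomic_ulong  next_block;   /* next block number nobody has claimed */
    unsigned long nblocks;      /* total blocks in the pretend table */
} SharedScanState;

/*
 * Rough analogue of what an EstimateDSMForeignScan-style callback reports:
 * the bytes of shared state needed. A scan that is only parallel safe
 * coordinates nothing, so it needs none.
 */
static size_t
estimate_shared_size(int parallel_aware)
{
    return parallel_aware ? sizeof(SharedScanState) : 0;
}

/* Parallel safe only: every worker runs a full scan of all blocks. */
static unsigned long
scan_parallel_safe(int nworkers, unsigned long nblocks)
{
    unsigned long emitted = 0;

    for (int w = 0; w < nworkers; w++)
        emitted += nblocks * TUPLES_PER_BLOCK;
    return emitted;
}

/* Parallel aware: workers claim distinct blocks from the shared cursor. */
static unsigned long
scan_parallel_aware(int nworkers, unsigned long nblocks)
{
    SharedScanState shared;
    unsigned long   emitted = 0;

    atomic_init(&shared.next_block, 0);
    shared.nblocks = nblocks;

    /* Simulate the workers taking turns until no block is left. */
    for (;;)
    {
        int claimed = 0;

        for (int w = 0; w < nworkers; w++)
        {
            unsigned long blk = atomic_fetch_add(&shared.next_block, 1);

            if (blk < shared.nblocks)
            {
                emitted += TUPLES_PER_BLOCK;    /* this worker emits the block */
                claimed = 1;
            }
        }
        if (!claimed)
            break;
    }
    return emitted;
}
```

In this model, four workers over ten blocks emit 4000 tuples in the
parallel-safe case (every tuple four times) and 1000 tuples in the
parallel-aware case (every tuple once), and only the latter needs any
shared memory.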

> So I guess this hasn't been done before and would require some more
> research.

I agree. I will try some query patterns.
Thanks.

On Sat, Aug 18, 2018 at 23:08, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>
> On Sun, Aug 19, 2018 at 1:40 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > A true parallel scan of an FDW would be one where each process emits
> > an arbitrary fraction of the tuples, but together they emit all of the
> > tuples. You'd almost certainly need to use some shared memory to
> > coordinate that. To say that you support that, I think your
> > GetForeignPaths() function would need to call add_partial_path(). And
> > unless I'm mistaken, whether or not InitializeDSMForeignScan etc are
> > called might be the only indication you get of whether you need to run
> > in parallel-aware mode. I haven't personally heard of any FDWs that
> > can do this yet, but I just tried hacking file_fdw to register a
> > partial path and it seems to work (though of course the results are
> > duplicated because the emitted tuples are not actually partial).
>
> ... though I just noticed that my quick test used "Single Copy" mode.
> I think I see why: it looks like core's create_foreignscan_path()
> function might need to take num_workers and set parallel_aware if > 0.
> So I guess this hasn't been done before and would require some more
> research.
>
> --
> Thomas Munro
> http://www.enterprisedb.com

--
高橋 真之
