Re: Parallel Foreign Scans - need advice

From: Korry Douglas <korry(at)me(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Foreign Scans - need advice
Date: 2019-05-16 12:45:00
Message-ID: 598BD6D6-B5CF-450D-A0D1-5886602FF0AA@me.com
Lists: pgsql-hackers


> That's only a superficial problem. You don't even know if or when the
> workers that are launched will all finish up running your particular
> node, because (for example) they might be sent to different children
> of a Parallel Append node above you (AFAICS there is no way for a
> participant to indicate "I've finished all the work allocated to me,
> but I happen to know that some other worker #3 is needed here" -- as
> soon as any participant reports that it has executed the plan to
> completion, pa_finished[] will prevent new workers from picking that
> node to execute). Suppose we made a rule that *every* worker must
> visit *every* partial child of a Parallel Append and run it to
> completion (and any similar node in the future must do the same)...
> then I think there is still a higher level design problem: if you do
> allocate work up front rather than on demand, then work could be
> unevenly distributed, and parallel query would be weakened.

What I really need (for the scheme I’m using at the moment) is to know how many workers will be used to execute my particular Plan. I understand that some workers will naturally end up idle while the last (busy) worker finishes up. I’m dividing the workload (the number of row groups to scan) by the number of workers to get an even distribution.
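For concreteness, the kind of up-front split I have in mind looks roughly like this (a minimal sketch in plain C; `assign_row_groups` and its parameters are illustrative names, not anything in PostgreSQL):

```c
/* Hypothetical helper: compute the contiguous range of row groups
 * assigned to one worker, given the total group count and the number
 * of workers the planner said would run this node.  Purely
 * illustrative; not PostgreSQL API. */
static void
assign_row_groups(int total_groups, int nworkers, int worker_id,
                  int *first, int *count)
{
    int base = total_groups / nworkers;
    int extra = total_groups % nworkers;

    /* The first 'extra' workers each take one additional group, so
     * the split is as even as possible. */
    *count = base + (worker_id < extra ? 1 : 0);
    *first = worker_id * base + (worker_id < extra ? worker_id : extra);
}
```

The obvious fragility, as you point out, is that this is only correct if exactly `nworkers` participants actually show up and each runs this node to completion.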

I’m willing to pay that price (at least, I haven’t seen a problem so far… famous last words).

I do plan to switch over to a get-next-chunk allocator, as you mention below, but I’d like to get the minimized-seek mechanism working first.

It sounds like there is no reliable way to get the information that I’m looking for, is that right?

> So I think you ideally need a simple get-next-chunk work allocator
> (like Parallel Seq Scan and like the file_fdw patch I posted[1]), or a
> pass-the-baton work allocator when there is a dependency between
> chunks (like Parallel Index Scan for btrees), or a more complicated
> multi-phase system that counts participants arriving and joining in
> (like Parallel Hash) so that participants can coordinate and wait for
> each other in controlled circumstances.
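If I understand the get-next-chunk idea correctly, the shared state can be as small as one counter that participants bump atomically, along these lines (sketched with C11 atomics rather than PostgreSQL's pg_atomic_* wrappers; names are mine):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared-memory state for a get-next-chunk allocator.
 * Each participant claims the next unclaimed chunk on demand, so
 * work stays balanced no matter how many workers actually arrive. */
typedef struct ChunkAllocator
{
    atomic_int  next_chunk;    /* next chunk index to hand out */
    int         total_chunks;  /* total number of chunks to scan */
} ChunkAllocator;

/* Returns true and sets *chunk if a chunk was claimed; returns false
 * once every chunk has been handed out. */
static bool
chunk_alloc_next(ChunkAllocator *ca, int *chunk)
{
    int c = atomic_fetch_add(&ca->next_chunk, 1);

    if (c >= ca->total_chunks)
        return false;
    *chunk = c;
    return true;
}
```

Since no participant owns a fixed share, a worker that never shows up (or wanders off to another Parallel Append child) simply means the others claim more chunks.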

I haven’t looked at Parallel Hash - will try to understand that next.

> If this compressed data doesn't have natural chunks designed for this
> purpose (like, say, ORC stripes), perhaps you could have a dedicated
> workers streaming data (compressed? decompressed?) into shared memory,
> and parallel query participants coordinating to consume chunks of
> that?
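The producer/consumer shape of that idea might look something like this (a toy sketch only: a real version would live in DSM and use PostgreSQL's locking, not a pthread mutex, and the int chunk ids stand in for decompressed buffers):

```c
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAPACITY 8

/* Hypothetical bounded queue: a dedicated worker pushes chunk ids as
 * it streams/decompresses data; parallel participants pop them. */
typedef struct ChunkQueue
{
    pthread_mutex_t lock;
    int  chunks[QUEUE_CAPACITY];  /* chunk ids (stand-in for payload) */
    int  head, tail, count;
} ChunkQueue;

/* Producer side: returns false if the queue is full (caller waits). */
static bool
queue_push(ChunkQueue *q, int chunk)
{
    bool ok = false;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY)
    {
        q->chunks[q->tail] = chunk;
        q->tail = (q->tail + 1) % QUEUE_CAPACITY;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Consumer side: returns false if no chunk is ready (caller waits). */
static bool
queue_pop(ChunkQueue *q, int *chunk)
{
    bool ok = false;

    pthread_mutex_lock(&q->lock);
    if (q->count > 0)
    {
        *chunk = q->chunks[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```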

I’ll give that some thought. Thanks for the ideas.

— Korry
