Re: Parallel Sort

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Sort
Date: 2013-05-14 23:12:34
Message-ID: CAB7nPqQMEOSXkVK75C=Z-kWbrWbtamA-BSQ7c=9cSV4AgTU7Sg@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 14, 2013 at 11:59 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:

> On Tue, May 14, 2013 at 01:51:42PM +0900, Michael Paquier wrote:
> > On Mon, May 13, 2013 at 11:28 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >
> > > * Identifying Parallel-Compatible Functions
> > >
> > > Not all functions can reasonably run on a worker backend. We should not
> > > presume that a VOLATILE function can tolerate the unstable execution
> > > order imposed by parallelism, though a function like clock_timestamp()
> > > is perfectly reasonable to run that way. STABLE does not have that
> > > problem, but neither does it constitute a promise that the function
> > > implementation is compatible with parallel execution. Consider
> > > xid_age(), which would need code changes to operate correctly in
> > > parallel. IMMUTABLE almost guarantees enough; there may come a day when
> > > all IMMUTABLE functions can be presumed parallel-safe. For now, an
> > > IMMUTABLE function could cause trouble by starting a (read-only)
> > > subtransaction. The bottom line is that parallel-compatibility needs to
> > > be separate from volatility classes for the time being.
> > >
> > I am not sure that this problem is limited to functions; it applies to
> > all the expressions and clauses of queries that could be shipped and
> > evaluated on the worker backends when fetching tuples that could be used
> > to accelerate a parallel sort. Let's imagine for example the case of a
> > LIMIT clause that can be used by worker backends to limit the number of
> > tuples to sort as a final result.
>
> It's true that the same considerations apply to other plan tree
> constructs; however, every such construct is known at build time, so we
> can study each one and decide how it fits with parallelism.
>
The concept of clause parallelism for worker backends is close to the
concept of clause shippability introduced in Postgres-XC. In the case of
XC, the equivalent of the master backend is a backend located on a node
called a Coordinator, which merges and organizes results fetched in
parallel from the remote nodes where data scans occur (nodes called
Datanodes). The backends used for tuple scans across Datanodes share the
same data visibility as the backend on the Coordinator, since they use the
same snapshot and transaction ID. This is different from the parallelism
proposal, in which there is no notion of snapshot import to worker
backends.

However, the code in the XC planner used for clause shippability evaluation
is definitely worth looking at, considering the many similarities it shares
with parallelism when evaluating whether a given clause can be executed on
a worker backend or not. It would be a waste to implement the same thing
twice if there is code already available.

> Since functions are user-definable, it's preferable to reason about
> classes of functions.
>
Yes, you are right.
--
Michael
