Re: Parallel Sort

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Noah Misch'" <noah(at)leadboat(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Sort
Date: 2013-05-15 06:56:52
Message-ID: 005101ce5139$61ff7380$25fe5a80$@kapila@huawei.com
Lists: pgsql-hackers

On Monday, May 13, 2013 7:59 PM Noah Misch wrote:
> It would be great if one client session could take advantage of multiple
> CPU cores. EnterpriseDB wishes to start the trek into this problem space
> for 9.4 by implementing parallel internal (i.e. not spilling to disk)
> sort. This touches on a notable subset of the infrastructure components
> we'll need for parallel general query. My intent is to map out the key
> design topics, hear about critical topics I hadn't considered, and
> solicit feedback on the quality of the high-level plan. Full designs for
> key pieces will come later.
>
>
> * Worker Backends
>
> A multi-process model, rather than a multi-thread, is best for
> introducing parallelism to a PostgreSQL session. With threading, all
> in-memory state would become shared by default; with processes,
> in-memory data structures remain unshared until we explicitly select
> them for sharing. Some data structures will require new sharing, but I
> expect unshared ones to remain the typical case. I refer to these
> additional processes as worker backends. They will have much in common
> with ordinary client backends, including a database connection and a
> PGPROC. They will have no client socket; instead, they will take
> direction from the client-connected backend, termed the master, via
> shared memory.
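
To make sure I understand the coordination part, I imagine the master
laying out something like the following header at the start of the shared
region its workers attach to (a rough sketch only; none of these names
exist in PostgreSQL today):

#include "postgres.h"
#include "storage/s_lock.h"

/*
 * Illustrative sketch only: a coordination header the master backend
 * could place at the start of the shared memory region used by its sort
 * workers.  None of these structure or field names exist today.
 */
typedef struct ParallelSortShared
{
    int         nworkers;           /* number of worker backends requested */
    pid_t       master_pid;         /* client-connected backend driving the sort */
    slock_t     mutex;              /* protects the counters below */
    int         workers_attached;   /* workers that have attached so far */
    int         workers_done;       /* workers that have finished their share */
    Size        tuple_region_size;  /* bytes available for tuple data */
    /* tuple data follows this header in the same region */
} ParallelSortShared;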

> We can allocate a small amount of permanent shared memory for
> coordination among a group of processes, but sorting will benefit from a
> region as large as maintenance_work_mem. Expect on-demand memory
> sharing.

Will the shared memory used for coordinating tuples between the master and
the workers be of a fixed size, or will it vary with the size of the tuples
to be sorted and the number of workers involved?
If it varies, the allocation can sometimes fail because the required memory
is not available, and in that case the sort would have to revert to a
serial sort.
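
Roughly the fallback I have in mind is sketched below; shm_try_create(),
launch_sort_workers() and perform_serial_sort() are hypothetical names used
only to illustrate the question:

#include "postgres.h"
#include "miscadmin.h"
#include "utils/tuplesort.h"

/*
 * Sketch of the fallback being asked about.  shm_try_create(),
 * launch_sort_workers() and perform_serial_sort() are hypothetical,
 * not existing functions.
 */
static void
parallel_or_serial_sort(Tuplesortstate *state, int nworkers)
{
    Size        want = (Size) maintenance_work_mem * 1024L;    /* GUC is in KB */
    void       *region = shm_try_create(want);

    if (region == NULL)
    {
        /* required shared memory not available: revert to a serial sort */
        perform_serial_sort(state);
        return;
    }
    launch_sort_workers(state, region, nworkers);
}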

>
> * Identifying Parallel-Compatible Functions
>
> Not all functions can reasonably run on a worker backend. We should not
> presume that a VOLATILE function can tolerate the unstable execution
> order imposed by parallelism, though a function like clock_timestamp()
> is perfectly reasonable to run that way. STABLE does not have that
> problem, but neither does it constitute a promise that the function
> implementation is compatible with parallel execution. Consider
> xid_age(), which would need code changes to operate correctly in
> parallel. IMMUTABLE almost guarantees enough; there may come a day when
> all IMMUTABLE functions can be presumed parallel-safe. For now, an
> IMMUTABLE function could cause trouble by starting a (read-only)
> subtransaction. The bottom line is that parallel-compatibility needs to
> be separate from volatility classes for the time being.
>
> I'm not sure what the specific answer here should look like. Simply
> having a CREATE FUNCTION ... PARALLEL_IS_FINE flag is not entirely
> satisfying, because the rules are liable to loosen over time.
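
Just to confirm my reading: a marker separate from volatility would boil
down to a check along these lines before pushing an expression to a worker
(the pg_proc column name below is made up for illustration):

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_proc.h"
#include "utils/syscache.h"

/*
 * Sketch: treat a function as parallel-compatible only when it is
 * explicitly marked so.  "proparallelok" is a hypothetical pg_proc
 * column, not an existing one; volatility alone would not be consulted.
 */
static bool
function_parallel_ok(Oid funcid)
{
    HeapTuple   tp;
    bool        result = false;

    tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
    if (HeapTupleIsValid(tp))
    {
        Form_pg_proc proc = (Form_pg_proc) GETSTRUCT(tp);

        result = proc->proparallelok;   /* hypothetical field */
        ReleaseSysCache(tp);
    }
    return result;
}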
>
>
> * Planner & Similar Issues
>
> We should decide whether to actually sort in parallel based on the
> comparator cost and the data size. The system currently has no
> information on comparator cost: bt*cmp (and indeed almost all built-in
> functions) all have procost=1, but bttextcmp is at least 1000x slower
> than btint4cmp. Let's improve the procost estimates of all core B-tree
> and hash operators. This will have other benefits, but we will need to
> be cognizant of the risk of upsetting setups that have tuned
> cpu_operator_cost based on the present situation.
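
If I follow, a per-comparator procost would feed into the sort cost
estimate roughly the way cost_sort() already charges for comparisons,
e.g. (a sketch only, not actual planner code):

#include "postgres.h"
#include <math.h>
#include "optimizer/cost.h"

/*
 * Sketch: estimated comparison cost of an in-memory sort of ntuples rows,
 * scaled by the comparison function's procost.  cost_sort() today charges
 * 2 * cpu_operator_cost per comparison regardless of the operator.
 */
static Cost
sort_comparison_cost(double ntuples, float4 cmp_procost)
{
    Cost        per_cmp = 2.0 * cpu_operator_cost * cmp_procost;

    return per_cmp * ntuples * (log(ntuples) / log(2.0));
}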
>
> The choice of whether to parallelize can probably be made in a manner
> similar to the choice to do an external sort: the planner guesses the
> outcome for costing purposes, but the actual decision is made at
> execution time. The planner would determine a tuple count cutoff at
> which parallelism becomes favorable, and tuplesort would check that to
> establish its actual decision.
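
So tuplesort would make the final call against a planner-supplied cutoff,
something like the following (names are made up):

#include "postgres.h"

/*
 * Sketch: execution-time decision inside tuplesort, analogous to the
 * internal-vs-external sort decision.  "parallel_cutoff" would be the
 * tuple count the planner determined; "workers_available" would come from
 * whatever worker-management layer ends up existing.
 */
static bool
sort_should_parallelize(int64 ntuples, int64 parallel_cutoff,
                        int workers_available)
{
    return ntuples >= parallel_cutoff && workers_available > 0;
}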
>
> Use of parallelism within one process has a distributed effect on the
> system, similar to the use of work_mem. Add a PGC_USERSET GUC
> representing the number of worker processes the current session is
> willing to use. It would default to a small number, say zero or two. On
> a throughput-oriented system with high concurrent query counts, the DBA
> would tend to set/leave it at zero in postgresql.conf; parallelism can
> only help if the system has free resources. Separately, add a GUC
> limiting the total number of workers across the system at any one time.
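
Presumably the per-session knob would be an ordinary integer entry in
guc.c, along the lines of the following (the GUC name, group and limits
are only illustrative):

/*
 * Sketch of a guc.c ConfigureNamesInt entry; "session_parallel_workers"
 * is a made-up name.  A matching "int session_parallel_workers = 0;"
 * would live wherever the sort code keeps its globals.
 */
{
    {"session_parallel_workers", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
        gettext_noop("Number of worker backends this session is willing to use."),
        NULL
    },
    &session_parallel_workers,
    0, 0, 1024,
    NULL, NULL, NULL
},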

How will the parallel sorting tasks be divided and assigned to each worker?

With Regards,
Amit Kapila.
