Because each worker executes the parallel portion of the plan to completion, it is not possible to simply take an ordinary query plan and run it using multiple workers. Each worker would produce a full copy of the output result set, so the query would not run any faster than normal but would produce incorrect results. Instead, the parallel portion of the plan must be what is known internally to the query optimizer as a partial plan; that is, it must be constructed so that each process which executes the plan will generate only a subset of the output rows in such a way that each required output row is guaranteed to be generated by exactly one of the cooperating processes. Generally, this means that the scan on the driving table of the query must be a parallel-aware scan.
The following types of parallel-aware table scans are currently supported.
In a parallel sequential scan, the table's blocks will be divided among the cooperating processes. Blocks are handed out one at a time, so that access to the table remains sequential.
In a parallel bitmap heap scan, one process is chosen as the leader. That process performs a scan of one or more indexes and builds a bitmap indicating which table blocks need to be visited. These blocks are then divided among the cooperating processes as in a parallel sequential scan. In other words, the heap scan is performed in parallel, but the underlying index scan is not.
In a parallel index scan or parallel index-only scan, the cooperating processes take turns reading data from the index. Currently, parallel index scans are supported only for btree indexes. Each process will claim a single index block and will scan and return all tuples referenced by that block; other process can at the same time be returning tuples from a different index block. The results of a parallel btree scan are returned in sorted order within each worker process.
Other scan types, such as scans of non-btree indexes, may support parallel scans in the future.
Just as in a non-parallel plan, the driving table may be joined to one or more other tables using a nested loop, hash join, or merge join. The inner side of the join may be any kind of non-parallel plan that is otherwise supported by the planner provided that it is safe to run within a parallel worker. For example, if a nested loop join is chosen, the inner plan may be an index scan which looks up a value taken from the outer side of the join.
Each worker will execute the inner side of the join in full. This is typically not a problem for nested loops, but may be inefficient for cases involving hash or merge joins. For example, for a hash join, this restriction means that an identical hash table is built in each worker process, which works fine for joins against small tables but may not be efficient when the inner table is large. For a merge join, it might mean that each worker performs a separate sort of the inner relation, which could be slow. Of course, in cases where a parallel plan of this type would be inefficient, the query planner will normally choose some other plan (possibly one which does not use parallelism) instead.
PostgreSQL supports parallel
aggregation by aggregating in two stages. First, each process
participating in the parallel portion of the query performs an
aggregation step, producing a partial result for each group of
which that process is aware. This is reflected in the plan as a
Partial Aggregate node. Second, the
partial results are transferred to the leader via
Merge. Finally, the leader re-aggregates the results across
all workers in order to produce the final result. This is reflected
in the plan as a
Finalize Aggregate node
runs on the leader process, queries which produce a relatively
large number of groups in comparison to the number of input rows
will appear less favorable to the query planner. For example, in
the worst-case scenario the number of groups seen by the
Finalize Aggregate node could be as
many as the number of input rows which were seen by all worker
processes in the
stage. For such cases, there is clearly going to be no performance
benefit to using parallel aggregation. The query planner takes this
into account during the planning process and is unlikely to choose
parallel aggregate in this scenario.
Parallel aggregation is not supported in all situations. Each
aggregate must be safe for parallelism and
must have a combine function. If the aggregate has a transition
state of type
internal, it must have
serialization and deserialization functions. See CREATE
AGGREGATE for more details. Parallel aggregation is not
supported if any aggregate function call contains
clause and is also not supported for ordered set aggregates or when
the query involves
GROUPING SETS. It
can only be used when all joins involved in the query are also part
of the parallel portion of the plan.
If a query that is expected to do so does not produce a parallel plan, you can try reducing parallel_setup_cost or parallel_tuple_cost. Of course, this plan may turn out to be slower than the serial plan which the planner preferred, but this will not always be the case. If you don't get a parallel plan even with very small values of these settings (e.g. after setting them both to zero), there may be some reason why the query planner is unable to generate a parallel plan for your query. See Section 15.2 and Section 15.4 for information on why this may be the case.
When executing a parallel plan, you can use
EXPLAIN (ANALYZE, VERBOSE) to display per-worker
statistics for each plan node. This may be useful in determining
whether the work is being evenly distributed between all plan nodes
and more generally in understanding the performance characteristics
of the plan.
If you see anything in the documentation that is not correct, does not match your experience with the particular feature or requires further clarification, please use this form to report a documentation issue.