Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
Date: 2018-01-17 19:29:05
Message-ID: CA+TgmobXWEqRUr0k1RS=x4NaAbi+9R-0z+pp+VEuLYKorjbbYg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 17, 2018 at 8:53 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> At present, one major use of Cursors is in postgres_fdw.
>
> In that usage, the query executes and returns all the rows. No other
> side execution is possible.

True, although foreign tables using postgres_fdw can't be scanned in
parallel for other reasons anyway -- it's the user backend, not any
workers we might spin up, that has the connection to the remote
server. Also, I'm talking about the external-facing use of cursors by
users, not the fact that postgres_fdw uses them internally to talk to
other machines.

> How do we make parallel query work for Cursors, if not by Tomas' proposal?
>
> What more restrictive proposal would you prefer?

In all honesty, if I knew how to fix this problem with a reasonable
amount of work, I would have done it already. It's a very hard
problem.

One idea I've thought about is providing some kind of infrastructure
for workers to detach from a parallel query. This could be useful in
a variety of situations, including cursors. Let's say the leader is
either (a) suspending execution of the query, because it's a cursor,
or (b) not able to absorb rows as fast as workers are generating them.
In the former situation, we'd like to get rid of all workers; in the
latter situation, some workers. In the former situation, getting all
workers to shut down cleanly would let us exit parallel mode (and
perhaps re-enter it later when we resume execution of the query). In
the latter situation, we could avoid wasting workers on queries where
the leader can't keep up so that those worker slots are available to
other queries that can benefit from them.
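
For concreteness, case (a) looks something like this (spelling the
syntax as DECLARE ... CURSOR PARALLEL, as in the experimental patch,
with big_table as a made-up table name):

    BEGIN;
    DECLARE c CURSOR PARALLEL FOR SELECT * FROM big_table;
    FETCH 100 FROM c;   -- execution suspends here; workers go idle
    -- ... any number of other commands while the cursor stays open ...
    FETCH 100 FROM c;   -- execution resumes
    COMMIT;

Between those two FETCHes the workers are either tied up doing
nothing or have to be shut down cleanly somehow, and that's exactly
the hard part.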

However, this is not simple. In the case of a parallel sequential
scan, once a worker claims a page, it must scan all the tuples on that
page. No other backend will ever get that page, and therefore if the
backend that did claim it doesn't scan the whole thing, the query may
return the wrong answer. Every other parallel-aware executor node we
have has similar problems: there are points where we could safely stop
without changing the final answer to the query, and other points where
it is not safe to stop. One idea is to provide a new callback for
parallel-aware executor nodes that tells them to stop returning tuples
at the next safe stop point. When we want to get rid of workers, we
somehow signal them to invoke this method on every node in the plan
tree; at some point, they'll drop off but the query will still return
the right answer. In the worst case, however, the worker might emit
an arbitrarily large number of tuples before reaching a safe stop
point and exiting (imagine a parallel-sequential scan cross-joined to
generate_series(1, 100000000000) or the equivalent), which seems
pretty crappy.
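
To make that worst case concrete, picture something like this
(big_table again being a placeholder for any table large enough to
get a parallel sequential scan):

    SELECT *
    FROM big_table
    CROSS JOIN generate_series(1, 100000000000) g;

A worker that has claimed a page of big_table can't stop until it has
finished that page, and every tuple on that page joins to 10^11 rows
of g, so the nearest safe stop point may be an astronomical number of
emitted tuples away.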

Or we could go the other way and try to keep the workers running. I
don't really like that because it ties down those workers for
potentially a long period of time, but that might be acceptable for
some users. The main implementation problem is that somehow we'd need
to propagate to them an updated version of any state that has changed
while the query was suspended, such as new combo CIDs that have been
created in the meantime. dshash seems like a useful tool toward such
a goal, but there are details to work out, and there are similar
problems with everything else that is copied from leader to workers.
We could possibly prevent these problems from arising by placing
draconian restrictions on what a backend is allowed to do while a
parallel cursor is open, such as in your follow-on proposal to lock
out everything except FETCH. I'm not really that excited about such a
thing because it's extremely limiting and still doesn't solve all the
problems: in particular, after BEGIN ... DECLARE CURSOR PARALLEL ...
FETCH ... FETCH ... syntax error, there is going to be trouble around
the state of group locking. It will be very bad if the workers think
the transaction is still alive and the leader thinks it is in a new
transaction and they're all sharing locks. You also have to worry
about what things can be accomplished by protocol messages, not just
what can be done via SQL. But it's probably feasible with enough
work.
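
Spelled out, the troublesome sequence is roughly this (big_table
again a placeholder):

    BEGIN;
    DECLARE c CURSOR PARALLEL FOR SELECT * FROM big_table;
    FETCH 10 FROM c;
    FETCH 10 FROM c;
    SELCT 1;   -- oops: syntax error, the leader's transaction aborts

After the error the leader's transaction is aborted, and after the
subsequent ROLLBACK it will be in a brand-new one; but if the workers
still believe the original transaction is alive, they're sharing its
group locks with a leader that has moved on.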

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
