PATCH: enabling parallel execution for cursors explicitly (experimental)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: PATCH: enabling parallel execution for cursors explicitly (experimental)
Date: 2017-10-14 18:14:19
Message-ID: 98760523-a0d2-8705-38e3-c4602ecf2448@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

One of the existing limitations of parallel query is that cursors
generally do not benefit from it [1]. Commit 61c2e1a95f [2] improved the
situation for cursors from procedural languages, but unfortunately for
user-defined cursors parallelism is still disabled.

For many use cases that is perfectly fine, but for applications that
need to process large amounts of data this is rather annoying. When the
result sets are large, cursors are extremely efficient - in terms of
memory consumption, for example. So the applications have to choose
between "cursor" approach (and no parallelism), or parallelism and
uncomfortably large result sets.

I believe there are two main reasons why parallelism is disabled for
user-defined cursors (or queries that might get suspended):

(1) We can't predict what will happen while the query is suspended (and
the transaction is still in "parallel mode"), e.g. the user might run
arbitrary DML which is not allowed.

(2) If the cursor gets suspended, the parallel workers would be still
assigned to it and could not be used for anything else.

Clearly, we can't solve those issues in general, so the default will
probably remain "parallelism disabled".

I propose is to add a new cursor option (PARALLEL), which would allow
parallel plans for that particular user-defined cursor. Attached is an
experimental patch doing this (I'm sure there are some loose ends).

This does not make either any of the issues go away, of course. We still
enforce "no DML while parallel operation in progress" as before, so this
will not work:

BEGIN;
DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...;
FETCH 1000 FROM x;
INSERT INTO t2 VALUES (1);
FETCH 1000 FROM x;
COMMIT;

but this will

BEGIN;
DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...;
FETCH 1000 FROM x;
...
FETCH 1000 FROM x;
CLOSE x;
INSERT INTO t2 VALUES (1);
COMMIT;

Regarding (2), if the user suspends the cursor for a long time, bummer.
The parallel workers will remain assigned, doing nothing. I don't have
any idea how to get around that, but I don't see how we could do better.
I don't see either of these limitations as fatal.

Any opinions / obvious flaws that I missed?

regards

[1]
https://www.postgresql.org/docs/9.6/static/when-can-parallel-query-be-used.html

[2]
https://www.postgresql.org/message-id/CAOGQiiMfJ%2B4SQwgG%3D6CVHWoisiU0%2B7jtXSuiyXBM3y%3DA%3DeJzmg%40mail.gmail.com

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
parallel-cursors-v1.patch text/x-patch 5.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gourav Kumar 2017-10-14 19:57:03 Turn off transitive joins
Previous Message Tom Lane 2017-10-14 18:04:17 Re: pg_control_recovery() return value when not in recovery