Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jeremy(at)musicsmith(dot)net
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan
Date: 2020-03-22 03:23:03
Message-ID: 1741.1584847383@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> [ $SUBJECT ]

I got around to looking at this today, and what I find is that the
problem is that exec_stmt_return_query() uses a portal (i.e. a cursor)
to read the results of the query. That seemed like a good idea, back
in the late bronze age, because it allowed plpgsql to fetch the query
results a few rows at a time and not risk blowing out memory with a huge
SPI result. However, the parallel-query infrastructure refuses to
parallelize when the query is being read via a cursor.

I think that the latter restriction is probably sane, because we don't
want to suspend execution of a parallel query while we've got worker
processes waiting. And there might be some implementation restrictions
lurking under it too --- that's not a part of the code I know in any
detail.

However, there's no fundamental reason why exec_stmt_return_query has
to use a cursor. It's going to run the query to completion immediately
anyway, and shove all the result rows into a tuplestore. What we lack
is a way to get the SPI query to pass its results directly to a
tuplestore, without the SPITupleTable intermediary. (Note that the
tuplestore can spill a large result to disk, whereas SPITupleTable
can't do that.)

So, attached is a draft patch to enable that. By getting rid of the
intermediate SPITupleTable, this should improve the performance of
RETURN QUERY somewhat even without considering the possibility of
parallelizing the source query. I've not tried to measure that though.
I've also not looked for other places that could use this new
infrastructure, but there may well be some.

One thing I'm not totally pleased about with this is adding another
SPI interface routine using the old parameter-values API (that is,
null flags as char ' '/'n'). That was the path of least resistance
given the other moving parts in pl_exec.c and spi.c, but maybe we
should try to modernize that before we set it in stone.

Another thing standing between this patch and committability is suitable
additions to the SPI documentation. But I saw no value in writing that
before the previous point is settled.

I will go add this to the next commitfest (for v14), but I wonder
if we should try to squeeze it into v13? This isn't the only
complaint we've gotten about non-parallelizability of RETURN QUERY.

regards, tom lane

Attachment Content-Type Size
plpgsql-return-query-results-directly-1.patch text/x-diff 19.3 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Stehule 2020-03-22 06:48:14 Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan
Previous Message David G. Johnston 2020-03-20 20:09:23 Re: BUG #16309: Postgres's ISO 8601 date output is not compliant

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-03-22 04:53:05 Re: kill_prior_tuple and index scan costing
Previous Message Andres Freund 2020-03-22 02:33:02 Re: kill_prior_tuple and index scan costing