Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: jeremy(at)musicsmith(dot)net, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan
Date: 2020-03-22 06:48:14
Message-ID: CAFj8pRBfi7fNtRLsAsCo0Y-mm+X6Y=9Z1ce8bocbWP5MQ65irQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

ne 22. 3. 2020 v 4:23 odesílatel Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> napsal:

> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > [ $SUBJECT ]
>
> I got around to looking at this today, and what I find is that the
> problem is that exec_stmt_return_query() uses a portal (i.e. a cursor)
> to read the results of the query. That seemed like a good idea, back
> in the late bronze age, because it allowed plpgsql to fetch the query
> results a few rows at a time and not risk blowing out memory with a huge
> SPI result. However, the parallel-query infrastructure refuses to
> parallelize when the query is being read via a cursor.
>
> I think that the latter restriction is probably sane, because we don't
> want to suspend execution of a parallel query while we've got worker
> processes waiting. And there might be some implementation restrictions
> lurking under it too --- that's not a part of the code I know in any
> detail.
>
> However, there's no fundamental reason why exec_stmt_return_query has
> to use a cursor. It's going to run the query to completion immediately
> anyway, and shove all the result rows into a tuplestore. What we lack
> is a way to get the SPI query to pass its results directly to a
> tuplestore, without the SPITupleTable intermediary. (Note that the
> tuplestore can spill a large result to disk, whereas SPITupleTable
> can't do that.)
>
> So, attached is a draft patch to enable that. By getting rid of the
> intermediate SPITupleTable, this should improve the performance of
> RETURN QUERY somewhat even without considering the possibility of
> parallelizing the source query. I've not tried to measure that though.
> I've also not looked for other places that could use this new
> infrastructure, but there may well be some.
>
> One thing I'm not totally pleased about with this is adding another
> SPI interface routine using the old parameter-values API (that is,
> null flags as char ' '/'n'). That was the path of least resistance
> given the other moving parts in pl_exec.c and spi.c, but maybe we
> should try to modernize that before we set it in stone.
>
> Another thing standing between this patch and committability is suitable
> additions to the SPI documentation. But I saw no value in writing that
> before the previous point is settled.
>
> I will go add this to the next commitfest (for v14), but I wonder
> if we should try to squeeze it into v13? This isn't the only
> complaint we've gotten about non-parallelizability of RETURN QUERY.
>

+1

Pavel

> regards, tom lane
>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2020-03-22 06:52:17 Re: BUG #16293: postgres segfaults and returns SQLSTATE 08006
Previous Message Tom Lane 2020-03-22 03:23:03 Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-03-22 07:40:33 Re: proposal: schema variables
Previous Message Andres Freund 2020-03-22 05:03:30 Re: kill_prior_tuple and index scan costing