Re: (select query)/relation as first class citizen

From: Dent John <denty(at)QQdd(dot)eu>
To: Roman Pekar <roma(dot)pekar(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: (select query)/relation as first class citizen
Date: 2019-08-23 08:52:43
Message-ID: 287730C0-6EB6-4AFB-B639-C47774166A8D@QQdd.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 19 Aug 2019, at 15:16, Roman Pekar <roma(dot)pekar(at)gmail(dot)com> wrote:
>
> Hi, John,
>
> I think you've outlined the problem and possible solutions quite well. It's great to see that the goal might be not that far from implementing.
>

Thanks for the prompt, Roman. I meant to have a bit of a play, and your message reminded me.

I was intrigued by the gap in how REFCURSOR is exposed. I’ve made two very rough patches to illustrate a possible solution. Down the line, I am wondering if there is appetite to receive these into core code.

First is a variant of UNNEST that accepts a REFCURSOR, allowing the results to be processed in a normal query, such as SELECT. To illustrate how it works, consider the following:

postgres=# create or replace function test (src_tab text) returns refcursor immutable language plpgsql as $$ begin return refcursor_from_query ('select * from ' || src_tab); end; $$;
CREATE FUNCTION
postgres=# select key, count (value), min (value), max (value) from unnest (array['test1', 'test2', 'test3']) tab, lateral unnest (test (tab.tab)) as (key text, value numeric) group by key;
key | count | min | max
--------+-------+-----+-----
ITEM_A | 100 | 0 | 99
ITEM_C | 50 | -50 | -1
ITEM_B | 200 | 0 | 199
(3 rows)

postgres=# explain select key, count (value), min (value), max (value) from unnest (array['test1', 'test2', 'test3']) tab, lateral unnest (test (tab.tab)) as (key text, value numeric) group by key;
psql: WARNING: cache reference leak: cache pg_proc (43), tuple 11/9 has count 1
QUERY PLAN
----------------------------------------------------------------------------
HashAggregate (cost=9.29..10.29 rows=100 width=104)
Group Key: unnest.key
-> Nested Loop (cost=0.26..6.29 rows=300 width=64)
-> Function Scan on unnest tab (cost=0.00..0.03 rows=3 width=32)
-> Function Scan on unnest (cost=0.25..1.25 rows=100 width=64)
(5 rows)

The example requires the following setup:

postgres=# create table test1 (key text, value numeric);
CREATE TABLE
postgres=# insert into test1 select 'ITEM_A', generate_series (0, 99);
INSERT 0 100
postgres=# create table test2 (key text, value numeric);
CREATE TABLE
postgres=# insert into test2 select 'ITEM_B', generate_series (0, 199);
INSERT 0 200
postgres=# create table test3 (key text, value numeric);
CREATE TABLE
postgres=# insert into test1 select 'ITEM_C', generate_series (-50, -1);
INSERT 0 50
postgres=# create or replace function refcursor_from_query (qry text) returns refcursor immutable language plpgsql as $$ declare cur refcursor; begin open cur for execute qry; return cur; end; $$;
CREATE FUNCTION

Obviously this kind of construction is open to wide variety of attacks, and a more realistic example would need to defend against inappropriate input.

My code is really really rough, and also yields a WARNING about cache leaks — which obviously needs fixing — but demonstrates the point. This variant is provided in unnest-refcursor.patch.

The example is pretty contrived, but I think there is general utility in having a way of processing output from REFCURSORs. Arguably, as I mentioned in my previous comments, this overlaps the plpgsql’s RETURN QUERY capability. However, unlike RETURN QUERY, the result set is not materialised before returning.

It is also interesting that a RECORD-returning UNNEST requires the row type to be declared explicitly (hence the as (key text, value numeric) clause). This seems a less than ideal syntax, but I’m not sure there is much alternative.

Second is another variant of UNNEST which attempts to retrieve the query that the REFCURSOR is OPEN’ed for, and inlines the SQL text into the query being planned. (Exactly as may be done with the text of certain SQL language FUNCTIONs.)

Again, an example probably best illustrates what is going on:

postgres=# explain select key, count (value), min (value), max (value) from unnest (test ('test1')) as (key text, value numeric) where value > 50 group by key;
QUERY PLAN
-------------------------------------------------------------
HashAggregate (cost=3.37..3.39 rows=2 width=79)
Group Key: test1.key
-> Seq Scan on test1 (cost=0.00..2.88 rows=49 width=11)
Filter: (value > '50'::numeric)
(4 rows)

postgres=# select key, count (value), min (value), max (value) from unnest (test ('test1')) as (key text, value numeric) where value > 50 group by key;
key | count | min | max
--------+-------+-----+-----
ITEM_A | 49 | 51 | 99
(1 row)

There are two interesting things going on here. First is that the query returned by test() is rewritten and consumed as if it were verbatim in the query text. Second is that the outer filter (value > 50) can now be pushed down by the planner, potentially yielding a much more efficient plan.

This variant is provided in unnest-rewrite-refcursor.patch.

My code here is even rougher than the first. I stopped short of creating a new ’state’ for REFCURSOR that is not yet ‘OPEN’ but allows the query text to be yielded. This ultimately turned out quite non-trivial for a POC. (It does, though, seem ultimately feasible.) The consequence is that the REFCURSOR query is planned before being returned, only for that plan to be ignored and the query text consumed into the outer plan, thus being re-planned. In any significant use, this would be a huge inefficiency, and a shortcoming that would need to be addressed.

Both patches should apply against 12beta2. unnest-refcursor should be applied first, as the second depends upon its foundation.

I’m wondering what do you think of the concept, Roman and Pavel?

denty.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2019-08-23 09:05:30 Re: Why overhead of SPI is so large?
Previous Message Masahiko Sawada 2019-08-23 08:44:01 Re: Comment in ginpostinglist.c doesn't match code