Re: The flinfo->fn_extra question, from me this time.

From: Dent John <denty(at)QQdd(dot)eu>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chapman Flack <chap(at)anastigmatix(dot)net>
Subject: Re: The flinfo->fn_extra question, from me this time.
Date: 2019-11-02 22:42:56
Message-ID: DB853715-9906-4E11-A183-E0DEE4840751@QQdd.eu
Lists: pgsql-hackers

Hi,

It turns out, to my embarrassment, that pretty much all of the regression tests failed with my patch. No idea if anyone spotted that and withheld reply in revenge, but I wouldn’t blame you if you did!

I have spent a bit more time on it. The attached patch is a much better show, though there are still a few regressions and undoubtedly it’s still rough.

(Attached patch is against 12.0.)

As was perhaps predictable, some of the regression tests do indeed break in the rescan case. To cite the specific class of failure, it’s this:

SELECT * FROM (VALUES (1),(2),(3)) v(r), ROWS FROM( rngfunc_sql(11,11), rngfunc_mat(10+r,13) );
  r | i  | s | i  | s
 ---+----+---+----+---
  1 | 11 | 1 | 11 | 1
  1 |    |   | 12 | 2
  1 |    |   | 13 | 3
- 2 | 11 | 1 | 12 | 4
+ 2 | 11 | 2 | 12 | 4
  2 |    |   | 13 | 5
- 3 | 11 | 1 | 13 | 6
+ 3 | 11 | 3 | 13 | 6
 (6 rows)

The reason for the change is that 's' comes from rngfunc_mat(), which computes s as nextval(). The patch currently prefers to re-execute the function on rescan rather than materialising its result into a tuplestore.
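
For context, rngfunc_mat() is roughly of this shape. The definition below is a reconstruction from memory rather than a copy from the test suite, so treat the names and details as approximate:

  -- Illustrative only: approximates the regression-test function.
  CREATE SEQUENCE rngfunc_rescan_seq;

  CREATE FUNCTION rngfunc_mat(lo int, hi int)
  RETURNS TABLE (i int, s bigint) AS $$
      -- Every emitted row draws a fresh value from the sequence, so the
      -- output differs depending on whether a rescan re-executes the
      -- function or replays the first execution from a tuplestore.
      SELECT g, nextval('rngfunc_rescan_seq')
      FROM generate_series(lo, hi) g
  $$ LANGUAGE sql;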

Tom suggested not dropping the tuplestore creation logic. I can’t fathom a way of avoiding a change for folk who have gotten used to the current behaviour without doing that. So I’m tempted to pipeline the rows back from a function (if it returns ValuePerCall), and also record them in a tuplestore, just in case a rescan happens. There’s still wastage in this approach, but it would preserve the current behaviour while still enabling the early abort of ValuePerCall SRFs at relatively low cost, which is certainly one of my goals.

I’d welcome opinions on whether there are downsides to that approach, as I might move to integrate it next.

But I would also like to kick around ideas for how to avoid the tuplestore entirely.

Earlier, I suggested that we might make the decision logic prefer to materialise a tuplestore for VOLATILE functions, and prefer to pipeline directly from STABLE (and IMMUTABLE) functions. The docs on volatility categories say that the optimiser will evaluate a VOLATILE function at every row where its value is needed, whereas it may cache STABLE and IMMUTABLE results more aggressively. That’s basically the polar opposite of what I want to achieve.
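
As a minimal sketch of those markers (the names here are made up):

  CREATE SEQUENCE demo_seq;

  CREATE FUNCTION demo_volatile() RETURNS bigint
  AS $$ SELECT nextval('demo_seq') $$
  LANGUAGE sql VOLATILE;  -- re-evaluated at every row where its value is needed

  CREATE FUNCTION demo_stable(i int) RETURNS int
  AS $$ SELECT i + 1 $$
  LANGUAGE sql STABLE;    -- the optimiser may collapse repeated calls with the same arguments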

It is arguably also in conflict with current behaviour. I think we should make the docs clearer about that.

So, on second thoughts, I don’t think overloading the meaning of STABLE, et al., is the right thing to do. I wonder if we could invent a new modifier to CREATE FUNCTION, perhaps “PIPELINED”, which would simply declare a function’s ability to run in ValuePerCall mode, and its preference for it.
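
To illustrate, the syntax might look something like this (PIPELINED is not existing grammar, just a sketch of the proposal):

  CREATE FUNCTION gen_rows(lo int, hi int) RETURNS SETOF int
  AS $$ SELECT generate_series(lo, hi) $$
  LANGUAGE sql
  PIPELINED;  -- proposed: declares ability/preference for ValuePerCall mode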

Or perhaps we could modify the ROWS FROM extension and adopt WITH’s [ NOT ] MATERIALIZED clause. For example, the following would achieve the behaviour proposed above:

ROWS FROM( rngfunc_sql(11,11) MATERIALIZED, rngfunc_mat(10+r,13) MATERIALIZED )

Of course, NOT MATERIALIZED would achieve ValuePerCall mode, and omit materialisation. I guess MATERIALIZED would have to be the default.
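
Again, just a sketch of proposed syntax, not something that parses today:

ROWS FROM( rngfunc_sql(11,11) NOT MATERIALIZED, rngfunc_mat(10+r,13) MATERIALIZED )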

I wonder if another alternative would be to decide materialisation based on what the outer plan includes. I guess we can tell if we’re part of a join, or if the plan requires the ability to scan backwards. Could that work?
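
For instance (using the same test functions as above), the distinction might hypothetically fall between cases like these:

  -- Standalone scan: ValuePerCall could stream rows with no tuplestore,
  -- and a LIMIT could abort the SRF early.
  SELECT * FROM rngfunc_sql(11, 1000000) LIMIT 10;

  -- Join context like the failing test: the ROWS FROM node gets rescanned
  -- per outer row, which is where the materialise-or-re-execute question bites.
  SELECT * FROM (VALUES (1),(2),(3)) v(r), ROWS FROM( rngfunc_mat(10+r, 13) );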

denty.
