Re: [8.4] Updated WITH clause patch (non-recursive)

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Neil Conway" <neilc(at)samurai(dot)com>
Cc: "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [8.4] Updated WITH clause patch (non-recursive)
Date: 2008-01-30 16:04:48
Message-ID: 87odb3b8pr.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

"Neil Conway" <neilc(at)samurai(dot)com> writes:

> Remaining work is to review the guts of the patch (which shouldn't take
> long), and write documentation and regression tests. I'm personally
> hoping to see this get into the tree fairly early in the 8.4 cycle,
> pending discussion of course.

Looking back at this I've realized (putting aside whether we want to apply the
patch as is which is another question) that to get the CTEs materialized so
they perform the way a user might expect them to would actually require the
same infrastructure that recursive queries will require.

Basically what I think we really want down the line is for something like:

WITH (select * from complex_view) AS x
SELECT *
FROM x
JOIN x as x2 ON (x.id=x2.id2)

to run the view once, materialize the results and then join the resulting data
with itself. At least that's what the user is likely expecting. Now it may be
that we have a better plan by inlining the two calls which in an ideal world
we would go ahead and try as well. But it's more likely that users would write
the WITH clause because they specifically want to avoid re-evaluating a
complex subquery.

To do this though we would need the same capability that recursive queries
would need. Namely the ability to have a single tuplestore with multiple
readers reading from different positions in the tuplestore.

So what I'm imagining doing is to add a flag to the RelOptInfo (Alternatively
we could create a new rtekind, RTE_CTE, but that would require most sites
which check for RTE_SUBQUERY to check for that as well).

Then (I think) in create_subqueryscan_plan we would have to check for this
flag and introduce the Memoize node I previously mentioned. That's basically a
Materialize node which keeps track of its position within the tuplestore in
its own state. It would also have to introuduce the one-time node with the
Materialize node which the Memoize would point to. I'm getting a bit vague
here as I haven't entirely absorbed how one-time plans work.

That would allow the query above to, for example, generate something like:

InitPlan
-> Memoize x (cost=0.00..34.00 rows=2400 width=4)
-> Seq scan on complex_view (cost=0.00..34.00 rows=2400 width=4)
Merge Join (cost=337.50..781.50 rows=28800 width=8)
Merge Cond: (x.id = x2.id)
-> Sort (cost=168.75..174.75 rows=2400 width=4)
Sort Key: x.id
-> MemoizeRead x (cost=0.00..34.00 rows=2400 width=4)
-> Sort (cost=168.75..174.75 rows=2400 width=4)
Sort Key: x2.id
-> MemoizeRead x x2 (cost=0.00..34.00 rows=2400 width=4)

Does this sound like the right track? Should I be doing this at the RelOptInfo
level or at some point higher up? Do I misunderstand anything about how
InitPlan is handled?

Other ideas: it might be interesting to note that we're sorting the same
Memoize node twice and push that down into the initial plan. Or somehow to
check whether it wouldn't be faster to just inline the memoized node directly
because perhaps there's a path available which would work for this read of it.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

In response to

Browse pgsql-patches by date

  From Date Subject
Next Message Simon Riggs 2008-01-30 16:36:33 Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Previous Message Alvaro Herrera 2008-01-30 16:00:37 Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable