Re: Writeable CTEs and side effects

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-09 20:32:09
Message-ID: 200910092032.n99KW9a19255@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Added to TODO:

Allow INSERT/UPDATE/DELETE ... RETURNING in common table expressions

* http://archives.postgresql.org/pgsql-hackers/2009-10/msg00472.php

---------------------------------------------------------------------------

Marko Tiikkaja wrote:
> I've made progress in implementing writeable CTEs (repo at
> git://git.postgresql.org/git/writeable_cte.git , branch actually_write)
> and I've hit a few corner-cases which have lead me to think that we
> should be handling DML inside CTEs a bit differently. Before I go on
> implementing this, I'd like to hear your input.
>
> 1) WITH t AS
> (UPDATE foo SET bar = bar+1 RETURNING *)
> SELECT * FROM t LIMIT 1;
>
> What's problematic here is that only 1 row is read from the CTE, meaning
> also that only one row is updated which, at least how I see it, is not
> what we want. The CTE should only store one row and return that after
> it has completely processed the UPDATE statement.
>
> 2) WITH t1 AS
> (UPDATE foo SET bar=bar+1 RETURNING *),
> t2 AS
> (UPDATE foo SET bar=bar+1 RETURNING *)
> SELECT * FROM t1
> UNION ALL
> SELECT * FROM t2;
>
> This is probably not the most common scenario, but is still very
> surprising if you for some reason happen to hit it. Both of the updates
> actually have the same transaction ID and command ID, so the rows are
> first updated by t1, but when t2 is processed, it looks at the rows and
> thinks that it already updated them.
>
> 3) WITH t1 AS
> (UPDATE foo SET bar=bar+1 RETURNING *),
> t2 AS
> (UPDATE baz SET bat=bat+1 RETURNING *)
> VALUES (true);
>
> This isn't probably the most common situation either, but I think it's
> worth looking at; the user wants to update two different tables, but
> ignore the RETURNING data completely. On IRC, this has been requested
> multiple times. Even if we wouldn't agree that this feature is useful,
> it pretty much follows the semantics of example #1.
>
>
> Trying to tackle all of these at once, I've come up with this kind of
> execution strategy:
>
> Before starting the execution of the main plan tree, for every CTE which
> is a DML query, do the following:
>
> 1) Get a new CID
> 2a) If there are no references to the CTE (example #3), run the DML
> query to the end but ignore the results of the RETURNING query,
> or
> 2b) If there are references, run the DML query to the end but store
> either as many as rows as you need to to answer the outer query (example
> #1) or if we can't determine the number of rows we need (most cases,
> example #2) run the query and store all of its results.
>
> Then, if required, get a new CID for the main execution tree and execute
> it using the data we now have inside the CTEs. This way we can avoid
> storing useless rows in memory without unexpected behaviour and caveats.
>
>
> Regards,
> Marko Tiikkaja
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2009-10-09 21:04:05 Re: Triggers on columns
Previous Message Andrew Dunstan 2009-10-09 19:04:59 Re: Concurrency testing