Skip site navigation (1) Skip section navigation (2)

Writeable CTEs and side effects

From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Writeable CTEs and side effects
Date: 2009-10-07 21:08:03
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
I've made progress in implementing writeable CTEs (repo at
git:// , branch actually_write)
and I've hit a few corner-cases which have lead me to think that we
should be handling DML inside CTEs a bit differently.  Before I go on
implementing this, I'd like to hear your input.

1)	WITH t AS
	(UPDATE foo SET bar = bar+1 RETURNING *)

What's problematic here is that only 1 row is read from the CTE, meaning
also that only one row is updated which, at least how I see it, is not
what we want.  The CTE should only store one row and return that after
it has completely processed the UPDATE statement.

2)	WITH t1 AS
	(UPDATE foo SET bar=bar+1 RETURNING *),
	t2 AS
	(UPDATE foo SET bar=bar+1 RETURNING *)

This is probably not the most common scenario, but is still very
surprising if you for some reason happen to hit it.  Both of the updates
actually have the same transaction ID and command ID, so the rows are
first updated by t1, but when t2 is processed, it looks at the rows and
thinks that it already updated them.

3)	WITH t1 AS
	(UPDATE foo SET bar=bar+1 RETURNING *),
	t2 AS
	(UPDATE baz SET bat=bat+1 RETURNING *)
	VALUES (true);

This isn't probably the most common situation either, but I think it's
worth looking at; the user wants to update two different tables, but
ignore the RETURNING data completely.  On IRC, this has been requested
multiple times.  Even if we wouldn't agree that this feature is useful,
it pretty much follows the semantics of example #1.

Trying to tackle all of these at once, I've come up with this kind of
execution strategy:

Before starting the execution of the main plan tree, for every CTE which
is a DML query, do the following:

   1) Get a new CID
   2a) If there are no references to the CTE (example #3), run the DML
query to the end but ignore the results of the RETURNING query,
   2b) If there are references, run the DML query to the end but store
either as many as rows as you need to to answer the outer query (example
#1) or if we can't determine the number of rows we need (most cases,
example #2) run the query and store all of its results.

Then, if required, get a new CID for the main execution tree and execute
it using the data we now have inside the CTEs.  This way we can avoid
storing useless rows in memory without unexpected behaviour and caveats.

Marko Tiikkaja


pgsql-hackers by date

Next:From: Jaime CasanovaDate: 2009-10-07 21:16:20
Subject: Re: Writeable CTEs and side effects
Previous:From: Tom LaneDate: 2009-10-07 20:58:18
Subject: Re: Issues for named/mixed function notation patch

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group