Re: Truncate if exists

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Stafford, David x78061" <David(dot)Stafford(at)broadridge(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Truncate if exists
Date: 2012-10-18 19:03:50
Message-ID: CA+TgmoaM0Qb9pHY3p9YgpeTfrS2j_9keMZmP5bibGur5RbBz7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 16, 2012 at 2:12 PM, Stafford, David x78061
<David(dot)Stafford(at)broadridge(dot)com> wrote:
> On Mon, 15 Oct 2012 10:21:18 -0700, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Yeah, I think the functionality that we need is pretty much there
>> already today. What we need to do is to get the syntax to a point
>> where people can write the code they want to write without getting
>> tangled up by it.
>>
>> I think the invention of DO was a big step in the right direction
>> ...
>> With DO, you can write the logic you want
>> as an SQL statement, it's just a clunky and awkward SQL statement. In
>> my view the goal ought to be to refine that mechanism to remove the
>> clunkiness and awkwardness, rather than to invent something completely
>> new.
>
> As someone who has worked with a number of databases now, none of them really get this DDL integration completely right. What I'd like to see is 1) a predicate to easily test things about the schema (does this table, column, index, schema, etc. exist? does it have the right type?) and 2) a way to conditionally execute DDL (and DML, which should fall right out, but it isn't really what this discussion is covering). I would propose extending the current EXISTS / NOT EXISTS predicate as follows:
>
> [NOT] EXISTS TABLE tab [ ( col [type] [ , col [type]]... ) ]
> [NOT] EXISTS COLUMN tab.col [type]
> [NOT] EXISTS INDEX tab.idx [ ( col [ , col]... ) ]
> [NOT] EXISTS CONSTRAINT tab.cstrt -- not sure what else might be useful here
> [NOT] EXISTS ( select ) -- this is the current EXISTS predicate, of course
> [NOT] EXISTS ANY ROWS FROM tab [join] [WHERE predicate] [GROUP BY col [ , col]...] -- exactly the same as
> -- (select 1 FROM etc.)
> -- only because I like
> -- it better
> (the latter [which by no means am I nuts over; it's just that when extending EXISTS I can't stop neatening it up to my personal preferences] could be extended with [NOT] EXISTS MORE THAN n ROWS FROM and [NOT] EXISTS EXACTLY n ROWS FROM.)
>
> There is a new SQL statement: IF predicate true-statement [ELSE false-statement].
>
> To actually execute this new IF statement, the executor would need an IF node that evaluates the predicate (with ANDs and ORs, just like all SQL predicates) and then executes the rest only if the predicate came out TRUE (or NOT FALSE; I forget which is usually used, and the difference with NULL could be useful, as long as it matches other predicates). This moves one more bit of procedural logic into the executor.
>
> Another wrinkle is that the dependent statement might not compile, due to missing tables or whatnot. Actually executing it while it doesn't compile is an error, but we want to defer that error until we actually decide we need to execute it. Also, it's probably good to try compiling it again at that point. So my thought would be to try planning the dependent statement(s); if they compile, hook them to the IF node; if not, hook a DEFERRED node to the IF node. The DEFERRED node has the parse tree (or raw string, whatever makes sense) of the statement; on execution it tries again to plan that statement; if it succeeds, run it; if not, error out.
>
> I'd also add a SEQUENCE node to the executor. It just runs its children in order (could be n-ary, or if fixed arity nodes are what is in the planner/executor today, could be binary, first left, then right, and right could be another SEQUENCE). The DEFERRED node means that a CREATE statement could precede use of what is created in the same sequence and all could get planned (with some deferral to execution time) in advance and run in one lump. This implements DO at the executor level.
>
> The biggest concepts left from plpgsql are looping and variables. Most variables could be modeled as a single row value; SQL already can update a row, so the planning of assignments and calculations of scalars (and arrays, I think) already fits into things the planner knows about. Table variables (which I don't know that plpgsql supports, but someday it should) are less defined. Adding plpgsql's loops to the executor would let whole functions run under one trip through the executor. This is beyond just improving the DDL support for scripts.
>
> I have written a number of database upgrade scripts. Over time we've made them less fragile, by checking for the existence of tables, indexes, and most recently, columns. The usual sequence is:
> 1) check the existence of an index; check that the first few columns are correct; if not, drop the index
> 2) repeat for other indexes that have changed definition over time
> 3) check the existence of the table; create with current layout if it is missing
> 4) check for the presence of a column; if missing, alter the table to add it (of course, we can only add new columns at the end, and occasionally delete a column)
> 5) repeat for more columns
> 6) check the existence of an index; if missing, create it
> 7) repeat for all the indexes
> This is doable in most databases, but pretty messy. You need to join with infoschema tables, or system tables, or use clunky functions to check for existence; checking types is usually pretty horrid. Consequently, we only check a few things and trust that the schema is only in a few different states. A true schema comparator and upgrade solver would be great, but I don't know anyone who has written such a thing. The extended EXISTS predicate that could check tables and indexes would declutter a lot of our upgrade scripts. That's the use case for me.

I agree. I think something like this would be great. But figuring
out how to make it happen is, of course, the trick.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Gezeala M. Bacuño II 2012-10-18 19:18:03 Re: BUG #7521: Cannot disable WAL log while using pg_dump
Previous Message Hannu Krosing 2012-10-18 19:03:21 Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility