Re: deparsing utility commands

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: deparsing utility commands
Date: 2015-03-19 04:40:21
Message-ID: 20150319044021.GD29780@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Alvaro Herrera (alvherre(at)2ndquadrant(dot)com) wrote:
> One thing that Stephen commented on was the ALTER TABLE preparatory
> patch. He said why not return ObjectAddress from the subcommand
> routines instead of just Oid/attnum? The reason is that to interpret
> the address correctly you will still need a lot more context; the OID
> and attnum are only part of the truth anyway. I think there are a small
> number of cases where we could meaningfully return an ObjectAddress and
> have the whole thing be useful, but I'm not sure it's worth the bother.

Alright, fair enough.

> In patch 0004, I removed most of the Stash() calls in ProcessUtility,
> instead adding one at the bottom that takes care of most of the simple
> cases. That means that a developer adding support for something new in
> ProcessUtilitySlow without realizing that something must be added to the
> command stash will get an error fairly early, which I think is helpful.

Right, I like this.

> Patch 0021 (fixing a bug in SECURITY LABEL support) I'm not really sure
> about. I don't like modifying a parse node during execution -- seems a
> bad habit. It seems better to me to return the chosen security label as
> an ObjectAddress in ExecSecLabelStmt, as pass that as "secondaryOid" to
> the deparse_utility.c routine.

I agree, changing a parse node during execution definitely seems like a
bad idea, particularly if there's a way you could avoid doing so, which
it sounds like there is. Any reason to not follow up on that?

> In patch 0023 (CREATE/ALTER POLICY support) I ran into trouble. I
> represent the role in the json like this:
> tmp = new_objtree_VA("TO %{role:, }I", 0);
> which means that role names are quoted as identifiers. This means it
> doesn't work as soon as we get a command referencing PUBLIC (which many
> in the regression test do, because when the TO clause is omitted, PUBLIC
> is the default). So this ends up as "PUBLIC" and everything goes
> downhill. I'm not sure what to do about this, except to invent a new
> %{} code.

How is this any different from the GRANT/REVOKE cases..? Surely you
have to deal with 'public' being special there also.

> Subject: [PATCH 01/29] add secondaryOid to DefineTSConfiguration()

This looks fine to me.

> Subject: [PATCH 02/29] deparse/core: have ALTER TABLE return table OID and
> other data
[...]
> diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
> index 5ce4395..9675d21 100644
> @@ -1853,7 +1853,7 @@ heap_drop_with_catalog(Oid relid)
> /*
> * Store a default expression for column attnum of relation rel.
> */
> -void
> +Oid
> StoreAttrDefault(Relation rel, AttrNumber attnum,
> Node *expr, bool is_internal)
> {

Missing a comment about the return value?

> @@ -2074,7 +2083,7 @@ StoreConstraints(Relation rel, List *cooked_constraints, bool is_internal)
> int numchecks = 0;
> ListCell *lc;
>
> - if (!cooked_constraints)
> + if (list_length(cooked_constraints) == 0)
> return; /* nothing to do */
>
> /*

While this is used in a few places, it's by far more common to use:

if (cooked_constraints == NIL)

or to keep it the way it is..

> diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
> index 623e6bf..fc2c8a9 100644
> @@ -3416,78 +3418,95 @@ static void
> case AT_DropColumn: /* DROP COLUMN */
> ATExecDropColumn(wqueue, rel, cmd->name,
> - cmd->behavior, false, false, cmd->missing_ok, lockmode);
> + cmd->behavior, false, false,
> + cmd->missing_ok, lockmode);
> break;
> case AT_DropColumnRecurse: /* DROP COLUMN with recursion */
> ATExecDropColumn(wqueue, rel, cmd->name,
> - cmd->behavior, true, false, cmd->missing_ok, lockmode);
> + cmd->behavior, true, false,
> + cmd->missing_ok, lockmode);
> break;

[...]

> case AT_ReplicaIdentity:
> - ATExecReplicaIdentity(rel, (ReplicaIdentityStmt *) cmd->def, lockmode);
> + ATExecReplicaIdentity(rel, (ReplicaIdentityStmt *) cmd->def,
> + lockmode);
> break;
> case AT_EnableRowSecurity:
> ATExecEnableRowSecurity(rel);

I realize the whole thing is changing anyway, but the random whitespace
changes don't make reviewing any easier.

> -static void
> +static AttrNumber
> ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
> ColumnDef *colDef, bool isOid,
> bool recurse, bool recursing, LOCKMODE lockmode)
> @@ -4698,7 +4720,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
> colDef->colname, RelationGetRelationName(rel))));
>
> heap_close(attrdesc, RowExclusiveLock);
> - return;
> + return InvalidAttrNumber;
> }
> }
>
> @@ -4944,6 +4966,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
>
> heap_close(childrel, NoLock);
> }
> +
> + return newattnum;
> }

This doesn't need to do anything for inheiriting children..? I suppose
perhaps not, but might be useful to add a comment noting that only the
attnum for the parent table is returned.

> @@ -5058,7 +5082,7 @@ ATPrepAddOids(List **wqueue, Relation rel, bool recurse, AlterTableCmd *cmd, LOC
> /*
> * ALTER TABLE ALTER COLUMN DROP NOT NULL
> */
> -static void
> +static AttrNumber
> ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
> {
> HeapTuple tuple;
> @@ -5143,18 +5167,22 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
>
> /* keep the system catalog indexes current */
> CatalogUpdateIndexes(attr_rel, tuple);
> +
> + /* XXX should we return InvalidAttrNumber if there was no change? */
> }

I don't think I commented on this the first time through, but on
reflection, I'm inclined to say "yes" that InvalidAttrNumber should be
returned if there was no change.

Comments about what's being returned wouldn't hurt either.

> /*
> * ALTER TABLE ALTER COLUMN SET NOT NULL
> */
> -static void
> +static AttrNumber
> ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
> const char *colName, LOCKMODE lockmode)
> {
> @@ -5198,18 +5226,22 @@ ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
>
> /* Tell Phase 3 it needs to test the constraint */
> tab->new_notnull = true;
> +
> + /* XXX should we return InvalidAttrNumber if there was no change? */
> }

ditto.

> *
> * Subroutine for ATExecAddConstraint.
> *
> @@ -5914,7 +5975,7 @@ ATExecAddConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
> * "is_readd" flag for that; just setting recurse=false would result in an
> * error if there are child tables.
> */
> -static void
> +static Oid
> ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
> Constraint *constr, bool recurse, bool recursing,
> bool is_readd, LOCKMODE lockmode)
> @@ -5923,6 +5984,7 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
> ListCell *lcon;
> List *children;
> ListCell *child;
> + Oid constrOid = InvalidOid;
>
> /* At top level, permission check was done in ATPrepCmd, else do it */
> if (recursing)
> @@ -5965,6 +6027,13 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
> /* Save the actually assigned name if it was defaulted */
> if (constr->conname == NULL)
> constr->conname = ccon->name;
> +
> + /*
> + * Save our return value. Note we don't expect more than one element in
> + * this list.
> + */
> + Assert(constrOid == InvalidOid);
> + constrOid = ccon->conoid;
> }

I feel like this Assert() should really be outside of this, like so:

Assert(list_length(newcons) <= 1);

With appropriate comment that we should only ever have a list of 0 or 1
being returned from this call to AddRelationNewConstraints.

Further, it might be better as just an if() instead of a foreach, since
we aren't actually going to be ever looping here.

I further wonder if this whole thing could be simplified by moving the
CommandCounterIncrement(), if (newcons == NIL) and is_no_inherit checks
up to right under the call to AddRelationNewConstraints and then we
don't need a conditional or a foreach.

> @@ -6573,7 +6652,7 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
> * no need to lock children in that case, yet we wouldn't be able to avoid
> * doing so at that level.
> */
> -static void
> +static Oid
> ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
> bool recursing, LOCKMODE lockmode)
> {
> @@ -6583,6 +6662,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
> HeapTuple tuple;
> Form_pg_constraint con = NULL;
> bool found = false;
> + Oid constrOid = InvalidOid;
>
> conrel = heap_open(ConstraintRelationId, RowExclusiveLock);
>
> @@ -6624,9 +6704,10 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
> HeapTuple copyTuple;
> Form_pg_constraint copy_con;
>
> + constrOid = HeapTupleGetOid(tuple);
> +
> if (con->contype == CONSTRAINT_FOREIGN)
> {
> - Oid conid = HeapTupleGetOid(tuple);
> Relation refrel;
>
> /*
> @@ -6641,7 +6722,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
>
> validateForeignKeyConstraint(constrName, rel, refrel,
> con->conindid,
> - conid);
> + constrOid);
> heap_close(refrel, NoLock);
>
> /*
> @@ -6722,6 +6803,8 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
> systable_endscan(scan);
>
> heap_close(conrel, RowExclusiveLock);
> +
> + return constrOid;
> }

This whole thing would be nicer if we inverted the second half to check
if (con->convalidated) { ... cleanup and return ... } ... continue, imv.

Regardless, a comment about how this will return InvalidOid if nothing
is done (because the constraint has already been validated) would be
good.

I thought the rest looked fine.

> Subject: [PATCH 03/29] deparse: infrastructure needed for command deparsing

> diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c

> +void
> +EventTriggerStashCommand(ObjectAddress address, ObjectAddress *secondaryObject,
> + Node *parsetree)

Why pass secondaryObject in as a pointer here?

> +{
> + MemoryContext oldcxt;
> + StashedCommand *stashed;
> +
> + if (currentEventTriggerState->commandCollectionInhibited)
> + return;
> +
> + oldcxt = MemoryContextSwitchTo(currentEventTriggerState->cxt);
> +
> + stashed = palloc(sizeof(StashedCommand));
> +
> + stashed->type = SCT_Simple;
> + stashed->in_extension = creating_extension;
> +
> + stashed->d.simple.address = address;
> + stashed->d.simple.secondaryObject =
> + secondaryObject ? *secondaryObject : InvalidObjectAddress;

Per above, this could just be the same as 'address' if it wasn't a
pointer being passed in but rather either a valid object or
InvalidObjectAddress. Perhaps a helper function could be used if that
seems awkward, which only takes a single address and then calls this
function with the secondaryObject set to InvalidObjectAddress?

> +Datum
> +pg_event_trigger_get_creation_commands(PG_FUNCTION_ARGS)

I agree with the earlier comment, seems like this should get a different
name.. pg_event_trigger_get_commands()?

> + /*
> + * For IF NOT EXISTS commands that attempt to create an existing
> + * object, the returned OID is Invalid; in those cases, return an empty
> + * command instead of trying to soldier on.

That's not quite accurate, is it? We don't return an empty command, we
simply don't return anything, right?

> + command = deparse_utility_command(cmd);
> +
> + /*
> + * Some parse trees return NULL when deparse is attempted; we don't
> + * emit anything for them.
> + */
> + if (command != NULL)
> + {

Couldn't this be flipped around to be if (!command) continue; instead
then, similar to earlier where we got an InvalidId returned?

> diff --git a/src/backend/tcop/deparse_utility.c b/src/backend/tcop/deparse_utility.c

Is it just me, or does this feel a bit odd to have in backend/tcop/?

> @@ -0,0 +1,992 @@
> +/*-------------------------------------------------------------------------
> + *
> + * deparse_utility.c
> + * Functions to convert utility commands to machine-parseable
> + * representation
> + *
> + * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group

Pretty sure this should be up to 2015. :)

> +#ifdef NOT_USED
> +static void append_integer_object(ObjTree *tree, char *name, int64 value);
> +#endif

...?

> +/*
> + * Given a utility command parsetree and the OID of the corresponding object,
> + * return a JSON representation of the command.
[...]
> +char *
> +deparse_utility_command(StashedCommand *cmd)

That's not quite right now, is it? It's the parsetree and the
ObjectAddress. It'd also be good to point out that those are what's in
that StashedCommand object.

> Subject: [PATCH 04/29] deparse: add EventTriggerStashCommand() calls to
> ProcessUtilitySlow

No particular complaints here.

> Subject: [PATCH 05/29] deparse: Support CREATE
> SCHEMA/TABLE/SEQUENCE/INDEX/TRIGGER/TYPE AS

> diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c

> +static char *
> +get_persistence_str(char persistence)
> +{
> + switch (persistence)
> + {
> + case RELPERSISTENCE_TEMP:
> + return "TEMPORARY";
> + case RELPERSISTENCE_UNLOGGED:
> + return "UNLOGGED";
> + case RELPERSISTENCE_PERMANENT:
> + return "";
> + default:
> + return "???";
> + }
> +}

??? strikes me as less than ideal? Should we ereport() there?

> @@ -977,6 +987,8 @@ pg_get_indexdef_columns(Oid indexrelid, bool pretty)
> *
> * This is now used for exclusion constraints as well: if excludeOps is not
> * NULL then it points to an array of exclusion operator OIDs.
> + *
> + * XXX if you change this function, see pg_get_indexdef_detailed too.
> */

Isn't that more of a 'Note:' or similar, not an 'XXX'?

... There was a lot here and a lot of it looked pretty familiar, so I
didn't press too hard on it. I'm sure you've been testing it with
Andres, etc, which is all good, but to the extent that issues remain,
they aren't really in our critical paths and the buildfarm along with
other testing by users will flush them out. The structure and approach
certainly seems good to me.

> Subject: [PATCH 06/29] deparse: Support CREATE TYPE AS ENUM / RANGE

Looked fine.

> Subject: [PATCH 07/29] deparse: Support EXTENSION commands

> diff --git a/src/backend/tcop/deparse_utility.c b/src/backend/tcop/deparse_utility.c

> /*
> + * deparse_CreateExtensionStmt
> + * deparse a CreateExtensionStmt
> + *
> + * Given an extension OID and the parsetree that created it, return the JSON
> + * blob representing the creation command.
> + *
> + * XXX the current representation makes the output command dependant on the
> + * installed versions of the extension. Is this a problem?
> + */

I think requiring the same version is perfectly sensible and, yeah, if
that version isn't installed then things go boom, but that shouldn't
really surprise anyone at this point.

Having been looking at pg_dump, extensions, and binary upgrades
recently, it does make me wonder if any thought has been put towards
those functions which extensions (and binary upgrade) can call to
associated objects with extensions..? Those are modifying objects but
they don't go through the normal standard_ProcessUtility() framework.
I'm inclined to think that's a mistake, as we build this kind of
introspection into what happens in utility, but we have those cases now
and we should probably think about how to handle them.

> Subject: [PATCH 08/29] deparse: Support CREATE RULE

Didn't see any issues here.

> Subject: [PATCH 09/29] deparse: Support ALTER TYPE / ADD VALUE (enums)

Ditto.

> Subject: [PATCH 10/29] deparse: Support ALTER .. {RENAME/OWNER/SCHEMA}

> diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
> index 3ddd7ec..d28758c 100644
> --- a/src/backend/commands/alter.c
> +++ b/src/backend/commands/alter.c
> @@ -446,7 +446,6 @@ ExecAlterObjectSchemaStmt(AlterObjectSchemaStmt *stmt,
> Relation relation;
> Oid classId;
> Oid nspOid;
> - ObjectAddress address;
>
> address = get_object_address(stmt->objectType,
> stmt->object,

Wow. I'm surprised one of the various compilers we have running around
wasn't bitching about that shadowing. Is there any reason to think we
need to worry about this? We don't have any code that cares, but
perhaps some external users might?

> diff --git a/src/backend/tcop/deparse_utility.c b/src/backend/tcop/deparse_utility.c

The comments in here seemed to, particularly, have a lot of unanswered
questions. Would be good to get those addressed.

> Subject: [PATCH 11/29] deparse: Support CREATE/ALTER DOMAIN
> Subject: [PATCH 12/29] deparse: Support CREATE/ALTER FUNCTION
> Subject: [PATCH 13/29] deparse: Support ALTER TABLE
> Subject: [PATCH 14/29] deparse: Support CREATE VIEW
> Subject: [PATCH 15/29] deparse: Support CREATE OPERATOR FAMILY
> Subject: [PATCH 16/29] deparse: Support CREATE CONVERSION
> Subject: [PATCH 17/29] deparse: Support DefineStmt commands
> Subject: [PATCH 18/29] deparse: support CREATE CAST
> Subject: [PATCH 19/29] deparse: Support GRANT/REVOKE

These looked alright.

> pg_event_trigger_get_creation_commands(PG_FUNCTION_ARGS)
> {
> @@ -1915,6 +1997,30 @@ pg_event_trigger_get_creation_commands(PG_FUNCTION_ARGS)
> /* command */
> values[i++] = CStringGetTextDatum(command);
> }
> + else
> + {
> + Assert(cmd->type == SCT_Grant);
> +
> + /* classid */
> + nulls[i++] = true;
> + /* objid */
> + nulls[i++] = true;
> + /* objsubid */
> + nulls[i++] = true;
> + /* command tag */
> + values[i++] = CStringGetTextDatum(cmd->d.grant.istmt->is_grant ?
> + "GRANT" : "REVOKE");
> + /* object_type */
> + values[i++] = CStringGetTextDatum(cmd->d.grant.type);
> + /* schema */
> + nulls[i++] = true;
> + /* identity */
> + nulls[i++] = true;
> + /* in_extension */
> + values[i++] = BoolGetDatum(cmd->in_extension);
> + /* command */
> + values[i++] = CStringGetTextDatum(command);
> + }

Really seems like we need to do better here, no? That's a lot of things
just getting returned as NULLs...

> Subject: [PATCH 20/29] deparse: support COMMENT ON and SECURITY LABEL
> Subject: [PATCH 21/29] deparse: Handle default security provider.

> diff --git a/src/backend/commands/seclabel.c b/src/backend/commands/seclabel.c
> index 1ef98ce..aa4de97 100644
> --- a/src/backend/commands/seclabel.c
> +++ b/src/backend/commands/seclabel.c
> @@ -63,6 +63,11 @@ ExecSecLabelStmt(SecLabelStmt *stmt)
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> errmsg("must specify provider when multiple security label providers have been loaded")));
> provider = (LabelProvider *) linitial(label_provider_list);
> + /*
> + * Set the provider in the statement so that DDL deparse can use
> + * provider explicitly in generated statement.
> + */
> + stmt->provider = (char *) provider->provider_name;
> }
> else
> {

This should really be in the SECURITY LABEL patch, no? Not a big deal
tho.

> Subject: [PATCH 22/29] deparse: support CREATE TABLE AS
> Subject: [PATCH 23/29] deparse: support CREATE/ALTER POLICY
> Subject: [PATCH 24/29] deparse: support REFRESH MATERIALIZED VIEW
> Subject: [PATCH 25/29] deparse: support FDW-related commands
> Subject: [PATCH 26/29] infrastructure for ALTER OPERATOR FAMILY
> Subject: [PATCH 27/29] deparse: Support ALTER OPERATOR FAMILY
> Subject: [PATCH 28/29] deparse: support ALTER DEFAULT PRIVILEGES
> Subject: [PATCH 29/29] deparse: support CREATE OPERATOR CLASS

Looked alright.

Again though, as I got at above, I'm not sure we're really going to see
a lot of improvements or changes to this until it gets more exposure.
I'd really like to see this happen because I think it's a fantastic
introspective capability, but I do wish it had gone in about 4 months
ago. Still, I'm on board with moving forward on this and look forward
to it being in the tree where we can get the buildfarm working on it and
other people testing and playing with it.

Thanks!

Stephen

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-03-19 04:43:52 Re: Table-level log_autovacuum_min_duration
Previous Message Amit Kapila 2015-03-19 03:43:38 Re: Parallel Seq Scan