Re: Schema variables - new implementation for Postgres 15

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Joel Jacobson <joel(at)compiler(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Schema variables - new implementation for Postgres 15
Date: 2022-03-03 07:06:52
Message-ID: 20220303070652.xjmyd6xkeqfazvjh@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, Mar 02, 2022 at 06:03:06AM +0100, Pavel Stehule wrote:
>
> I lost commit with this change. I am sending updated patch.

Thanks a lot Pavel!

I did a more thorough review of the patch. I'm attaching a diff (in .txt
extension) for comment improvement suggestions. I may have misunderstood
things so feel free to discard some of it. I will mention the comment I didn't
understand in this mail.

First, I spotted some problem in the invalidation logic.

+ * Assign sinval mark to session variable. This mark probably
+ * signalized, so the session variable was dropped. But this
+ * should be rechecked later against system catalog.
+ */
+static void
+pg_variable_cache_callback(Datum arg, int cacheid, uint32 hashvalue)

You mention that hashvalue can only be zero for commands that can't
affect session variables (like VACUUM or ANALYZE), but that's not true. It can
also happen in case of sinval queue overflow (see InvalidateSystemCaches()).
So in that case we should trigger a full recheck, with some heuristics on how
to detect that a cached variable is still valid. Unfortunately the oid can
wraparound so some other check is needed to make it safe.

Also, even if we get a non-zero hashvalue in the inval callback, we can't
assume that there weren't any collision in the hash. So the additional check
should be used there too.

We had a long off-line discussion about this with Pavel yesterday on what
heuristic to use there. Unlike other caches where discarding an entry when it
shouldn't have been is not really problematic, the cache here contains the real
variable value so we can't discard it unless the variable was really dropped.
It should be possible to make it work, so I will let Pavel comment on which
approach he wants to use and what the drawbacks are. I guess that this will be
the most critical part of this patch to decide whether the approach is
acceptable or not.

The rest is only minor stylistic comments.

Using -DRAW_EXPRESSION_COVERAGE_TEST I see that T_LetStmt is missing in
raw_expression_tree_walker.

ALTER and DROP both suggest "IMMUTABLE VARIABLE" as valid completion, while
it should only be usable in the CREATE [ IMMUTABLE ] VARIABLE form.

+initVariable(Variable *var, Oid varid, bool fast_only)
+{
+ var->collation = varform->varcollation;
+ var->eoxaction = varform->vareoxaction;
+ var->is_not_null = varform->varisnotnull;
+ var->is_immutable = varform->varisimmutable;

nit: eoxaction is defined after is_not_null and is_immutable, it would be
better to keep the initialization order consistent (same in VariableCreate).

+ values[Anum_pg_variable_varcollation - 1] = ObjectIdGetDatum((char) varCollation);
+ values[Anum_pg_variable_vareoxaction - 1] = CharGetDatum(eoxaction);

seems like the char cast is on the wrong variable?

+ * [...] We have to hold two separate action lists:
+ * one for dropping the session variable from system catalog, and
+ * another one for resetting its value. Both are necessary, since
+ * dropping a session variable also needs to enforce a reset of
+ * the value.

I don't fully understand that comment. Maybe you meant that the opposite isn't
true, ie. highlight that a reset should *not* drop the variable thus two lists?

+typedef enum SVariableXActAction
+{
+ SVAR_ON_COMMIT_DROP, /* used for ON COMMIT DROP */
+ SVAR_ON_COMMIT_RESET, /* used for DROP VARIABLE */
+ SVAR_RESET, /* used for ON TRANSACTION END RESET */
+ SVAR_RECHECK /* verify if session variable still exists */
+} SVariableXActAction;
+
+typedef struct SVariableXActActionItem
+{
+ Oid varid; /* varid of session variable */
+ SVariableXActAction action; /* reset or drop */

the stored action isn't simply "reset or drop", even though the resulting
action will be a reset or a drop (or a no-op) right? Since it's storing a enum
define just before, I'd just drop the comment on action, and maybe specify that
SVAR_RECHECK will do appropriate cleanup if the session variable doesn't exist.

+ * Release the variable defined by varid from sessionvars
+ * hashtab.
+ */
+static void
+free_session_variable(SVariable svar)

The function name is a bit confusing given the previous function. Maybe this
one should be called forget_session_variable() instead, or something like that?

I think the function comment should also mention that caller is responsible for
making sure that the sessionvars htab exists before calling it, for extra
clarity, or just add an assert for that.

+static void
+free_session_variable_varid(Oid varid)

Similary, maybe renaming this function forget_session_variable_by_id()?

+static void
+create_sessionvars_hashtable(void)
+{
+ HASHCTL ctl;
+
+ /* set callbacks */
+ if (first_time)
+ {
+ /* Read sinval messages */
+ CacheRegisterSyscacheCallback(VARIABLEOID,
+ pg_variable_cache_callback,
+ (Datum) 0);
+
+ first_time = false;
+ }
+
+ /* needs its own long lived memory context */
+ if (SVariableMemoryContext == NULL)
+ {
+ SVariableMemoryContext =
+ AllocSetContextCreate(TopMemoryContext,
+ "session variables",
+ ALLOCSET_START_SMALL_SIZES);
+ }

As far as I can see the SVariableMemoryContext can be reset but never set to
NULL, so I think the initialization can be done in the first_time case, and
otherwise asserted that it's not NULL.

+ if (!isnull && svar->typid != typid)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("type \"%s\" of assigned value is different than type \"%s\" of session variable \"%s.%

Why testing isnull? I don't think it's ok to allow NULL::text in an int
variable for instance. This isn't valid in other context (like inserting in a
table)

+ * result of default expression always). Don't do this check, when variable
+ * is initialized.
+ */
+ if (!init_mode &&

I think the last part of the comment is a bit misleading. Maybe "when variable
is being initialized" (and similary same for the function comment).

+ * We try not to break the previous value, if something is wrong.
+ *
+ * As side efect this function acquires AccessShareLock on
+ * related session variable until commit.
+ */
+void
+SetSessionVariable(Oid varid, Datum value, bool isNull, Oid typid)

I don't understand what you mean by "We try not to break the previous value, if
something is wrong".

+ /* Initialize svar when not initialized or when stored value is null */
+ if (!found)
+ {
+ Variable var;
+
+ /* don't need defexpr and acl here */
+ initVariable(&var, varid, true);
+ init_session_variable(svar, &var);
+ }
+
+ set_session_variable(svar, value, isNull, typid, false);

Shouldn't the comment be on the set_session_variable() vall rather than on the
!found block?

+ * Returns the value of the session variable specified by varid. Check correct
+ * result type. Optionally the result can be copied.
+ */
+Datum
+GetSessionVariable(Oid varid, bool *isNull, Oid expected_typid, bool copy)

All callers use copy == true, couldn't we get rid of it and say it returns a
copy of the value if any?

+ * Create new ON_COMMIT_DROP xact action. We have to drop
+ * ON COMMIT DROP variable, although this variable should not
+ * be used. So we need to register this action in CREATE VARIABLE
+ * time.

I don't understand this comment.

+AtPreEOXact_SessionVariable_on_xact_actions(bool isCommit)
+{
+ ListCell *l;
+
+ foreach(l, xact_drop_actions)
+ {
+ SVariableXActActionItem *xact_ai =
+ (SVariableXActActionItem *) lfirst(l);
+
+ /* Iterate only over non dropped entries */
+ if (xact_ai->deleting_subid == InvalidSubTransactionId)
+ {
+ Assert(xact_ai->action == SVAR_ON_COMMIT_DROP);

The assert sould probably be in the block above.

+ * We want to reset session variable (release it from
+ * local memory) when RESET is required or when session
+ * variable was removed explicitly (DROP VARIABLE) or
+ * implicitly (ON COMMIT DROP). Explicit releasing should
+ * be done only if the transaction is commited.
+ */
+ if ((xact_ai->action == SVAR_RESET) ||
+ (xact_ai->action == SVAR_ON_COMMIT_RESET &&
+ xact_ai->deleting_subid == InvalidSubTransactionId &&
+ isCommit))
+ free_session_variable_varid(xact_ai->varid);

This chunk is a bit hard to follow. Also, for SVAR_RESET wouldn't it be better
to only make the svar invalid and keep it in the htab? If so, this could be
split in two different branches which would be easier to follow.

+ if (!isCommit &&
+ xact_ai->creating_subid == mySubid &&
+ xact_ai->action != SVAR_RESET &&
+ xact_ai->action != SVAR_RECHECK)
+ {
+ /* cur_item must be removed */
+ xact_reset_actions = foreach_delete_current(xact_reset_actions, cur_item);
+ pfree(xact_ai);

I think that be definition only the SVAR_ON_COMMIT_DROP (cleaning entry for a
dropped session variable) will ever need to be removed there, so we should
check for that instead of not being something else?

+ /*
+ * Prepare session variables, if not prepared in queryDesc
+ */
+ if (queryDesc->num_session_variables > 0)

I don't understand that comment.

+static void
+svariableStartupReceiver(DestReceiver *self, int operation, TupleDesc typeinfo)
+{
+ svariableState *myState = (svariableState *) self;
+ int natts = typeinfo->natts;
+ int outcols = 0;
+ int i;
+
+ for (i = 0; i < natts; i++)
+ {
+ Form_pg_attribute attr = TupleDescAttr(typeinfo, i);
+
+ if (attr->attisdropped)
+ continue;
+
+ if (++outcols > 1)
+ elog(ERROR, "svariable DestReceiver can take only one attribute");
+
+ myState->typid = attr->atttypid;
+ myState->typmod = attr->atttypmod;
+ myState->typlen = attr->attlen;
+ myState->slot_offset = i;
+ }
+
+ myState->rows = 0;
+}

Maybe add an initial Assert to make sure that caller did call
SetVariableDestReceiverParams(), and final check that one attribute was found?

@@ -1794,15 +1840,39 @@ fix_expr_common(PlannerInfo *root, Node *node)
g->cols = cols;
}
}
+ else if (IsA(node, Param))
+ {
+ Param *p = (Param *) node;
+
+ if (p->paramkind == PARAM_VARIABLE)
+ {
+ PlanInvalItem *inval_item = makeNode(PlanInvalItem);
+
+ /* paramid is still session variable id */
+ inval_item->cacheId = VARIABLEOID;
+ inval_item->hashValue = GetSysCacheHashValue1(VARIABLEOID,
+ ObjectIdGetDatum(p->paramvarid));
+
+ /* Append this variable to global, register dependency */
+ root->glob->invalItems = lappend(root->glob->invalItems,
+ inval_item);
+ }
+ }

I didn't see any test covering invalidation of cached plan using session
variables. Could you add some? While at it, maybe use different values on the
sesssion_variable.sql tests rather than 100 in many places, so it's easier to
identifier which case broke in case of problem.

+static Node *
+makeParamSessionVariable(ParseState *pstate,
+ Oid varid, Oid typid, int32 typmod, Oid collid,
+ char *attrname, int location)
+{
[...]
+ /*
+ * There are two ways to access session variables - direct, used by simple
+ * plpgsql expressions, where it is not necessary to emulate stability.
+ * And Buffered access, which is used everywhere else. We should ensure
+ * stable values, and because session variables are global, then we should
+ * work with copied values instead of directly accessing variables. For
+ * direct access, the varid is best. For buffered access, we need
+ * to assign an index to the buffer - later, when we know what variables are
+ * used. Now, we just remember, so we use session variables.

I don't understand the last part, starting with "For buffered access, we
need...". Also, the beginning of the comment seems like something more general
and may be moved somewhere, maybe at the beginning of sessionvariable.c?

+ * stmt->query is SelectStmt node. An tranformation of
+ * this node doesn't support SetToDefault node. Instead injecting
+ * of transformSelectStmt or parse state, we can directly
+ * transform target list here if holds SetToDefault node.
+ */
+ if (stmt->set_default)

I don't understand this comment. Especially since the next
transformTargetList() will emit SetToDefault node that will be handled later in
that function and then in RewriteQuery.

+ /*
+ * rewrite SetToDefaults needs varid in Query structure
+ */
+ query->resultVariable = varid;

I also don't understand that comment. Is is always set just in case there's a
SetToDefault, or something else?

+ /* translate paramvarid to session variable name */
+ if (param->paramkind == PARAM_VARIABLE)
+ {
+ appendStringInfo(context->buf, "%s",
+ generate_session_variable_name(param->paramvarid));
+ return;
+ }

A bit more work seems to be needed for deparsing session variables:

# create variable myvar text;
CREATE VARIABLE

# create view myview as select myvar;
CREATE VIEW

# \d+ myview
View "public.myview"
Column | Type | Collation | Nullable | Default | Storage | Description
--------+------+-----------+----------+---------+----------+-------------
myvar | text | | | | extended |
View definition:
SELECT myvar AS myvar;

There shouldn't be an explicit alias I think.

Attachment Content-Type Size
0001-Suggestion-for-comment-improvements.patch text/plain 25.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2022-03-03 07:16:26 Re: Schema variables - new implementation for Postgres 15
Previous Message Daniel Westermann (DWE) 2022-03-03 06:55:43 Re: Changing "Hot Standby" to "hot standby"