Re: Schema variables - new implementation for Postgres 15

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Joel Jacobson <joel(at)compiler(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Schema variables - new implementation for Postgres 15
Date: 2022-06-21 08:46:06
Message-ID: CAFj8pRAuxFx36TOs+k-9_rYR+gyBhvsdrGG1odahj8F1iM6R7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

čt 3. 3. 2022 v 8:06 odesílatel Julien Rouhaud <rjuju123(at)gmail(dot)com> napsal:

> Hi,
>
> On Wed, Mar 02, 2022 at 06:03:06AM +0100, Pavel Stehule wrote:
> >
> > I lost commit with this change. I am sending updated patch.
>
> Thanks a lot Pavel!
>
> I did a more thorough review of the patch. I'm attaching a diff (in .txt
> extension) for comment improvement suggestions. I may have misunderstood
> things so feel free to discard some of it. I will mention the comment I
> didn't
> understand in this mail.
>
> First, I spotted some problem in the invalidation logic.
>
> + * Assign sinval mark to session variable. This mark probably
> + * signalized, so the session variable was dropped. But this
> + * should be rechecked later against system catalog.
> + */
> +static void
> +pg_variable_cache_callback(Datum arg, int cacheid, uint32 hashvalue)
>
> You mention that hashvalue can only be zero for commands that can't
> affect session variables (like VACUUM or ANALYZE), but that's not true.
> It can
> also happen in case of sinval queue overflow (see
> InvalidateSystemCaches()).
> So in that case we should trigger a full recheck, with some heuristics on
> how
> to detect that a cached variable is still valid. Unfortunately the oid can
> wraparound so some other check is needed to make it safe.
>
> Also, even if we get a non-zero hashvalue in the inval callback, we can't
> assume that there weren't any collision in the hash. So the additional
> check
> should be used there too.
>
> We had a long off-line discussion about this with Pavel yesterday on what
> heuristic to use there. Unlike other caches where discarding an entry
> when it
> shouldn't have been is not really problematic, the cache here contains the
> real
> variable value so we can't discard it unless the variable was really
> dropped.
> It should be possible to make it work, so I will let Pavel comment on which
> approach he wants to use and what the drawbacks are. I guess that this
> will be
> the most critical part of this patch to decide whether the approach is
> acceptable or not.
>

I thought more about this issue, and I think it is solvable, although
differently (little bit than we talked about). The check based on oid and
xmin should not be enough for consistency check, because xmin can be
quickly lost when a user executes VACUUM FREEZE or VACUUM FULL.

The consistency of a stored session variable should be checked always when
the session variable is used (for reading) the first time in a
transaction. When value is created and used in the same transaction, then
the consistency check is not necessary. When consistency check fails, then
stored value is marked as broken and cannot be read. Can be overwritten.

We can believe that session variables based on buildin types are always
consistent.

Composite types should be checked recursively from top to buildin types. It
means we should hold tupledescs for all nested composites. Initially the
check can be very strict.

Last case is consistency check for types owned by some extensions. For this
case we can accept the version number of related extensions. Without change
we can believe so the stored binary data are consistent.

>
> The rest is only minor stylistic comments.
>
> Using -DRAW_EXPRESSION_COVERAGE_TEST I see that T_LetStmt is missing in
> raw_expression_tree_walker.
>

fixed

>
> ALTER and DROP both suggest "IMMUTABLE VARIABLE" as valid completion, while
> it should only be usable in the CREATE [ IMMUTABLE ] VARIABLE form.
>

fixed

>
> +initVariable(Variable *var, Oid varid, bool fast_only)
> +{
> + var->collation = varform->varcollation;
> + var->eoxaction = varform->vareoxaction;
> + var->is_not_null = varform->varisnotnull;
> + var->is_immutable = varform->varisimmutable;
>
> nit: eoxaction is defined after is_not_null and is_immutable, it would be
> better to keep the initialization order consistent (same in
> VariableCreate).
>

fixed

>
> + values[Anum_pg_variable_varcollation - 1] = ObjectIdGetDatum((char)
> varCollation);
> + values[Anum_pg_variable_vareoxaction - 1] = CharGetDatum(eoxaction);
>
> seems like the char cast is on the wrong variable?
>

fixed

>
> + * [...] We have to hold two separate action lists:
> + * one for dropping the session variable from system catalog, and
> + * another one for resetting its value. Both are necessary, since
> + * dropping a session variable also needs to enforce a reset of
> + * the value.
>
> I don't fully understand that comment. Maybe you meant that the opposite
> isn't
> true, ie. highlight that a reset should *not* drop the variable thus two
> lists?
>

I tried to describe the issue in the comment. When I have just one action
list, then I had a problem with impossibility to extend this list about
reset action enforced by drop variable when I iterated over this list in
xact time. This issue was solved by using two lists - one for drop and
second for reset and recheck.

>
> +typedef enum SVariableXActAction
> +{
> + SVAR_ON_COMMIT_DROP, /* used for ON COMMIT DROP */
> + SVAR_ON_COMMIT_RESET, /* used for DROP VARIABLE */
> + SVAR_RESET, /* used for ON TRANSACTION END RESET */
> + SVAR_RECHECK /* verify if session variable still exists
> */
> +} SVariableXActAction;
> +
> +typedef struct SVariableXActActionItem
> +{
> + Oid varid; /* varid of session variable */
> + SVariableXActAction action; /* reset or drop */
>
> the stored action isn't simply "reset or drop", even though the resulting
> action will be a reset or a drop (or a no-op) right? Since it's storing a
> enum
> define just before, I'd just drop the comment on action, and maybe specify
> that
> SVAR_RECHECK will do appropriate cleanup if the session variable doesn't
> exist.
>
> done

>
> + * Release the variable defined by varid from sessionvars
> + * hashtab.
> + */
> +static void
> +free_session_variable(SVariable svar)
>
> The function name is a bit confusing given the previous function. Maybe
> this
> one should be called forget_session_variable() instead, or something like
> that?
>
> I think the function comment should also mention that caller is
> responsible for
> making sure that the sessionvars htab exists before calling it, for extra
> clarity, or just add an assert for that.
>
> +static void
> +free_session_variable_varid(Oid varid)
>
> Similary, maybe renaming this function forget_session_variable_by_id()?
>

I don't like "forget" too much - maybe "remove" can be used instead - like
HASH_REMOVE

> +static void
> +create_sessionvars_hashtable(void)
> +{
> + HASHCTL ctl;
> +
> + /* set callbacks */
> + if (first_time)
> + {
> + /* Read sinval messages */
> + CacheRegisterSyscacheCallback(VARIABLEOID,
> + pg_variable_cache_callback,
> + (Datum) 0);
> +
> + first_time = false;
> + }
> +
> + /* needs its own long lived memory context */
> + if (SVariableMemoryContext == NULL)
> + {
> + SVariableMemoryContext =
> + AllocSetContextCreate(TopMemoryContext,
> + "session variables",
> + ALLOCSET_START_SMALL_SIZES);
> + }
>
> As far as I can see the SVariableMemoryContext can be reset but never set
> to
> NULL, so I think the initialization can be done in the first_time case, and
> otherwise asserted that it's not NULL.
>

done

>
> + if (!isnull && svar->typid != typid)
> + ereport(ERROR,
> + (errcode(ERRCODE_DATATYPE_MISMATCH),
> + errmsg("type \"%s\" of assigned value is different than
> type \"%s\" of session variable \"%s.%
>
> Why testing isnull? I don't think it's ok to allow NULL::text in an int
> variable for instance. This isn't valid in other context (like inserting
> in a
> table)
>

changed

>
> + * result of default expression always). Don't do this check, when
> variable
> + * is initialized.
> + */
> + if (!init_mode &&
>
> I think the last part of the comment is a bit misleading. Maybe "when
> variable
> is being initialized" (and similary same for the function comment).
>

changed

>
> + * We try not to break the previous value, if something is wrong.
> + *
> + * As side efect this function acquires AccessShareLock on
> + * related session variable until commit.
> + */
> +void
> +SetSessionVariable(Oid varid, Datum value, bool isNull, Oid typid)
>
> I don't understand what you mean by "We try not to break the previous
> value, if
> something is wrong".
>

That means, so SetSessionVariable sets a new value or should preserve the
original value.

>
> + /* Initialize svar when not initialized or when stored value is null */
> + if (!found)
> + {
> + Variable var;
> +
> + /* don't need defexpr and acl here */
> + initVariable(&var, varid, true);
> + init_session_variable(svar, &var);
> + }
> +
> + set_session_variable(svar, value, isNull, typid, false);
>
> Shouldn't the comment be on the set_session_variable() vall rather than on
> the
> !found block?
>

This comment is obsolete,

removed

>
> + * Returns the value of the session variable specified by varid. Check
> correct
> + * result type. Optionally the result can be copied.
> + */
> +Datum
> +GetSessionVariable(Oid varid, bool *isNull, Oid expected_typid, bool copy)
>
> All callers use copy == true, couldn't we get rid of it and say it returns
> a
> copy of the value if any?
>

I replaced it with the new function CopySessionVariableWithTypeCheck.
Probably in almost all situations, the copy will be required. And if not,
we can enhance the API later.

> + * Create new ON_COMMIT_DROP xact action. We have to drop
> + * ON COMMIT DROP variable, although this variable should not
> + * be used. So we need to register this action in CREATE VARIABLE
> + * time.
>
> I don't understand this comment.
>

changed

>
> +AtPreEOXact_SessionVariable_on_xact_actions(bool isCommit)
> +{
> + ListCell *l;
> +
> + foreach(l, xact_drop_actions)
> + {
> + SVariableXActActionItem *xact_ai =
> + (SVariableXActActionItem *) lfirst(l);
> +
> + /* Iterate only over non dropped entries */
> + if (xact_ai->deleting_subid == InvalidSubTransactionId)
> + {
> + Assert(xact_ai->action == SVAR_ON_COMMIT_DROP);
>
> The assert sould probably be in the block above.
>

moved

>
> + * We want to reset session variable (release it from
> + * local memory) when RESET is required or when session
> + * variable was removed explicitly (DROP VARIABLE) or
> + * implicitly (ON COMMIT DROP). Explicit releasing should
> + * be done only if the transaction is commited.
> + */
> + if ((xact_ai->action == SVAR_RESET) ||
> + (xact_ai->action == SVAR_ON_COMMIT_RESET &&
> + xact_ai->deleting_subid == InvalidSubTransactionId &&
> + isCommit))
> + free_session_variable_varid(xact_ai->varid);
>
> This chunk is a bit hard to follow. Also, for SVAR_RESET wouldn't it be
> better
> to only make the svar invalid and keep it in the htab? If so, this could
> be
> split in two different branches which would be easier to follow.
>

After some experiments, I think it is more simple to remove the svar entry
in htab. It reduces the state space, and variable initialization once per
transaction is not expensive. The problem is in necessary xact action
registration and now I can call it simply just from init_session_variable.
I updated comments there

>
> + if (!isCommit &&
> + xact_ai->creating_subid == mySubid &&
> + xact_ai->action != SVAR_RESET &&
> + xact_ai->action != SVAR_RECHECK)
> + {
> + /* cur_item must be removed */
> + xact_reset_actions =
> foreach_delete_current(xact_reset_actions, cur_item);
> + pfree(xact_ai);
>
> I think that be definition only the SVAR_ON_COMMIT_DROP (cleaning entry
> for a
> dropped session variable) will ever need to be removed there, so we should
> check for that instead of not being something else?
>
>
fixed

>
> + /*
> + * Prepare session variables, if not prepared in queryDesc
> + */
> + if (queryDesc->num_session_variables > 0)
>

I don't understand that comment.
>

I changed this comment

>
> +static void
> +svariableStartupReceiver(DestReceiver *self, int operation, TupleDesc
> typeinfo)
> +{
> + svariableState *myState = (svariableState *) self;
> + int natts = typeinfo->natts;
> + int outcols = 0;
> + int i;
> +
> + for (i = 0; i < natts; i++)
> + {
> + Form_pg_attribute attr = TupleDescAttr(typeinfo, i);
> +
> + if (attr->attisdropped)
> + continue;
> +
> + if (++outcols > 1)
> + elog(ERROR, "svariable DestReceiver can take only one
> attribute");
> +
> + myState->typid = attr->atttypid;
> + myState->typmod = attr->atttypmod;
> + myState->typlen = attr->attlen;
> + myState->slot_offset = i;
> + }
> +
> + myState->rows = 0;
> +}
>
> Maybe add an initial Assert to make sure that caller did call
> SetVariableDestReceiverParams(), and final check that one attribute was
> found?
>

done

>
> @@ -1794,15 +1840,39 @@ fix_expr_common(PlannerInfo *root, Node *node)
> g->cols = cols;
> }
> }
> + else if (IsA(node, Param))
> + {
> + Param *p = (Param *) node;
> +
> + if (p->paramkind == PARAM_VARIABLE)
> + {
> + PlanInvalItem *inval_item = makeNode(PlanInvalItem);
> +
> + /* paramid is still session variable id */
> + inval_item->cacheId = VARIABLEOID;
> + inval_item->hashValue = GetSysCacheHashValue1(VARIABLEOID,
> +
> ObjectIdGetDatum(p->paramvarid));
> +
> + /* Append this variable to global, register dependency */
> + root->glob->invalItems = lappend(root->glob->invalItems,
> + inval_item);
> + }
> + }
>
> I didn't see any test covering invalidation of cached plan using session
> variables. Could you add some? While at it, maybe use different values
> on the
> sesssion_variable.sql tests rather than 100 in many places, so it's easier
> to
> identifier which case broke in case of problem.
>

I created new tests there

>
> +static Node *
> +makeParamSessionVariable(ParseState *pstate,
> + Oid varid, Oid typid, int32 typmod, Oid collid,
> + char *attrname, int location)
> +{
> [...]
> + /*
> + * There are two ways to access session variables - direct, used by
> simple
> + * plpgsql expressions, where it is not necessary to emulate stability.
> + * And Buffered access, which is used everywhere else. We should ensure
> + * stable values, and because session variables are global, then we
> should
> + * work with copied values instead of directly accessing variables. For
> + * direct access, the varid is best. For buffered access, we need
> + * to assign an index to the buffer - later, when we know what
> variables are
> + * used. Now, we just remember, so we use session variables.
>
> I don't understand the last part, starting with "For buffered access, we
> need...". Also, the beginning of the comment seems like something more
> general
> and may be moved somewhere, maybe at the beginning of sessionvariable.c?
>

moved to sessionvariable.c and modified.

> + * stmt->query is SelectStmt node. An tranformation of
> + * this node doesn't support SetToDefault node. Instead injecting
> + * of transformSelectStmt or parse state, we can directly
> + * transform target list here if holds SetToDefault node.
> + */
> + if (stmt->set_default)
>
> I don't understand this comment. Especially since the next
> transformTargetList() will emit SetToDefault node that will be handled
> later in
> that function and then in RewriteQuery.
>

This is messy, sorry. SelectStmt doesn't support SetToDefault. LetStmt
supports it. I reworded.

> + /*
> + * rewrite SetToDefaults needs varid in Query structure
> + */
> + query->resultVariable = varid;
>
> I also don't understand that comment. Is is always set just in case
> there's a
> SetToDefault, or something else?
>

This comment is not complete. This value is required by QueryRewriter (for
replacement of the SetToDefault node by defexpr). It is required for
acquiring locks, and for execution.

I rewrote this comment

>
> + /* translate paramvarid to session variable name */
> + if (param->paramkind == PARAM_VARIABLE)
> + {
> + appendStringInfo(context->buf, "%s",
> +
> generate_session_variable_name(param->paramvarid));
> + return;
> + }
>
> A bit more work seems to be needed for deparsing session variables:
>

> # create variable myvar text;
> CREATE VARIABLE
>
> # create view myview as select myvar;
> CREATE VIEW
>
> # \d+ myview
> View "public.myview"
> Column | Type | Collation | Nullable | Default | Storage | Description
> --------+------+-----------+----------+---------+----------+-------------
> myvar | text | | | | extended |
> View definition:
> SELECT myvar AS myvar;
>
> There shouldn't be an explicit alias I think.
>

this issue was described in other thread

I am sending rebased, updated patches. The type check is not implemented
yet.

Regards

Pavel

Attachment Content-Type Size
v20220621-0011-documentation.patch text/x-patch 43.0 KB
v20220621-0010-This-patch-changes-error-message-column-doesn-t-exis.patch text/x-patch 29.1 KB
v20220621-0009-Regress-tests-for-session-variables.patch text/x-patch 35.8 KB
v20220621-0008-typedefs.patch text/x-patch 1.6 KB
v20220621-0007-Possibility-to-dump-session-variables-by-pg_dump.patch text/x-patch 19.4 KB
v20220621-0006-Enhancing-psql-for-session-variables.patch text/x-patch 15.2 KB
v20220621-0004-Support-of-LET-command-in-PLpgSQL.patch text/x-patch 11.5 KB
v20220621-0005-DISCARD-VARIABLES-command.patch text/x-patch 3.2 KB
v20220621-0003-LET-command.patch text/x-patch 41.9 KB
v20220621-0002-session-variables.patch text/x-patch 86.0 KB
v20220621-0001-Catalogue-support-for-session-variables.patch text/x-patch 97.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-06-21 08:46:54 Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
Previous Message houzj.fnst@fujitsu.com 2022-06-21 08:07:55 RE: Replica Identity check of partition table on subscriber