Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mikhail Titov <mlt(at)gmx(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE
Date: 2020-09-06 21:42:40
Message-ID: 1501863.1599428560@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Mikhail Titov <mlt(at)gmx(dot)us> writes:
> Previously submitted patch got somehow trailing spaces mangled on the
> way out. This is an attempt to use application/octet-stream MIME instead
> of text/x-patch to preserve those for regression tests.

I took a quick look through this. I agree with the general idea of
detecting cases where all of the entries in a VALUES column are DEFAULT,
but the implementation needs work.

The cfbot reports that it doesn't compile [1]:

parse_relation.c: In function ‘expandNSItemVars’:
parse_relation.c:2992:34: error: ‘T_Node’ undeclared (first use in this function)
std = list_nth_node(Node, row, colindex);
^

I suspect this indicates that you did not use --enable-cassert in your own
testing, which is usually a bad idea; that enables a lot of checks that
you really want to have active for development purposes.

Hacking expandNSItemVars() for this purpose is an extremely bad idea.
The API spec for that is
* Produce a list of Vars, and optionally a list of column names,
* for the non-dropped columns of the nsitem.
This patch breaks that specification, and in turn breaks callers that
expect it to be adhered to. I see at least one caller that will suffer
assertion failures because of that, which reinforces my suspicion that
you did not test with assertions on.

I think you'd be better off to make transformInsertStmt(), specifically
its multi-VALUES-rows code path, check for all-DEFAULT columns and adjust
the tlist itself. Doing it there might be a good bit less inefficient
for very long VALUES lists, too, which is a case that we do worry about.
Since that's already iterating through the sub-lists, you could track
whether all rows seen so far contain SetToDefault in each column position,
and avoid extra scans of the sublists. (A BitmapSet might be a convenient
representation of that, though you could also use a bool array I suppose.)

I do not care for what you did in rewriteValuesRTE() either: just removing
a sanity check isn't OK, unless you've done something to make the sanity
check unnecessary which you surely have not. Perhaps you could extend
the initial scan of the tlist (lines 1294-1310) to notice SetToDefault
nodes as well as Var nodes and keep track of which columns have those.
Then you could cross-check that one or the other case applies whenever
you see a SetToDefault in the VALUES lists.

BTW, another thing that needs checking is whether a rule containing
an INSERT like this will reverse-list sanely. The whole idea of
replacing some of the Vars might not work so far as ruleutils.c is
concerned. In that case I think we might have to implement this
by having transformInsertStmt restructure the VALUES lists to
physically remove the all-DEFAULT column, and adjust the target column
list accordingly --- that is, make a parse-time transformation of
INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT);
into
INSERT INTO gtest0(a) VALUES (1), (2);
That'd have the advantage that you'd not have to hack up the
rewriter at all.

Also, in the case where the user has failed to ensure that all the
column entries are DEFAULT, I suppose that we'll still get the same
error as now:

regression=# INSERT INTO gtest0 VALUES (1, DEFAULT), (2, 42);
ERROR: cannot insert into column "b"
DETAIL: Column "b" is a generated column.

This seems fairly confusing and unhelpful. Perhaps it's not this
patch's job to improve it, but it'd be nice if we could do better.
One easy change would be to make the error message more specific:

ERROR: cannot insert a non-DEFAULT value into column "b"

(I think this wording is accurate, but I might be wrong.) It'd be
even better if we could emit an error cursor pointing at (one of)
the entries that are not DEFAULT, since in a command with a long
VALUES list it might not be that obvious where you screwed up.

FWIW, I would not bother splitting a patch like this into two parts.
That increases your effort level, and it increases the reviewer's
effort to apply it too, and this patch isn't big enough to justify it.

regards, tom lane

[1] https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/724543451

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-09-06 22:26:37 Re: A micro-optimisation for walkdir()
Previous Message Tomas Vondra 2020-09-06 21:21:12 Re: Disk-based hash aggregate's cost model