Re: Improving inferred query column names

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Vladimir Churyukin <vladimir(at)churyukin(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving inferred query column names
Date: 2023-02-23 04:03:48
Message-ID: 341525.1677125028@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2023-02-22 16:38:51 -0500, Tom Lane wrote:
>> The proposal so far was just to handle a function call wrapped
>> around something else by converting to the function name followed
>> by whatever we'd emit for the something else.

> SELECT sum(relpages), sum(reltuples), 1+1 FROM pg_class;
> ┌──────────────┬───────────────┬──────────┐
> │ sum_relpages │ sum_reltuples │ ?column? │
> ├──────────────┼───────────────┼──────────┤

So far so good, but what about multi-argument functions?
Do we do "f_x_y_z", and truncate wherever? How well will this
work with nested function calls?
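
To make the multi-argument and nesting questions concrete, here is a minimal Python sketch of the naming rule under discussion. It is not PostgreSQL code: the Column, FuncCall and Other classes and infer_name() are hypothetical, and falling back to "?column?" whenever any argument lacks a usable name is an assumption, not something the thread settled.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Column:
    """A plain column reference, e.g. relpages."""
    name: str


@dataclass
class FuncCall:
    """A function call wrapped around zero or more argument expressions."""
    name: str
    args: List["Expr"]


@dataclass
class Other:
    """Anything else: constants, operator expressions, and so on."""
    text: str


Expr = Union[Column, FuncCall, Other]

FALLBACK = "?column?"


def infer_name(expr: Expr) -> str:
    """Return a column name for expr under the hypothetical rule."""
    if isinstance(expr, Column):
        return expr.name
    if isinstance(expr, FuncCall):
        parts = [expr.name]
        for arg in expr.args:
            sub = infer_name(arg)
            if sub == FALLBACK:
                # Assumed rule: give up entirely rather than emit
                # something like "sum_?column?".
                return FALLBACK
            parts.append(sub)
        return "_".join(parts)
    return FALLBACK


single = infer_name(FuncCall("sum", [Column("relpages")]))
multi = infer_name(FuncCall("f", [Column("x"), Column("y"), Column("z")]))
nested = infer_name(FuncCall("upper", [FuncCall("trim", [Column("relname")])]))
mixed = infer_name(FuncCall("sum", [Other("1+1")]))
print(single, multi, nested, mixed)
```

Under these assumptions the names grow by one component per argument and per nesting level, which is where the truncation question comes from.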

>> You cannot realistically
>> handle, say, operator expressions without emitting names that will
>> require quoting, which doesn't seem attractive.

> Well, it doesn't require much to be better than "?column?", which already
> requires quoting...

I think the point of "?column?" is to use something that nobody's going
to want to reference that way, quoted or otherwise. The SQL spec says
(in SQL:2021, it's 7.16 <query specification> syntax rule 18) that if the
column expression is anything more complex than a simple column reference
(or SQL parameter reference, which I think we don't support) then the
column name is implementation-dependent, which is standards-ese for
"here be dragons".

BTW, SQL92 and SQL99 had a further constraint:

    c) Otherwise, the <column name> of the i-th column of the <query
       specification> is implementation-dependent and different
       from the <column name> of any column, other than itself, of
       a table referenced by any <table reference> contained in the
       SQL-statement.

We never tried to implement that literally, and now I'm glad we didn't
bother, because recent spec versions only say "implementation-dependent",
full stop. In any case, the spec is clearly in the camp of "don't depend
on these column names".

> We could just do something like printing <left>_<funcname>_<right>. So
> something like avg(reltuples / relpages) would end up as
> avg_reltuples_float48div_relpages.
> Whether that's worth it, or whether column name lengths would be too painful,
> IDK.

I think you'd soon be hitting NAMEDATALEN limits ...
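
As a rough illustration of how quickly that happens, here is a Python sketch. It assumes the default NAMEDATALEN of 64, so identifiers are cut to 63 bytes; truncate_identifier() below is a stand-in written for this example, not the server's function, and the combined name is a made-up case of two such aggregates joined by float8pl.

```python
NAMEDATALEN = 64  # PostgreSQL's default compile-time value


def truncate_identifier(name: str) -> str:
    """Cut name to at most NAMEDATALEN - 1 bytes, on a character boundary."""
    limit = NAMEDATALEN - 1
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    # errors="ignore" drops a multibyte character split by the cut.
    return raw[:limit].decode("utf-8", errors="ignore")


one = "avg_reltuples_float48div_relpages"
# Hypothetical name for the sum of two such aggregates.
combined = f"{one}_float8pl_{one}"

print(len(one), len(combined))
print(truncate_identifier(combined))
```

A single aggregate over one operator still fits, but combining two of them already exceeds the limit, and the truncated name loses the tail that would distinguish it from similar expressions.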

>> And no, deduplication isn't on the table at all here.

> +1

I remembered while looking at the spec that duplicate column names
in SELECT output are not only allowed but *required* by the spec.
If you write, say, "SELECT 1 AS x, 2 AS x, ..." then the column
names of those two columns are both "x", no wiggle room at all.
So I see little point in trying to deduplicate generated names,
even aside from the points you made.
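
For anyone who wants to see the client-side consequence, here is a small sketch using SQLite through Python's standard sqlite3 module. That is a stand-in chosen only because it runs without a server; it is not PostgreSQL, and the behavior shown is what SQLite reports for this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.execute("SELECT 1 AS x, 2 AS x")

# Column names as reported to the client, in output order.
names = [d[0] for d in cur.description]
row = cur.fetchone()
conn.close()

# Access by position keeps both values; keying by name cannot.
by_name = dict(zip(names, row))

print(names, row, by_name)
```

Both output columns are reported as "x", so code that looks results up by column name silently keeps only one of them, while positional access still sees both.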

regards, tom lane
