Re: Improving inferred query column names

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Vladimir Churyukin <vladimir(at)churyukin(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving inferred query column names
Date: 2023-02-11 21:51:21
Message-ID: CADkLM=fJFR0tWuo+GprqXw6sDnK8YHZ=81OMd7wRnGXAH9pUqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 11, 2023 at 3:47 PM Vladimir Churyukin <vladimir(at)churyukin(dot)com>
wrote:

> For backwards compatibility I guess you can have a GUC flag controlling
> that behavior that can be set into backwards compatibility mode if required.
> The previous functionality can be declared deprecated and removed (with
> the flag) once the current version becomes unsupported.
>

Seems more like a per-session setting than a GUC.

Here's a suggestion off the top of my head.

We create a session setting inferred_column_name_template.

The template takes a formatting directive %N which is just a counter

SET inferred_column_name_template = 'col_%N'

which would give you col_1, col_2, regardless of what kind of expression
the columns were

We could introduce another directive, %T

SET inferred_column_name_template = '%T_%N'

which prints the datatype short name of the column. In this case, %N would
increment per datatype, so text_1, integer_1, text_2, timestamptz_1, text_3

Getting fancier, we could introduce something less datatype centric, %F

SET inferred_column_name_template = '%F_%N'

Which would walk the following waterfall and stop on the first match

1. The datatype short name if the expression is explicitly casted
(either CAST or ::)
2. the name of the function if the outermost expression was a function
(aggregate, window, or scalar), so sum_1, substr_1
3. 'case' if the outermost expression was case
4. 'expr' if the expression was effectively an operator ( SELECT 3+4,
'a' || 'b' etc)
5. the datatype short name for anything that doesn't match any of the
previous, and for explicit casts

Keeping track of all the %N counters could get silly, so maybe a %P which
is simply the numeric column position of the column, so your result set
would go like: id, name, col_3, last_login, col_5.

We would have to account for the case where the user left either %N or %P
out of the template, so one of them would be an implied suffix if both were
absent, or we maybe go with

SET inferred_column_name_prefix = '%F_';
SET inferred_column_name_counter = 'position'; /* position, counter,
per_type_counter */

Or we just cook up a few predefined naming schemes, and let the user pick
from those.

One caution I have is that I have seen several enterprise app database
designs that have lots of user-customizable columns with names like
varchar1, numeric4, etc. Presumably the user would know their environment
and not pick a confusing template.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2023-02-11 21:54:56 Re: Transparent column encryption
Previous Message Andres Freund 2023-02-11 21:36:51 Re: refactoring relation extension and BufferAlloc(), faster COPY