Re: \describe*

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Ryan Murphy <ryanfmurphy(at)gmail(dot)com>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: \describe*
Date: 2019-08-01 18:57:19
Message-ID: CADkLM=fxfsrHASKk_bY_A4uomJ1Te5MfGgD_rwwQfV8wP68ewg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> It seems this topic is ongoing so I've moved it to the September CF,
> but it's in "Waiting on Author" because we don't have a concrete patch
> that applies (or agreement on what it should do?) right now.
>

All recent work has been investigating the need(s) we're trying to address.
This is as good of a time as any to share my findings (with much
collaboration with Dave Fetter) so far.

1. Adding helper commands to psql aids only psql, and a great number of
users do not, or can not, use psql. So adding something on the server side
would have broader usage and appeal. Furthermore, some access tools
(especially browser-based ones) are not good about returning non-tabular
results, so helper commands that return result sets would have the broadest
usage.

2. Our own interest in server-side commands is all over the map. Some just
want the convenience of having them server side, or familiarity with
$OTHER_DB. Others want to eliminate the need for some code in pg_dump,
JDBC, or elsewhere.

3. There isn't much consensus in the other databases, though all of them do
*something*:

SQLServer
---------------

SQLServer has sp_help (
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-help-transact-sql?view=sql-server-2017
)

which contextually returns one of two different result sets (name, owner,
object type) or (column name, type, storage, length, precision, scale,
nullable, default, rule, collation)

DB2
------
Has a describe command (source:
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0002019.html)
which
can be used to describe query output (data type, data type length, column
name, column name length).

It also has an option to DESCRIBE TABLE foo which returns a set of
(col_name, schema_of_datatype, data_type, data_type_length,
data_type_scale, Nulls t/f)

It also has DESCRIBE INDEXES FOR TABLE foo which returns a set of (schema
of index, name of index, unique flag, number of columns, index type)

It also has DESCRIBE DATA PARTITIONS FOR TABLE which as you might guess
shows partitions.

All of these options have a SHOW DETAIL modifier which adds more columns.

MySQL
----------

(https://dev.mysql.com/doc/refman/8.0/en/show-columns.html)
MySSQL has SHOW COLUMNS which also returns a set of (name, type similar to
format_type(), null flag, PK or index indicator, default value, notes about
auto-increment/autogreneration/implicit trggers), and can be extended to
show privileges and comments with the EXTENDED and FULL options.

MySQL has a DESCRIBE command, but it is a synonym of EXPLAIN.

MySQL also has a raft of commands like SHOW CREATE USER, SHOW CREATE VIEW,
SHOW CREATE TRIGGER, SHOW CREATE TABLE, etc. (ex:
https://dev.mysql.com/doc/refman/8.0/en/show-create-user.html) These
commands all return a result set of of exactly one column, each row
representing one SQL statement, essentially doing a single-object
schema-only pg_dump.

Oracle
---------

https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12019.htm

SQL*Plus has a describe command that works on tables and views and
composite types (tabular set of: name, null, type) procedures (tabular set
of: arg name, type, in/out), and packages (a series of sets one per type
and procedure)

SQLcl has the INFO statement, which is roughly analogous to psql's \d in
that it is a mix of tabular and non-tabular information.

Oracle itself has dbms_metadata.get_ddl() which seems analogous to mysql's
SHOW CREATE commands.

Snowflake
--------------

Snowflake has DESCRIBE TABLE
https://docs.snowflake.net/manuals/sql-reference/sql/desc-table.html and
DESCRIBE VIEW
https://docs.snowflake.net/manuals/sql-reference/functions/get_ddl.html

Which return a set of: (name, type, column type, null flag, default,
primary key, unique key, check, expression, comment).

It also has an option for describing "stage" tables, which are s3 buckets
with a file format associated, the closest postgresql analog would be a
file_fdw foreign table, and there is a separate result set format for that.

Snowflake has no concept of indexes (it understands that there's things
called a unique keys, and it remembers that you said you wanted one, but
does nothing to enforce it), so no command for that.

These result sets are not composable in a query, however, they are stored
in the RESULT_SCAN cache, which means that you can run a describe, and then
immediately fetch the results of that command as if it was a table.

Snowflake also has a get_ddl() function
https://docs.snowflake.net/manuals/sql-reference/sql/desc-view.html which
is a one-column result set of statements to re-create the given object.

From all this, I have so far concluded:

1. There is real demand to be able to easily see the basic structure of
tables, views, and indexes in a way that strikes a balance between detail
and clutter.
2. There is some acknowledgement that this data be useful if it was further
filtered through SQL, though only one vendor has attempted to implement
that (Snowflake) and even that was far from seamless.
3. There's a clear need to be able to get the DDL steps needed to re-create
most common objects. This could be to copy-paste the info into another
session to create a similar object elsewhere, or for test cases, or so an
experienced person can see the "real guts" of an object without worrying
about what details have been hidden.
4. The needs in #1 and #3 are in direct opposition to each other, and
cannot easily be handled by the same command. Indeed, no one has tried.
5. The SHOW CREATE commands are neat, but the plethora of options (include
comments? include RI constraints? does current session locale figure into
the answer? dependencies?) probably mean that a function the get_ddl()
examples above, can be jam-packed with default parameters and it's still
going to have a very RETURNS SETOF TEXT output.
6. The convenience-function DESCRIBE / SHOW COLUMNS commands strive to have
a tabular format with following: column name, a user-readable datatype,
null/notnull flag, indicator that the column participates in a PK, in a
unique index, in any other sort of index, without naming the index, and
options for showing default values, and comments
7. People coming from other databases have expectations of a command like
DESCRIBE existing, and those expectations are reasonable.
8. The commands we do make should strive to have a stable result-set format
to allow their use in situations where the results can easily be handled
via JDBC/DBD/DBI, and maybe somewhere down the road the commands
themselves can be used as a subquery the same way that TABLE foo is an
alias for SELECT * FROM foo.
9. Since the relevant columns for describing a table are different than
those for a view or an index or another object, they should be separate
commands.

I'm interested to hear what other people thing, and which of these goals
are most worth pursuing in the near term to make postgres more usable to
newbies and veterans alike.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-08-01 18:59:00 Re: Avoid full GIN index scan when possible
Previous Message Heikki Linnakangas 2019-08-01 18:54:56 Re: Batch insert in CTAS/MatView code