Doing better at HINTing an appropriate column within errorMissingColumn()

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-03-27 19:10:16
Message-ID: CAM3SWZS9-Xr2Ud_j9yrKDctT6xxy16h1EugtSWmLU6Or4CtGAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.

I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name. psql tab
completion is good, but not so good that this doesn't happen all the
time. It's good practice to consistently name columns and tables such
that it's possible to intuit the names of columns from the names of
tables and so on, but it's still pretty common to forget if a column
name from the table "orders" is "order_id", "orderid", or "ordersid",
particularly if you're someone who regularly interacts with many
databases. This problem is annoying in a low intensity kind of way.

Consider the following sample sessions of mine, made with the
dellstore2 sample database:

[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
orderid | orderdate | customerid | netamount | tax | totalamount |
orderlineid | orderid | prod_id | quantity | orderdate
---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------
1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |
1 | 1 | 9117 | 1 | 2004-01-27
(1 row)

[local]/postgres=# select ordersid from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
ERROR: 42703: column "ordersid" does not exist
LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
HINT: Perhaps you meant to reference the column "o"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2999
[local]/postgres=# select ol.ordersid from orders o join orderlines ol
on o.orderid = ol.orderid limit 1;
ERROR: 42703: column ol.ordersid does not exist
LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989

We try to give the most useful possible HINT here, charging extra for
a non-matching alias, and going through the range table in order and
preferring the first column observed to any subsequent column whose
name is of the same distance as an earlier Var. The fuzzy string
matching works well enough that it seems possible in practice to
successfully have the parser make the right suggestion, even when the
user's original guess was fairly far off. I've found it works best to
charge half as much for a character deletion, so that's what is
charged.

I have some outstanding concerns about the proposed patch:

* It may be the case that dense logosyllabic or morphographic writing
systems, for example Kanji might consistently present, say, Japanese
users with a suggestion that just isn't very useful, to the point of
being annoying. Perhaps some Japanese hackers can comment on the
actual risks here.

* Perhaps I should have moved the Levenshtein distance functions into
core and be done with it. I thought that given the present restriction
that the implementation imposes on source and target string lengths,
it would be best to leave the user-facing SQL functions in contrib.
That restriction is not relevant to the internal use of Levenshtein
distance added here, though.

Thoughts?
--
Peter Geoghegan

Attachment Content-Type Size
levenshtein_column_hint.v1.2014_03_27.patch.gz application/x-gzip 13.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-03-27 20:31:36 Re: Useless "Replica Identity: NOTHING" noise from psql \d
Previous Message Andrew Dunstan 2014-03-27 19:04:05 Re: Useless "Replica Identity: NOTHING" noise from psql \d