Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-07-02 22:51:08
Message-ID: CAM3SWZTzQO=OY4jmfB-65ieFie8iHUkDErK-0oLJETm8dSrSpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 29, 2014 at 7:30 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Although printing all candidates seems to be what's preferred by
> existing systems with similar facilities, I can see the point that
> constructing the message in a translatable fashion might be difficult.
> So personally I'd be willing to abandon insistence on that. I still
> think though that printing candidates with very large distances
> would be unhelpful.

Attached revision factors in everyone's concerns here, I think.

I've addressed your concern about the closeness of the match proposed
in the HINT - the absolute as opposed to relative quality of the
match. There is a normalized distance threshold that must always be
exceeded to prevent ludicrous suggestions. This works along similar
lines to those sketched by Robert. Furthermore, I've made it
occasionally possible to see 2 suggestions, when they're equally
distant and when each suggestion comes from a different range table
entry. However, if the two best suggestions (overall or within an RTE)
come from within the same RTE, then that RTE is ignored for the
purposes of picking a suggestion (although the lowest observed
distance from an ignored RTE may still be used as the distance for
later RTEs to beat to get their attributes suggested in the HINT).

The idea here is that this quality-bar for suggestions doesn't come at
the cost of ignoring my concern about the presumably somewhat common
case where there is an unqualified and therefore ambiguous column
reference that happens to also be misspelled. An ambiguous column
reference and an incorrectly spelled column name are both very common,
and so it seems likely that momentary lapses where the user gets both
things wrong at once are also common. We do all this without going
overboard, since as outlined by Robert, when there are 3 or more
equally distant candidates (even if they all come from different
RTEs), we give no HINT at all. The big picture here is to make mental
context switches cheap when writing ad-hoc queries in psql.

A lot of the HINTs that popped up in the regression tests that seemed
kind of questionable no longer appear. These new measures make the
coding somewhat more complex than that of the initial version,
although overall the parser code added by this patch is almost
entirely confined to code paths concerned only with producing
diagnostic messages to help users.

--
Peter Geoghegan

Attachment Content-Type Size
levenshtein_column_hint.v2.2014_07_02.patch.gz application/x-gzip 16.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2014-07-02 23:27:13 Re: Ancient bug in formatting.c/to_char()
Previous Message Tom Lane 2014-07-02 22:02:52 Re: Can simplify 'limit 1' with slow function?