Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-12-02 21:11:05
Message-ID: CA+TgmoaxzagWBzO1ybTi8VTYW6Q0uBpN7ea05xW5p5N3hNX-Sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 25, 2014 at 7:13 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Alright, so let me summarize what I think are the next steps in
> working towards getting this patch committed. I should produce a new
> revision which:
>
> * Uses default costings.
>
> * Applies a generic final quality check that enforces a distance of no
> greater than 50% of the total string size. (The use of default
> costings removes any reason to continue to do this)
>
> * Work through Robert's suggestions on other aspects that need work
> [1], most of which I already agreed to.

Sounds good so far.

> What is unclear is whether or not I should continue to charge extra
> for non-matching user supplied alias (and, I think more broadly,
> consider multiple RTEs iff the user did use an alias) - Robert was
> skeptical, but didn't seem to have made his mind up. I still think I
> should cost things based on aliases, and consider multiple RTEs even
> when the user supplied an alias (the penalty should just be a distance
> of 1 and not 3, though, in light of other changes to the
> weighing/costing). If I don't hear anything in the next day or two,
> I'll more or less preserve aliases-related aspects of the patch.

Basically, the case in which I think it's helpful to issue a
suggestion here is when the user has used the table name rather than
the alias name. I wonder if it's worth checking for that case
specifically, in lieu of what you've done here, and issuing a totally
different hint in that case ("HINT: You must refer to this as column
as "prime_minister.id" rather than "cameron.id").

Another idea, which I think I like less well, is to check the
Levenshtein distance between the allowed alias and the entered alias
and, if that's within the half-the-shorter-length threshold, consider
possible matches from that RTE, charge the distance between the
correct alias and the entered alias as a penalty to each potential
column match.

What I think won't do is to look at a situation where the user has
entered automobile.id and suggest that maybe they meant student.iq, or
even student.id. The amount of difference between the names has got to
matter for the RTE names, just as it does for the column names.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-12-02 21:11:21 Re: superuser() shortcuts
Previous Message Emre Hasegeli 2014-12-02 21:09:27 Re: Selectivity estimation for inet operators