Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-11-20 19:03:49
Message-ID: CA+TgmoY0jthxThGLMfzkyWR74Mg1+c0_1tf8zRjd4YRJWPftLg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 20, 2014 at 1:30 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I mean the suggestion of raising the cost threshold more gradually,
> not as a step function of the number of characters in the string [1]
> where it's either over 6 characters and must pass the 50% test, or
> isn't and has no absolute quality test. The exact modification I
> described will FWIW remove the "quantity" for "qty" suggestion, as
> well as all the similar suggestions that you found objectionable (like
> "tit" also offering a suggestion of "quantity").
>
> If you look at the regression tests, none of the sensible suggestions
> are lost (some would be by an across the board 50% absolute quality
> threshold, as I previously pointed out [2]), but all the bad ones are.
> I attach failed regression test output showing the difference between
> the previous expected values, and actual values with that small
> modification - it looks like most or all bad cases are now fixed.

That does seem to give better results, but it still seems awfully
complicated. If we just used Levenshtein with all-default cost
factors and a distance cap equal to Max(strlen(what_user_typed),
strlen(candidate_match), 3), what cases that you think are important
would be harmed?
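
To make that question concrete, here is a rough sketch of the check as
described above: plain Levenshtein with unit costs for insert, delete and
substitute, compared against a cap of Max(strlen(what_user_typed),
strlen(candidate_match), 3). It is written in Python purely for
illustration and is not the patch's code; the function names
(levenshtein, distance_cap, within_cap) and the sample column names are
assumptions made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = cur
    return prev[-1]


def distance_cap(typed: str, candidate: str) -> int:
    # Hypothetical helper: the cap exactly as written in the message.
    return max(len(typed), len(candidate), 3)


def within_cap(typed: str, candidate: str) -> bool:
    # A candidate is offered only if its distance does not exceed the cap.
    return levenshtein(typed, candidate) <= distance_cap(typed, candidate)


if __name__ == "__main__":
    # Sample pairs taken from the cases discussed in the quoted text.
    for typed, candidate in [("qty", "quantity"),
                             ("tit", "quantity"),
                             ("quantiy", "quantity")]:
        print(typed, candidate,
              levenshtein(typed, candidate),
              distance_cap(typed, candidate),
              within_cap(typed, candidate))
```

With unit costs the distance between two strings never exceeds the length
of the longer one, so a cap in exactly this form would not reject any
candidate on its own. The cap is kept in a separate function so that
other forms can be swapped in and compared against the regression cases.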

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
