Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-11-20 18:30:44
Message-ID: CAM3SWZStRMTxbow+j70DHPJ4VuVkf2=fvjY0WUH9zbd4GLOctA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 20, 2014 at 7:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> In general, I think the cost of a bad suggestion is much lower than
>> the benefit of a good one. You seem to be suggesting that they're
>> equal. Or that they're equally likely in an organic situation. In my
>> estimation, this is not the case at all.
>
> The way I see it, the main cost of a bad suggestion is that it annoys
> the user with clutter which they may brand as "stupid". Think about
> how much vitriol has been spewed over the years against progress bars
> (or estimated completion times) that don't turn out to mirror reality.

Well, you can judge the quality of the suggestion immediately. I
imagined a mechanism that gives a little bit more than the minimum
amount of guidance for things like contractions/abbreviations.

> Microsoft has gotten more cumulative flak about their inaccurate
> progress bars over the years than they would have for dropping an
> elevator on a cute baby.

I haven't used a more recent version of Windows than Windows Vista,
but I'm pretty sure that they kept it up.

>> I'm curious about your thoughts on the compromise of a ramped up
>> distance threshold to apply a test for the absolute quality of a
>> match. I think that the fact that git gives bad suggestions with terse
>> strings tells us a lot, though. Note that unlike git, with terse
>> strings we may well have a good deal more equidistant matches, and as
>> soon as the number of would-be matches exceeds 2, we actually give no
>> matches at all. So that's an additional protection against poor
>> matches with terse strings.
>
> I don't know what you mean by a ramped-up distance threshold, exactly.
> I think it's good for the distance threshold to be lower for small
> strings and higher for large ones. I think I'm somewhat open to
> negotiation on the details, but I think any system that's going to
> suggest "quantity" for "tit" is going too far.

I mean the suggestion of raising the cost threshold more gradually,
rather than as a step function of the number of characters in the
string [1]. With the step function, a string is either over 6
characters and must pass the 50% test, or it isn't and gets no
absolute quality test at all. FWIW, the exact modification I described
removes the "quantity" for "qty" suggestion, as well as all the
similar suggestions that you found objectionable (like "tit" also
offering a suggestion of "quantity").

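To make the step-versus-ramp distinction concrete, here is a minimal
Python sketch. It is not the patch's C code. The step rule follows the
description above (over 6 characters means the 50% test applies,
otherwise there is no absolute quality test). The ramp in
passes_ramped() is purely hypothetical and is not the modification
from [1]. The use of plain unit-cost Levenshtein distance, measured
against the length of the typed string, is likewise an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain unit-cost edit distance (an assumption; the patch may weight edits)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def passes_step(typed: str, distance: int) -> bool:
    """Step rule: only strings over 6 characters face the 50% test."""
    if len(typed) > 6:
        return distance * 2 <= len(typed)
    return True  # short strings: no absolute quality test at all


def passes_ramped(typed: str, distance: int) -> bool:
    """Hypothetical ramp: the allowed fraction shrinks from 100% toward 50%."""
    n = len(typed)
    tenths = max(5, 10 - (n - 1))  # 10/10 at length 1, floor of 5/10 from length 6
    return distance <= n * tenths // 10


if __name__ == "__main__":
    for typed in ("qty", "tit", "quanttiy"):
        d = levenshtein(typed, "quantity")
        print(typed, d, passes_step(typed, d), passes_ramped(typed, d))
```

Under these assumptions the step rule lets both "qty" and "tit" through
to "quantity", since neither is long enough to be tested. The ramped
rule rejects them and still accepts the "quanttiy" typo.
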
If you look at the regression tests, none of the sensible suggestions
are lost (some would be by an across-the-board 50% absolute quality
threshold, as I previously pointed out [2]), but all the bad ones are.
I attach failed regression test output showing the difference between
the previous expected values and the actual values with that small
modification. It looks like most or all bad cases are now fixed.

> If the user types
> "qty" when they meant "quantity", they probably don't really need the
> hint, because they're going to say to themselves "wait, I guess I
> didn't abbreviate that". The time when they need the hint is when
> they typed "quanttiy", because it's quite possible to read a query
> with that sort of typo multiple times and not realize that you've made
> one.

I agree that that's a more important case.

> In other words, I think there's value in trying to clue somebody in
> when they've made a typo, but not when they've made a think-o. We
> won't be able to do the latter accurately enough to make it more
> useful than annoying.

That's certainly true; I think that we only disagree about the exact
point at which we enter the think-o correction business.

[1] http://www.postgresql.org/message-id/CAM3SWZT+7hH29Go6ZuY2OrCS40=6yPVM_nt9NjfovP3XwjixDw@mail.gmail.com
[2] http://www.postgresql.org/message-id/CAM3SWZTSGokNhT8rK+0Eed7spNJg4pAdMbqqYi0FH9bWcNvTGA@mail.gmail.com
--
Peter Geoghegan

Attachment: regression.diffs (application/octet-stream, 4.6 KB)
