Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-12-22 13:50:43
Message-ID: CA+TgmoYsSC0Wa1b3tLKPJ-3Ew=98JFsQgheGYc49p2TU36sMFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 20, 2014 at 7:30 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Sat, Dec 20, 2014 at 3:17 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> Attached patch implements this scheme.
>
> I had another thought: "NAMEDATALEN + 1" is a better representation of
> "infinity" for matching purposes than INT_MAX. I probably should have
> made that change, too. It would then not have been necessary to
> "#include <limits.h>". I think that this is a useful
> belt-and-suspenders precaution against integer overflow. It almost
> certainly won't matter, since it's very unlikely that the best match
> within an RTE will end up being a dropped column, but we might as well
> do it that way (Levenshtein distance is costed in multiples of code
> point changes, but the maximum density is 1 byte per codepoint).

Good idea.

Looking over the latest patch, I think we could simplify the code so
that you don't need multiple FuzzyAttrMatchState objects. Instead of
creating a separate one for each RTE and then merging them, just have
one. When you find an inexact-RTE name match, set a field inside the
FuzzyAttrMatchState -- maybe with a name like rte_penalty -- to the
Levenshtein distance between the RTEs. Then call scanRTEForColumn()
and pass down the same state object. Now let
updateFuzzyAttrMatchState() work out what it needs to do based on the
observed inter-column distance and the currently-in-force RTE penalty.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-12-22 14:05:24 Re: moving from contrib to bin
Previous Message Robert Haas 2014-12-22 13:31:14 Re: Table-level log_autovacuum_min_duration