Re: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: Re: [DOCS] Document Upper Limit for NAMEDATELEN in pgsql 9.5+
Date: 2016-01-10 16:48:34
Message-ID: CA+TgmobkfHmkO1f9A=nGqDMmhe1tnp=nKf6wuPFcEJPyAJ8kkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 8, 2016 at 3:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> [ redirecting from -docs, which isn't the best place to discuss code fixes ]
>
> Whoever thought this was a good idea:
>
> StaticAssertStmt(NAMEDATALEN <= MAX_LEVENSHTEIN_STRLEN,
> "Levenshtein hinting mechanism restricts NAMEDATALEN");
>
> is nuts.
>
> A minimal fix that would not put stumbling blocks in the way of changes
> to NAMEDATALEN is to redefine MAX_LEVENSHTEIN_STRLEN as
> Max(255, NAMEDATALEN). But I wonder why we have to have an arbitrary limit
> in this code at all. I trust nobody here believes this is the only O(N^2)
> algorithm in the backend; the others don't have this sort of thing in
> front of them.

The arbitrary limit in that code is not new, but the Assert is. I
didn't foresee the consequence that it would become harder to change
NAMEDATALEN, and I agree that needs to be fixed. Ripping the
arbitrary limit out does not seem like a particularly good idea to me,
even if we put CHECK_FOR_INTERRUPTS() into that path. Just because
there are other ways to make the backend burn insane amounts of CPU
time doesn't mean that we should try to make it easier.

Actually, though, varstr_levenshtein_less_equal() never gets called
from parse_relation.c with a distance bound of more than four, so it
can't actually go quadratic. There's another call in that file to
varstr_levenshtein(), which in retrospect looks silly, but I'm pretty
sure that could also be changed to use a distance bound of 4 (well,
MAX_FUZZY_DISTNACE + 1) without changing the behavior at all. Given a
constant distance bound, the algorithm is, I believe, only O(max(m,
n)) not O(mn). Knowing that, we could let the internal code just
bypass the length check and only enforce the length restriction when
the code is called from SQL via fuzzystrmatch.

(There are of course other ways to solve the problem, too; this is
just one that came to mind.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-01-10 17:13:38 Re: ExecGather() + nworkers
Previous Message Robert Haas 2016-01-10 16:32:05 Re: [COMMITTERS] pgsql: Avoid pin scan for replay of XLOG_BTREE_VACUUM