Re: Stack overflow issue

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Егор Чиндяскин <kyzevan23(at)mail(dot)ru>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, mahendrakar s <mahendrakarforpg(at)gmail(dot)com>
Subject: Re: Stack overflow issue
Date: 2022-08-31 02:38:23
Message-ID: CAMbWs49H7=jV2oHdx_uzGyGUL_Lg4tS799KaLcHCUWa1VwggXw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 31, 2022 at 6:57 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wrote:
> > The upstream recommendation, which seems pretty sane to me, is to
> > simply reject any string exceeding some threshold length as not
> > possibly being a word. Apparently it's common to use thresholds
> > as small as 64 bytes, but in the attached I used 1000 bytes.
>
> On further thought: that coding treats anything longer than 1000
> bytes as a stopword, but maybe we should just accept it unmodified.
> The manual says "A Snowball dictionary recognizes everything, whether
> or not it is able to simplify the word". While "recognizes" formally
> includes the case of "recognizes as a stopword", people might find
> this behavior surprising. We could alternatively do it as attached,
> which accepts overlength words but does nothing to them except
> case-fold. This is closer to the pre-patch behavior, but gives up
> the opportunity to avoid useless downstream processing of long words.

This patch looks good to me. It keeps overly long words (> 1000 bytes)
from going through the stemmer, so the stack overflow in the Turkish
stemmer should no longer occur.
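
For anyone following along, the second variant described above (case-fold
every word, but skip stemming for overlength ones) can be sketched roughly
as below. This is only an illustration, not the patch itself: MAX_STEM_LEN,
toy_stem() and normalize_word() are hypothetical names, and toy_stem() is a
trivial stand-in for the real Snowball stemmer.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed threshold, taken from the 1000-byte figure quoted above. */
#define MAX_STEM_LEN 1000

/* Hypothetical stand-in for a real stemmer: drops one trailing 's'. */
static void
toy_stem(char *word)
{
    size_t      len = strlen(word);

    if (len > 1 && word[len - 1] == 's')
        word[len - 1] = '\0';
}

/*
 * Return a malloc'd, ASCII case-folded copy of "word" (NULL on allocation
 * failure).  Words longer than MAX_STEM_LEN bytes are case-folded only and
 * never reach the stemmer; shorter words are stemmed as usual.
 */
static char *
normalize_word(const char *word)
{
    size_t      len = strlen(word);
    char       *result = malloc(len + 1);

    if (result == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++)
        result[i] = (char) tolower((unsigned char) word[i]);
    result[len] = '\0';

    /* Overlength input cannot plausibly be a word, so skip stemming. */
    if (len <= MAX_STEM_LEN)
        toy_stem(result);

    return result;
}
```

The length check sits in front of the stemmer call, so whatever recursion
the stemmer does internally only ever sees bounded input.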

Thanks
Richard
