From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Егор Чиндяскин <kyzevan23(at)mail(dot)ru>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, mahendrakar s <mahendrakarforpg(at)gmail(dot)com> |
Subject: | Re: Stack overflow issue |
Date: | 2022-08-31 02:38:23 |
Message-ID: | CAMbWs49H7=jV2oHdx_uzGyGUL_Lg4tS799KaLcHCUWa1VwggXw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Aug 31, 2022 at 6:57 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
> > The upstream recommendation, which seems pretty sane to me, is to
> > simply reject any string exceeding some threshold length as not
> > possibly being a word. Apparently it's common to use thresholds
> > as small as 64 bytes, but in the attached I used 1000 bytes.
>
> On further thought: that coding treats anything longer than 1000
> bytes as a stopword, but maybe we should just accept it unmodified.
> The manual says "A Snowball dictionary recognizes everything, whether
> or not it is able to simplify the word". While "recognizes" formally
> includes the case of "recognizes as a stopword", people might find
> this behavior surprising. We could alternatively do it as attached,
> which accepts overlength words but does nothing to them except
> case-fold. This is closer to the pre-patch behavior, but gives up
> the opportunity to avoid useless downstream processing of long words.
This patch looks good to me. It keeps overly long words (> 1000 bytes)
from going through the stemmer, so the stack overflow in the Turkish
stemmer should no longer occur.
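To illustrate the behavior being discussed, here is a minimal standalone sketch, not the actual patch. It assumes a 1000-byte threshold as mentioned above; `normalize_word`, `toy_stem`, and `MAX_STEM_LEN` are hypothetical names, the toy stemmer merely strips a trailing "s", and case-folding is ASCII-only, whereas the real dictionary code handles multibyte encodings.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed threshold, taken from the 1000 bytes mentioned in the thread. */
#define MAX_STEM_LEN 1000

/* Hypothetical stand-in for a real stemmer: strips one trailing 's'. */
static void toy_stem(char *word)
{
    size_t len = strlen(word);

    if (len > 1 && word[len - 1] == 's')
        word[len - 1] = '\0';
}

/*
 * Return a newly allocated, case-folded copy of word.
 * Words longer than MAX_STEM_LEN bytes are accepted but never
 * handed to the stemmer; shorter words are stemmed as usual.
 * The caller must free() the result. Returns NULL on allocation failure.
 */
char *normalize_word(const char *word)
{
    assert(word != NULL);

    size_t len = strlen(word);
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;

    /* ASCII-only case folding (an assumption made for brevity). */
    for (size_t i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) word[i]);
    out[len] = '\0';

    /* Overlength input skips the stemmer entirely. */
    if (len <= MAX_STEM_LEN)
        toy_stem(out);

    return out;
}
```

With this shape, an overlength token still comes back as a recognized (case-folded) word rather than being dropped as a stopword, and the stemmer never sees input long enough to drive deep recursion.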
Thanks
Richard