Re: Stack overflow issue

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Егор Чиндяскин <kyzevan23(at)mail(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, mahendrakar s <mahendrakarforpg(at)gmail(dot)com>
Subject: Re: Stack overflow issue
Date: 2022-08-30 22:57:06
Message-ID: 3802215.1661900226@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> The upstream recommendation, which seems pretty sane to me, is to
> simply reject any string exceeding some threshold length as not
> possibly being a word. Apparently it's common to use thresholds
> as small as 64 bytes, but in the attached I used 1000 bytes.

On further thought: that coding treats anything longer than 1000
bytes as a stopword, but maybe we should just accept it unmodified.
The manual says "A Snowball dictionary recognizes everything, whether
or not it is able to simplify the word". While "recognizes" formally
includes the case of "recognizes as a stopword", people might find
this behavior surprising. We could alternatively do it as attached,
which accepts overlength words but does nothing to them except
case-fold. This is closer to the pre-patch behavior, but gives up
the opportunity to avoid useless downstream processing of long words.

regards, tom lane

Attachment Content-Type Size
limit-length-of-strings-passed-to-snowball-2.patch text/x-diff 1.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-08-30 23:35:54 Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Previous Message David Rowley 2022-08-30 22:40:43 Re: Reducing the chunk header sizes on all memory context types