Re: Stack overflow issue

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	Егор Чиндяскин <kyzevan23(at)mail(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, mahendrakar s <mahendrakarforpg(at)gmail(dot)com>
Subject:	Re: Stack overflow issue
Date:	2022-08-30 22:57:06
Message-ID:	3802215.1661900226@sss.pgh.pa.us
Lists:	pgsql-hackers

I wrote:
> The upstream recommendation, which seems pretty sane to me, is to
> simply reject any string exceeding some threshold length as not
> possibly being a word. Apparently it's common to use thresholds
> as small as 64 bytes, but in the attached I used 1000 bytes.

On further thought: that coding treats anything longer than 1000
bytes as a stopword, but maybe we should just accept it unmodified.
The manual says "A Snowball dictionary recognizes everything, whether
or not it is able to simplify the word". While "recognizes" formally
includes the case of "recognizes as a stopword", people might find
this behavior surprising. We could alternatively do it as attached,
which accepts overlength words but does nothing to them except
case-fold. This is closer to the pre-patch behavior, but gives up
the opportunity to avoid useless downstream processing of long words.
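To make the difference between the two behaviors concrete, here is a minimal sketch in Python. It is not the patch itself: `toy_stem` is a hypothetical stand-in for a real Snowball stemmer, the function names are invented for illustration, and checking the length after case-folding is an assumption. Only the 1000-byte threshold comes from the message above.

```python
from typing import List

MAX_WORD_BYTES = 1000  # threshold used in the message; 64 is cited as a common smaller choice


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a stemmer: strips a single trailing 's'."""
    if len(word) > 1 and word.endswith("s"):
        return word[:-1]
    return word


def lexize_reject_overlength(token: str) -> List[str]:
    """First variant: an overlength string is treated like a stopword.

    An empty list models "recognized, but produces no lexeme".
    """
    word = token.lower()  # case-fold first (assumed order)
    if len(word.encode("utf-8")) > MAX_WORD_BYTES:
        return []
    return [toy_stem(word)]


def lexize_accept_overlength(token: str) -> List[str]:
    """Second variant: an overlength string is accepted, case-folded only.

    The stemmer is never called on it, so deep recursion is still avoided.
    """
    word = token.lower()
    if len(word.encode("utf-8")) > MAX_WORD_BYTES:
        return [word]
    return [toy_stem(word)]


if __name__ == "__main__":
    long_token = "A" * 1001
    print(lexize_reject_overlength(long_token))       # no lexeme at all
    print(len(lexize_accept_overlength(long_token)))  # one lexeme, unstemmed
```

In both variants the stemmer never sees the long input. The trade-off is only in what later stages receive: nothing in the first case, a large case-folded lexeme in the second.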

regards, tom lane

Attachment	Content-Type	Size
limit-length-of-strings-passed-to-snowball-2.patch	text/x-diff	1.2 KB

In response to

Re: Stack overflow issue at 2022-08-30 15:02:38 from Tom Lane

Responses

Re: Stack overflow issue at 2022-08-31 02:38:23 from Richard Guo
