Re: Improving FTS for Greek

From: Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improving FTS for Greek
Date: 2023-06-06 22:30:55
Message-ID: AA782163-36A3-46C3-8775-84B34C567471@gmail.com
Lists: pgsql-hackers

> On 7 Jun 2023, at 12:13 AM, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> On 03.06.23 19:47, Florents Tselai wrote:
>> There’s another previous relevant patch [0] but was never merged. I’ve included these stop words and added some more (info in README.md).
>> For my personal projects looks like it yields much better results.
>> I’d like some feedback on the extension ; particularly on the installation infra (I’m not sure I’ve handled properly the permissions in the .sql files)
>> I’ll then try to make a .patch for this.
>
> The open question at the previous attempt was that it wasn't clear what the upstream source or long-term maintenance of the stop words list would be. If it's just a personally composed list, then it's okay if you use it yourself, but for including it into PostgreSQL it ought to come from a reputable non-individual source like snowball.

I used the NLTK stop word list [0] as the base for mine. Wouldn’t that be considered reputable enough?

[0] https://github.com/nltk/nltk_data/blob/gh-pages/packages/corpora/stopwords.zip (see the greek.stop file in the archive)
