| From: | Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info> |
|---|---|
| To: | Darkhan <darkhanahmetov2005(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org |
| Subject: | Re: pg_kazsearch: Full-text search extension for Kazakh language |
| Date: | 2026-04-08 14:42:21 |
| Message-ID: | 34cf74ff-5466-44e0-9a3f-e626708f893a@anayrat.info |
| Lists: | pgsql-general |
On 4/5/26 3:32 PM, Darkhan wrote:
> Hi all,
>
> I built pg_kazsearch, a PostgreSQL extension that adds full-text search
> support for Kazakh. Currently there's no Kazakh dictionary, stemmer, or
> stop word list available in PostgreSQL, so anyone searching Kazakh text is
> stuck with trigram matching or application-level workarounds.
>
> Kazakh is agglutinative — a single word can carry 5-6 suffixes, which makes
> standard search approaches miss most relevant results. pg_kazsearch
> provides a custom Kazakh stemmer (core written in Rust), a stop word list,
> and a text search dictionary that plugs into the standard PostgreSQL FTS
> infrastructure — GIN indexes, ts_rank, phrase search all work out of the
> box.
>
> I tested it on a dataset of 3,000 real Kazakh news articles. On the same
> query, pg_kazsearch returns 61 relevant articles vs 1 with trigram search,
> with a 23% improvement in recall overall.
>
> You can install it with a single command via deb package or Docker image,
> no compilation needed.
>
> Repo: https://github.com/darkhanakh/pg-kazsearch
>
> I'd appreciate any feedback, especially from anyone working on text search
> internals or with experience supporting non-Latin or agglutinative
> languages in PostgreSQL.
>
> Thanks, Darkhan
>
Hello,
Thanks for your work.
I don't know anything about Kazakh.
But have you tried adding it to the Snowball stemmer [1]?
Since Postgres uses Snowball, that would improve the chances of
Kazakh being supported in future versions.
[1]: https://github.com/snowballstem/snowball
--
Adrien NAYRAT
https://pro.anayrat.info