pg_kazsearch: Full-text search extension for Kazakh language

From: Darkhan <darkhanahmetov2005(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: pg_kazsearch: Full-text search extension for Kazakh language
Date: 2026-04-05 13:32:37
Message-ID: CAOW9cEpjUV0fG6u6m86vt8RJOBLymys=k33DWzgEP+0SnXhZGA@mail.gmail.com
Lists: pgsql-general

Hi all,

I built pg_kazsearch, a PostgreSQL extension that adds full-text search
support for Kazakh. Currently there's no Kazakh dictionary, stemmer, or
stop word list available in PostgreSQL, so anyone searching Kazakh text is
stuck with trigram matching or application-level workarounds.

Kazakh is agglutinative: a single word can carry 5-6 suffixes, so standard
search approaches, which match surface word forms, miss most relevant
results. pg_kazsearch provides a custom Kazakh stemmer (with its core written
in Rust), a stop word list, and a text search dictionary that plugs into the
standard PostgreSQL FTS infrastructure, so GIN indexes, ts_rank, and phrase
search all work out of the box.
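
To illustrate why stemming matters for an agglutinative language, here is a
minimal suffix-stripping sketch in Rust. It is NOT pg_kazsearch's algorithm:
the suffix lists, the outermost-first group order (case, then possessive,
then plural) and the hypothetical MIN_STEM_CHARS guard are illustrative
assumptions covering only a handful of endings, with no vowel-harmony logic.
It shows how "кітаптарымызда" ("in our books") reduces to the same stem as
"кітап" ("book"), so both forms would index to one lexeme.

```rust
// Illustrative only: a tiny, hypothetical subset of Kazakh suffixes,
// not the rules used by pg_kazsearch.
// Each group lists longer suffixes before shorter ones.
const CASE_SUFFIXES: &[&str] = &["дан", "ден", "тан", "тен", "нан", "нен", "да", "де", "та", "те"];
const POSSESSIVE_SUFFIXES: &[&str] = &["ымыз", "іміз", "мыз", "міз"];
const PLURAL_SUFFIXES: &[&str] = &["лар", "лер", "дар", "дер", "тар", "тер"];

// Hypothetical guard: never leave a stem shorter than this many characters.
const MIN_STEM_CHARS: usize = 3;

/// Strips at most one suffix from `group`, keeping the stem long enough.
fn strip_group<'a>(word: &'a str, group: &[&str]) -> &'a str {
    for suffix in group {
        if let Some(stem) = word.strip_suffix(*suffix) {
            // Count characters, not bytes: Cyrillic letters are 2 bytes in UTF-8.
            if stem.chars().count() >= MIN_STEM_CHARS {
                return stem;
            }
        }
    }
    word
}

/// Removes suffixes from the outside in: case, then possessive, then plural.
pub fn stem(word: &str) -> String {
    let lower = word.to_lowercase();
    let mut current: &str = &lower;
    for group in [CASE_SUFFIXES, POSSESSIVE_SUFFIXES, PLURAL_SUFFIXES] {
        current = strip_group(current, group);
    }
    current.to_string()
}

fn main() {
    // кітап+тар+ымыз+да, мектеп+тер+де, қала+да
    for word in ["кітаптарымызда", "мектептерде", "қалада"] {
        println!("{word} -> {}", stem(word));
    }
}
```

With these assumed rules the three words reduce to "кітап", "мектеп" and
"қала". A production stemmer needs far more suffix classes and must respect
vowel harmony and consonant assimilation; the length guard here is only a
crude defence against over-stemming short words such as "ата".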

I tested it on a dataset of 3,000 real Kazakh news articles. For one sample
query, pg_kazsearch returned 61 relevant articles vs 1 with trigram search,
and recall improved by 23% overall.
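
Recall here is presumably the standard information-retrieval measure: the
fraction of all relevant documents that a search actually returns. A minimal
Rust sketch of that calculation follows; the document IDs are hypothetical
placeholders, not data from the benchmark above.

```rust
use std::collections::HashSet;

/// Recall = |relevant ∩ retrieved| / |relevant|.
/// Returns 0.0 when there are no relevant documents (recall is undefined there).
fn recall(relevant: &HashSet<u32>, retrieved: &HashSet<u32>) -> f64 {
    if relevant.is_empty() {
        return 0.0;
    }
    let hits = relevant.intersection(retrieved).count();
    hits as f64 / relevant.len() as f64
}

fn main() {
    // Hypothetical article IDs, for illustration only.
    let relevant: HashSet<u32> = [1, 2, 3, 4].into_iter().collect();
    let retrieved: HashSet<u32> = [2, 3, 9].into_iter().collect();
    println!("recall = {:.2}", recall(&relevant, &retrieved));
}
```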

You can install it with a single command from a deb package or a Docker
image; no compilation is needed.

Repo: https://github.com/darkhanakh/pg-kazsearch

I'd appreciate any feedback, especially from anyone working on text search
internals or with experience supporting non-Latin or agglutinative
languages in PostgreSQL.

Thanks, Darkhan
