Re: daitch_mokotoff module

From: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
To: Dag Lem <dag(at)nimrod(dot)no>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: daitch_mokotoff module
Date: 2023-01-11 20:40:31
Message-ID: CACowWR0Lg+49Z4ncN2-U0-fUYLpAVQYupRr=0UaSLYxgrAWVHQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 2, 2023 at 2:03 PM Dag Lem <dag(at)nimrod(dot)no> wrote:

> I also improved on the documentation example (using Full Text Search).
> AFAIK you can't make general queries like that using arrays, however in
> any case I must admit that text arrays seem like more natural building
> blocks than space delimited text here.

This is a fun addition to fuzzystrmatch.

While it's a little late in the game, I'll just put it out there:
daitch_mokotoff() is way harder to type than soundex_dm(). Not sure
how you feel about that.

On the documentation, I found the leap directly into the tsquery
example a bit too big. Maybe start with a very simple example,

--
dm=# SELECT daitch_mokotoff('Schwartzenegger'),
daitch_mokotoff('Swartzenegger');

 daitch_mokotoff | daitch_mokotoff
-----------------+-----------------
 {479465}        | {479465}
--

Then transition into a more complex example that illustrates the GIN
index technique you mention in the text, but do not show:

--
CREATE TABLE dm_gin (source text, dm text[]);

INSERT INTO dm_gin (source) VALUES
('Swartzenegger'),
('John'),
('James'),
('Steinman'),
('Steinmetz');

UPDATE dm_gin SET dm = daitch_mokotoff(source);

CREATE INDEX dm_gin_x ON dm_gin USING GIN (dm);

SELECT * FROM dm_gin WHERE dm && daitch_mokotoff('Schwartzenegger');
--
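
A possible variation on the example above, as a rough sketch only: index the
expression directly instead of maintaining a separate dm column. This assumes
daitch_mokotoff() is declared IMMUTABLE, which PostgreSQL requires of any
function used in an index expression. The index name dm_gin_expr_x is made up
for illustration.

```sql
-- Assumption: daitch_mokotoff() is IMMUTABLE, so it may appear in an index expression.
-- Reuses the dm_gin table from the example above; the dm column is not needed here.
CREATE INDEX dm_gin_expr_x ON dm_gin USING GIN (daitch_mokotoff(source));

-- The WHERE clause must repeat the indexed expression exactly
-- for the planner to consider the expression index.
SELECT source
FROM dm_gin
WHERE daitch_mokotoff(source) && daitch_mokotoff('Schwartzenegger');
```

The trade-off is that nothing has to keep dm in sync with source, at the cost
of recomputing the codes for any row the query has to recheck.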

And only then go into the tsearch example. Incidentally, what does the
tsearch approach provide that the simple GIN approach does not?
Ideally explain that briefly before launching into the example. With
all the custom functions and so on, it's a little involved, so if
there's not a huge win in using that approach, maybe drop it entirely?

ATB,
P
