Re: tsvector not giving expected results on one host

From: Oleg Bartunov <obartunov(at)postgrespro(dot)ru>
To: Dan Langille <dan(at)langille(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: tsvector not giving expected results on one host
Date: 2022-12-21 17:31:35
Message-ID: CAF4Au4zY9nA27diCUgg_=e+R5y5Xj0hLiTzh9yLSzw4kTpyuOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I

On Wed, Dec 21, 2022 at 1:12 PM Oleg Bartunov <obartunov(at)postgrespro(dot)ru> wrote:
>
> Dan,
>
> it's always good to specify configuration name in a query to avoid
> recheck, since
> websearch_to_tsquery(regconfig, text) is immutable, while
> websearch_to_tsquery(text) is stable.

immutable function calculates once in planning time, but stable
function calculates during running time,
so the difference may be very big depending on how many tuples found.

>
> See the difference:
>
> [local]:5433 oleg(at)oleg=# explain (analyze,costs off) select title
> from apod where websearch_to_tsquery('simple','galaxies') @@ fts;
> QUERY PLAN
> ---------------------------------------------------------------------------------------
> Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1)
> Recheck Cond: ('''galaxies'''::tsquery @@ fts)
> -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007
> rows=0 loops=1)
> Index Cond: (fts @@ '''galaxies'''::tsquery)
> Planning Time: 0.134 ms
> Execution Time: 0.022 ms
> (6 rows)
>
> Time: 0.369 ms
> [local]:5433 oleg(at)oleg=# explain (analyze,costs off) select title
> from apod where websearch_to_tsquery('galaxies') @@ fts;
> QUERY PLAN
> -----------------------------------------------------------------------------------------
> Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1)
> Filter: (websearch_to_tsquery('galaxies'::text) @@ fts)
> Heap Blocks: exact=276
> -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059
> rows=493 loops=1)
> Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text))
> Planning Time: 0.125 ms
> Execution Time: 1.518 ms
> (7 rows)
>
> On Sat, Dec 17, 2022 at 11:34 PM Dan Langille <dan(at)langille(dot)org> wrote:
> >
> > On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> > > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
> > >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
> > >>> "Dan Langille" <dan(at)langille(dot)org> writes:
> > >>>> pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored
> > >>>
> > >>> That is not likely to play well with this:
> > >>>
> > >>>> freshports.org=> show default_text_search_config ;
> > >>>> default_text_search_config
> > >>>> ----------------------------
> > >>>> pg_catalog.simple
> > >>>
> > >>> because "english" and "simple" will stem words differently.
> > >>>
> > >>> regression=# select websearch_to_tsquery('english', 'example');
> > >>> websearch_to_tsquery
> > >>> ----------------------
> > >>> 'exampl'
> > >>> (1 row)
> > >>>
> > >>> regression=# select websearch_to_tsquery('simple', 'example');
> > >>> websearch_to_tsquery
> > >>> ----------------------
> > >>> 'example'
> > >>> (1 row)
> > >>>
> > >>> If what is in your tsvector is 'exampl', then only the first of
> > >>> these will match. So IMO the question is not "why is it failing
> > >>> on prod?", it's "how the heck did it work on the other machine?".
> > >>> You won't get nice results if websearch_to_tsquery is using a
> > >>> different TS configuration than to_tsvector did.
> > >>
> > >> I think this shows why we are getting the results we see. Credit to ch
> > >> on IRC for asking this question.
> > >>
> > >> The problem host:
> > >>
> > >> freshports.org=> select websearch_to_tsquery('example');
> > >> websearch_to_tsquery
> > >> ----------------------
> > >> 'example'
> > >> (1 row)
> > >
> > > Ahh, this explains the differences and as to why it works where it shouldn't?
> > >
> > > freshports.org=> select setting, source from pg_settings where name =
> > > 'default_text_search_config';
> > > setting | source
> > > -------------------+---------
> > > pg_catalog.simple | default
> > > (1 row)
> > >
> > >
> > >>
> > >> The hosts on which this search works
> > >>
> > >> freshports.devgit=# select websearch_to_tsquery('example');
> > >> websearch_to_tsquery
> > >> ----------------------
> > >> 'exampl'
> > >> (1 row)
> > >
> > >
> > > freshports.devgit=# select setting, source from pg_settings where name
> > > = 'default_text_search_config';
> > > setting | source
> > > --------------------+--------------------
> > > pg_catalog.english | configuration file
> > > (1 row)
> > >
> > >
> > > At least now I know what I can play with to get all hosts in sync.
> >
> > Here we go, on the problem database, create a new field, based on simple, not english.
> >
> > ALTER TABLE public.ports
> > ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig, translate(pkgmessage, '/'::text, ' '::text))) stored;
> >
> > Index it:
> >
> > CREATE INDEX ports_pkgmessage_textsearchable3_idx
> > ON public.ports USING gin
> > (pkgmessage_textsearchable3)
> > TABLESPACE pg_default;
> > CREATE INDEX
> >
> > query it:
> >
> > freshports.org=> SELECT id as port_id, element_pathname(element_id)
> > FROM ports
> > WHERE pkgmessage_textsearchable3 @@ websearch_to_tsquery('example');
> > port_id | element_pathname
> > ---------+----------------------------------------------------------------------
> > 34126 | /ports/head/security/pond
> > 74559 | /ports/branches/2015Q3/emulators/linux_base-c6
> > 60310 | /ports/branches/2020Q4/www/gitlab-ce
> > 38345 | /ports/head/www/gitlab
> > 46842 | /ports/branches/2018Q1/mail/postfix-sasl
> > 51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod
> >
> > Success. Thank you Mr Lane.
> >
> > --
> > Dan Langille
> > dan(at)langille(dot)org
> >
> >
>
>
> --
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alicja Kucharczyk 2022-12-21 17:41:56 Re: Regd. the Implementation of Wallet (in Oracle) config equivalent in postgreSQL whilst the database migration
Previous Message vignesh C 2022-12-21 17:21:51 Re: Support logical replication of DDLs