Boyer-More-Horspool searching LIKE queries

From: Atsushi Ogawa <atsushi(dot)ogawa(at)gmail(dot)com>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Boyer-More-Horspool searching LIKE queries
Date: 2022-01-11 13:55:16
Message-ID: CAO2GH9Ekvcj19ihQyjk_S6RUjPHADhUfPQBU0gRMs4qVfk12Aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have created a patch to enable the Boyer-More-Horspool search
algorithm (B-M-H) for LIKE queries.

B-M-H needs to initialize the skip table and keep it during SQL execution.
In this patch, flinfo->fn_extra is used to keep the skip table.

The conditions under which B-M-H can be used are as follows.

(1) B-M-H in LIKE search supports only single-byte character sets and UTF8.
Multibyte character sets does not support, because it may contain another
characters in the byte sequence. For UTF-8, it works fine, because in
UTF-8 the byte sequence of one character cannot contain another character.

(2) The pattern string should be stable parameter, because B-M-H needs to
keep
the skip table generated from the pattern string during the execution of
the query.

(3) The pattern string should be at least 4 characters.
For example, '%AB%' can use B-M-H.

(4) The first and last character of the pattern string should be '%'.

(5) Characters other than the first and last of the pattern string
should not be '%', '_'. However, escaped characters such as
'\%', '\_' are available.

Also, this patch changes the collation validity check in functions
(such as textlike) to be performed at the first execution of the query,
instead of each function execution.
I have measured the performance with the following query.

---------- ---------- ---------- ---------- ---------- ---------- ----------
SET client_min_messages TO notice;

\timing

DO $$
DECLARE
cnt integer := 0;
total integer := 0;
BEGIN
FOR i IN 1..500 LOOP
select count(*) into cnt
from pg_catalog.pg_description d
where d.description like '%greater%';

total := total + cnt;
END LOOP;

RAISE NOTICE 'TOTAL: %', total;
END
$$
;
---------- ---------- ---------- ---------- ---------- ---------- ----------

Result
Without patch: 257.504ms
With patch: 191.638ms

Regards,
Atsushi Ogawa

Attachment Content-Type Size
like_bmh.patch application/octet-stream 10.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2022-01-11 14:28:43 Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes
Previous Message Amit Kapila 2022-01-11 13:28:19 Re: [Ext:] Re: Stream Replication not working