text_position worst case runtime

From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: text_position worst case runtime
Date: 2006-05-19 00:17:19
Message-ID: e4j2qg$q3q$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The function

static int32 text_position(text *t1, text *t2, int matchnum)

defined in src/backend/utils/adt/varlena.c uses repeated calls to strncmp (or
pg_wchar_strncmp) to find the location of the pattern in the text. The worst
case runtime for such an approach is O(n*m) where n and m are the lengths of the
pattern and text. The best case would be O(n), I guess, because it only takes n
comparisons to find the pattern at the very start of the text. I'm not sure how
to determine the average case runtime, because it depends what your data looks
like on average.

The fastest worst-case algorithmic solutions (boyer-moore and similar) have an
O(n+m) worst-case runtime.

In practice, the current iterative strncmp solution may be faster than the
boyer-moore solution for average data, because the data may not be constructed
in such a way as to trigger worst-case or nearly worst-case run times. The
current solution also has the advantage of not requiring a palloc of O(n) space
for the pattern preprocessing that boyer-moore requires.

My question is, would the core development team entertain the idea of changing
this function if I supplied the new code?

Thanks,

mark

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2006-05-19 00:23:03 Re: PL/pgSQL 'i = i + 1' Syntax
Previous Message Robert Treat 2006-05-19 00:14:59 Re: [OT] MySQL is bad, but THIS bad?