Re: Speeding up text_position_next with multibyte encodings

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	John Naylor <jcnaylor(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Speeding up text_position_next with multibyte encodings
Date:	2019-01-28 21:50:42
Message-ID:	20190128215042.GJ26761@momjian.us
Lists:	pgsql-hackers

On Fri, Jan 25, 2019 at 04:33:54PM +0200, Heikki Linnakangas wrote:
> On 15/01/2019 02:52, John Naylor wrote:
> >The majority of cases are measurably faster, and the best case is at
> >least 20x faster. On the whole I'd say this patch is a performance win
> >even without further optimization. I'm marking it ready for committer.
>
> I read through the patch one more time, tweaked the comments a little bit,
> and committed. Thanks for the review!
>
> I did a little profiling of the worst case, where this is slower than the
> old approach. There's a lot of function call overhead coming from walking
> the string with pg_mblen(). That could be improved. If we inlined pg_mblen()
> into the loop, it would become much faster, and I think this code would be
> faster even in the worst case. (Except for the very worst cases, where the
> hash table with the new code happens to have a collision at a different
> point than before, but that doesn't seem like a fair comparison.)
>
> I think this is good enough as it is, but if I have the time, I'm going to
> try optimizing the pg_mblen() loop, as well as similar loops e.g. in
> pg_mbstrlen(). Or if someone else wants to give that a go, feel free.

It might be valuable to just inline the UTF8 case.
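
As a rough sketch of what inlining only the UTF8 case could look like
(this is not the committed code; the names utf8_mblen_inline and
utf8_count_chars are hypothetical, and the lead-byte test is assumed to
follow the usual UTF-8 rules for 1 to 4 byte sequences):

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte length of the UTF-8 character starting at s,
 * derived from the lead byte only. An invalid lead byte is treated as a
 * one-byte character so that the caller always makes progress.
 */
static inline int
utf8_mblen_inline(const unsigned char *s)
{
    if ((*s & 0x80) == 0)
        return 1;               /* 0xxxxxxx: ASCII */
    else if ((*s & 0xe0) == 0xc0)
        return 2;               /* 110xxxxx */
    else if ((*s & 0xf0) == 0xe0)
        return 3;               /* 1110xxxx */
    else if ((*s & 0xf8) == 0xf0)
        return 4;               /* 11110xxx */
    return 1;                   /* continuation or invalid byte */
}

/*
 * Hypothetical caller: count the characters in a UTF-8 buffer of nbytes
 * bytes. The length computation is inlined into the loop, so there is no
 * per-character call through an encoding dispatch table.
 */
static size_t
utf8_count_chars(const char *str, size_t nbytes)
{
    const unsigned char *p = (const unsigned char *) str;
    const unsigned char *end = p + nbytes;
    size_t      nchars = 0;

    while (p < end)
    {
        size_t      len = (size_t) utf8_mblen_inline(p);

        /* Do not step past the end on a truncated trailing character. */
        if (len > (size_t) (end - p))
            len = (size_t) (end - p);
        p += len;
        nchars++;
    }
    return nchars;
}
```

Other encodings would presumably keep going through the generic
pg_mblen() path; only the UTF8 branch gets the inlined loop, since that
is likely the most common multibyte encoding in practice.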

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
