
Re: speed up verifying UTF-8

From:	Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
To:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Subject:	Re: speed up verifying UTF-8
Date:	2021-07-26 11:55:29
Message-ID:	CAB=Je-Eqcuz2MxuA0QU-6qLDrG0bvRB+UBj7JoFekM1fxk_H_g@mail.gmail.com
Lists:	pgsql-hackers

Just wondering, do you have the code in a GitHub/GitLab branch?

>+ utf8_advance(s, state, len);
>+
>+ /*
>+ * If we saw an error during the loop, let the caller handle it. We treat
>+ * all other states as success.
>+ */
>+ if (state == ERR)
>+ return 0;

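Did you mean "state = utf8_advance(s, state, len);" there, i.e. reassigning
the state variable? Otherwise, if state is passed by value, the check
against ERR below can never trigger.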
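A minimal sketch of why the reassignment matters, assuming utf8_advance takes the state by value and returns the new state. The byte-at-a-time DFA below, its state enum, and the names utf8_step, check_buggy and check_fixed are all hypothetical; they are not the patch's table-driven encoding, only a small stand-in that rejects overlongs, surrogates and code points above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DFA states (not the patch's actual encoding). */
enum utf8_state { BGN, CS1, CS2, CS3, P3A, P3B, P4A, P4B, ERR };

/* One DFA transition; ERR is sticky via the default branch. */
static enum utf8_state
utf8_step(enum utf8_state st, uint8_t b)
{
    switch (st)
    {
        case BGN:
            if (b < 0x80) return BGN;                  /* ASCII */
            if (b >= 0xC2 && b <= 0xDF) return CS1;    /* 2-byte lead */
            if (b == 0xE0) return P3A;                 /* no overlongs */
            if (b == 0xED) return P3B;                 /* no surrogates */
            if (b >= 0xE1 && b <= 0xEF) return CS2;
            if (b == 0xF0) return P4A;                 /* no overlongs */
            if (b >= 0xF1 && b <= 0xF3) return CS3;
            if (b == 0xF4) return P4B;                 /* <= U+10FFFF */
            return ERR;
        case CS1: return (b >= 0x80 && b <= 0xBF) ? BGN : ERR;
        case CS2: return (b >= 0x80 && b <= 0xBF) ? CS1 : ERR;
        case CS3: return (b >= 0x80 && b <= 0xBF) ? CS2 : ERR;
        case P3A: return (b >= 0xA0 && b <= 0xBF) ? CS1 : ERR;
        case P3B: return (b >= 0x80 && b <= 0x9F) ? CS1 : ERR;
        case P4A: return (b >= 0x90 && b <= 0xBF) ? CS2 : ERR;
        case P4B: return (b >= 0x80 && b <= 0x8F) ? CS2 : ERR;
        default:  return ERR;
    }
}

/* Returns the state after len bytes; the caller's copy is untouched. */
static enum utf8_state
utf8_advance(const unsigned char *s, enum utf8_state state, size_t len)
{
    for (size_t i = 0; i < len; i++)
        state = utf8_step(state, s[i]);
    return state;
}

/* As quoted: the result is discarded, so state stays BGN. */
static int
check_buggy(const unsigned char *s, size_t len)
{
    enum utf8_state state = BGN;

    utf8_advance(s, state, len);
    return (state == ERR) ? 0 : 1;     /* never 0 */
}

/* With reassignment: errors are actually reported. */
static int
check_fixed(const unsigned char *s, size_t len)
{
    enum utf8_state state = BGN;

    state = utf8_advance(s, state, len);
    return (state == ERR) ? 0 : 1;     /* non-ERR states count as success */
}
```

With the overlong sequence C0 AF, check_buggy still reports success while check_fixed returns 0.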

>I wanted to try different strides for the DFA

Does that (and the "len >= 32" condition) mean the patch does not speed up
validation of shorter strings (those under 32 bytes)?
It would probably be nice to cover them as well (e.g., with 4- or 8-byte
strides).
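One possible way to cover short inputs, sketched under stated assumptions: an 8-byte ASCII stride with a per-character fallback. This swaps in a simple range-check decoder rather than the patch's DFA, and chunk8_is_ascii, utf8_one_char and utf8_verify_short are hypothetical names. It also accepts NUL bytes, which a server-side verifier may need to reject.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if all 8 bytes at s have the high bit clear (pure ASCII). */
static bool
chunk8_is_ascii(const unsigned char *s)
{
    uint64_t chunk;

    memcpy(&chunk, s, sizeof(chunk));   /* alignment-safe load */
    return (chunk & UINT64_C(0x8080808080808080)) == 0;
}

/* Validate one character at s; return its length, or 0 if invalid/truncated. */
static size_t
utf8_one_char(const unsigned char *s, size_t avail)
{
    unsigned char b = s[0];
    unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of the 2nd byte */
    size_t n;

    if (b < 0x80)
        return 1;
    else if (b >= 0xC2 && b <= 0xDF)
        n = 2;
    else if (b >= 0xE0 && b <= 0xEF)
    {
        n = 3;
        if (b == 0xE0) lo = 0xA0;        /* reject overlongs */
        if (b == 0xED) hi = 0x9F;        /* reject surrogates */
    }
    else if (b >= 0xF0 && b <= 0xF4)
    {
        n = 4;
        if (b == 0xF0) lo = 0x90;        /* reject overlongs */
        if (b == 0xF4) hi = 0x8F;        /* reject > U+10FFFF */
    }
    else
        return 0;

    if (avail < n || s[1] < lo || s[1] > hi)
        return 0;
    for (size_t i = 2; i < n; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return n;
}

/* Length of the longest valid prefix (equals len if the input is valid). */
static size_t
utf8_verify_short(const unsigned char *s, size_t len)
{
    size_t pos = 0;

    while (pos < len)
    {
        /* 8-byte stride while the input stays ASCII */
        if (len - pos >= 8 && chunk8_is_ascii(s + pos))
        {
            pos += 8;
            continue;
        }
        size_t n = utf8_one_char(s + pos, len - pos);
        if (n == 0)
            break;
        pos += n;
    }
    return pos;
}
```

The ASCII test costs one AND per 8 bytes, so a 20-byte ASCII string takes two strides plus four single-byte steps; a 4-byte stride would follow the same pattern with a uint32_t mask.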

Vladimir

In response to

Re: speed up verifying UTF-8 at 2021-07-26 11:09:00 from John Naylor

Responses

Re: speed up verifying UTF-8 at 2021-07-26 12:56:52 from John Naylor
Re: speed up verifying UTF-8 at 2021-07-26 12:58:37 from John Naylor
