From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: speed up verifying UTF-8 |
Date: | 2021-06-30 16:54:23 |
Message-ID: | CAFBsxsGZ_ssdVmOK5qbcO5on87ByyDvW3APRohR=kCfb8Z3XVA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jun 30, 2021 at 7:18 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Hmm, there's one more simple trick we can do: We can have a separate
> fast-path version of the loop when there are at least 8 bytes of input
> left, skipping all the length checks. With that:
Good idea, and the numbers look good on Power8 / gcc 4.8 as well:
master:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
    2951 |  1521 |   871 |    1473 |   1508
v13:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     949 |   642 |   203 |    1046 |   1818
v14:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     887 |   607 |   179 |     776 |   1325
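For readers following along, here is a rough sketch of the fast-path idea quoted above. It is not the code from the patch: the sketch_* names are made up for illustration, and treating a zero byte as invalid is an assumption of this sketch. The point is only that a UTF-8 character is at most 4 bytes long, so while at least 8 bytes remain the loop never needs to compare the sequence length against the remaining input.

```c
#include <stddef.h>

/*
 * Sequence length implied by the lead byte, or 0 if the byte cannot start
 * a character. A zero byte is rejected (an assumption of this sketch).
 */
static inline int
sketch_seq_len(unsigned char b1)
{
	if (b1 == 0)
		return 0;
	if (b1 < 0x80)
		return 1;
	if (b1 >= 0xC2 && b1 <= 0xDF)
		return 2;
	if (b1 >= 0xE0 && b1 <= 0xEF)
		return 3;
	if (b1 >= 0xF0 && b1 <= 0xF4)
		return 4;
	return 0;
}

/* Check the bytes after the lead byte; caller guarantees l bytes are readable. */
static inline int
sketch_tail_ok(const unsigned char *s, int l)
{
	int			i;

	if (l == 1)
		return 1;
	if ((s[1] & 0xC0) != 0x80)
		return 0;
	if (s[0] == 0xE0 && s[1] < 0xA0)
		return 0;				/* overlong 3-byte form */
	if (s[0] == 0xED && s[1] > 0x9F)
		return 0;				/* UTF-16 surrogate */
	if (s[0] == 0xF0 && s[1] < 0x90)
		return 0;				/* overlong 4-byte form */
	if (s[0] == 0xF4 && s[1] > 0x8F)
		return 0;				/* beyond U+10FFFF */
	for (i = 2; i < l; i++)
		if ((s[i] & 0xC0) != 0x80)
			return 0;
	return 1;
}

/* Returns the number of leading bytes of s that form valid UTF-8. */
static int
sketch_verifystr(const unsigned char *s, int len)
{
	const unsigned char *start = s;
	int			l;

	/* Fast path: with >= 8 bytes left, no character can run past the end. */
	while (len >= 8)
	{
		l = sketch_seq_len(s[0]);
		if (l == 0 || !sketch_tail_ok(s, l))
			return (int) (s - start);
		s += l;
		len -= l;
	}

	/* Slow path: the same loop, plus the "l > len" truncation check. */
	while (len > 0)
	{
		l = sketch_seq_len(s[0]);
		if (l == 0 || l > len || !sketch_tail_ok(s, l))
			break;
		s += l;
		len -= l;
	}
	return (int) (s - start);
}
```

The two loops differ only in the truncation check, which is what makes the split cheap to maintain.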
I don't think the new structuring will pose any challenges for rebasing
0002, either. This might need some experimentation, though:
+ * Subroutine of pg_utf8_verifystr() to check on char. Returns the length of the
+ * character at *s in bytes, or 0 on invalid input or premature end of input.
+ *
+ * XXX: could this be combined with pg_utf8_verifychar above?
+ */
+static inline int
+pg_utf8_verify_one(const unsigned char *s, int len)
It seems like it would be easy to have pg_utf8_verify_one in my proposed
pg_utf8.h header and replace the body of pg_utf8_verifychar with it.
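To make that concrete, something along these lines is what I have in mind. This is only a sketch: the sketch_* names are placeholders, the body is a stand-in rather than the function from the patch, rejecting a zero byte is an assumption, and so is the convention that the verifychar-style caller reports invalid input as -1.

```c
#include <stddef.h>

/*
 * Would live in the shared header. Returns the length in bytes of the
 * character at *s, or 0 on invalid input or premature end of input.
 */
static inline int
sketch_utf8_verify_one(const unsigned char *s, int len)
{
	unsigned char b1,
				b2;
	int			l,
				i;

	if (len < 1)
		return 0;
	b1 = s[0];
	if (b1 == 0)
		return 0;				/* zero byte rejected (sketch assumption) */
	if (b1 < 0x80)
		return 1;				/* ASCII */
	if (b1 >= 0xC2 && b1 <= 0xDF)
		l = 2;
	else if (b1 >= 0xE0 && b1 <= 0xEF)
		l = 3;
	else if (b1 >= 0xF0 && b1 <= 0xF4)
		l = 4;
	else
		return 0;				/* stray continuation or illegal lead byte */
	if (len < l)
		return 0;				/* premature end of input */

	b2 = s[1];
	if ((b2 & 0xC0) != 0x80)
		return 0;
	/* Reject overlong forms, surrogates, and code points above U+10FFFF. */
	if ((b1 == 0xE0 && b2 < 0xA0) || (b1 == 0xED && b2 > 0x9F) ||
		(b1 == 0xF0 && b2 < 0x90) || (b1 == 0xF4 && b2 > 0x8F))
		return 0;
	for (i = 2; i < l; i++)
		if ((s[i] & 0xC0) != 0x80)
			return 0;
	return l;
}

/* The single-character verifier then becomes a thin wrapper. */
static int
sketch_utf8_verifychar(const unsigned char *s, int len)
{
	int			l = sketch_utf8_verify_one(s, len);

	return (l == 0) ? -1 : l;
}
```

With that arrangement the validity rules exist in one place, and the only per-caller difference is how failure is reported.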
--
John Naylor
EDB: http://www.enterprisedb.com