Re: speed up verifying UTF-8

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speed up verifying UTF-8
Date: 2021-07-19 01:26:47
Message-ID: CAFBsxsHu4WrdQHbr73+aez4wjzLHF-LYihutn0-Huex8vNX5zg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:

> On Fri, Jul 16, 2021 at 1:44 AM Vladimir Sitnikov <
sitnikov(dot)vladimir(at)gmail(dot)com> wrote:
> >
> > Have you considered shift-based DFA for a portable implementation
https://gist.github.com/pervognsen/218ea17743e1442e59bb60d29b1aa725 ?
>
> I did consider some kind of DFA a while back and it was too slow.

I took a closer look at this "shift-based DFA", and it seemed pretty
straightforward to implement this on top of my DFA attempt from some months
ago. The DFA technique is not a great fit with our API, since we need to
return how many bytes we found valid. On x86 (not our target for the
fallback, but convenient to test) all my attempts were either worse than
HEAD in multiple cases, or showed no improvement for the important ASCII
case. On Power8, it's more compelling, and competitive with v14, so I'll
characterize it on that platform as I describe the patch series:

0001 is a pure DFA, and has decent performance on multibyte, but terrible
on ascii.
0002 dispatches on the leading byte category, unrolls the DFA loop
according to how many valid bytes we need, and only checks the DFA state
afterwards. It's good on multibyte (3-byte, at least) but still terrible on
ascii.
0003 adds a 1-byte ascii fast path -- while robust on all inputs, it still
regresses a bit on ascii.
0004 uses the same 8-byte ascii check as previous patches do.
0005 and 0006 use combinations of 1- and 8-byte ascii checks similar to in
v17.

0005 seems the best on Power8, and is very close to v4. FWIW, v14's
measurements seem lucky and fragile -- if I change any little thing, even

- return -1;
+ return 0;

it easily loses 100-200ms on non-pure-ascii tests. That said, v14 still
seems the logical choice, unless there is some further tweak on top of v17
or v18 that gives some non-x86 platform a significant boost.

Power8, gcc 4.8:

HEAD:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
2944 | 1523 | 871 | 1473 | 1509

v18-0001:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
1257 | 1681 | 1385 | 1744 | 2018

v18-0002:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
951 | 1381 | 1217 | 1469 | 1172

v18-0003:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
911 | 1111 | 942 | 1112 | 865

v18-0004:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
987 | 730 | 222 | 1325 | 2306

v18-0005:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
962 | 664 | 180 | 928 | 1179

v18-0006:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
908 | 663 | 244 | 1026 | 1464

and for comparison,

v14:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
888 | 607 | 179 | 777 | 1328

v17-0003:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
1205 | 662 | 180 | 767 | 1256

Macbook, clang 12:

HEAD:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
974 | 691 | 370 | 456 | 526

v18-0001:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
1334 | 2713 | 2802 | 2665 | 2541

v18-0002:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
733 | 1212 | 1064 | 1034 | 1007

v18-0003:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
653 | 560 | 370 | 420 | 465

v18-0004:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
574 | 402 | 88 | 584 | 1033

v18-0005:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
1345 | 730 | 334 | 578 | 909

v18-0006:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
674 | 485 | 153 | 594 | 989

and for comparison,

v14:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
674 | 346 | 78 | 309 | 504

v17-0002:
chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
516 | 324 | 78 | 331 | 544

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v18-0001-Use-pure-DFA.patch application/octet-stream 13.5 KB
v18-0002-Unroll-loop-in-DFA.patch application/octet-stream 1.7 KB
v18-0004-Check-ascii-8-bytes-at-a-time-with-bitwise-opera.patch application/octet-stream 3.4 KB
v18-0003-Add-ascii-fast-path-before-resorting-to-DFA.patch application/octet-stream 1004 bytes
v18-0005-Do-1-byte-ascii-check-if-8-byte-check-fails.patch application/octet-stream 785 bytes
v18-0006-Do-8-byte-check-only-if-1-byte-check-succeeds.patch application/octet-stream 1.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2021-07-19 01:51:36 Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error
Previous Message Kyotaro Horiguchi 2021-07-19 01:16:18 Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?