Re: speed up verifying UTF-8

From: Greg Stark <stark(at)mit(dot)edu>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speed up verifying UTF-8
Date: 2021-06-03 14:41:41
Message-ID: CAM-w4HMTHARzthg-3j1GnXUFTer0mLVXt8voPbA+iF68OSvHTg@mail.gmail.com
Lists: pgsql-hackers

I haven't looked at the surrounding code. Are we processing all the
COPY data in one long stream, or each field individually? If we're
processing much more than 128 bits and are happy to detect NUL errors
only at the end, after wasting some work, then you could hoist that
has_zero check entirely out of the loop (removing the branch, though
it's probably a correctly predicted branch anyway).

Do something like:

zero_accumulator = zero_accumulator & next_chunk

in the loop, and then check for zero bytes only at the very end.
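A minimal sketch of that idea, with names of my own invention (`contains_nul`, `has_zero_byte` are illustrative, not from PostgreSQL). One caveat I'm assuming here: ANDing chunks together can zero out an accumulator byte even when no input byte was zero (e.g. 0x0F & 0xF0), so if the final check trips, the sketch falls back to an exact byte-wise rescan; a genuine NUL can never be missed, since it forces the corresponding accumulator byte to zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classic bit trick: nonzero iff the 8-byte word v contains a zero byte. */
static inline uint64_t
has_zero_byte(uint64_t v)
{
	return (v - UINT64_C(0x0101010101010101)) & ~v &
		UINT64_C(0x8080808080808080);
}

/*
 * Returns 1 if buf contains a NUL byte.  len is assumed to be a multiple
 * of 8 to keep the sketch short; real code would handle the tail.
 */
int
contains_nul(const unsigned char *buf, size_t len)
{
	uint64_t	zero_accumulator = ~UINT64_C(0);
	size_t		i;

	for (i = 0; i < len; i += 8)
	{
		uint64_t	next_chunk;

		memcpy(&next_chunk, buf + i, sizeof(next_chunk));
		zero_accumulator &= next_chunk; /* no branch inside the loop */
	}

	if (!has_zero_byte(zero_accumulator))
		return 0;				/* common case: definitely no NUL */

	/* Rare: the AND may have produced a false positive; rescan exactly. */
	for (i = 0; i < len; i++)
		if (buf[i] == 0)
			return 1;
	return 0;
}
```

The branch moves out of the per-chunk loop at the cost of occasionally rescanning the buffer, which matches the "happy to detect NUL errors only at the end after wasting some work" trade-off above.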
