Re: [PATCH] Optimize json_lex_string by batching character copying

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: [PATCH] Optimize json_lex_string by batching character copying
Date: 2022-07-06 05:10:20
Message-ID: CAFBsxsGhaR2KQ5eisaK=6Vm60t=axhD8Ckj1qFoCH1pktZi+2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 25, 2022 at 8:05 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> > I tried your patch with:
> >
> > DROP TABLE IF EXISTS json_as_text;
> > CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t FROM pg_description pd) FROM generate_series(1, 100);
> > VACUUM FREEZE json_as_text;
> >
> > SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';
> >
> > Which the patch improves from 846ms to 754ms (best of three). A bit smaller
> > than your improvement, but still nice.
> >
> >
> > I think your patch doesn't quite go far enough - we still end up looping for
> > each character, have the added complication of needing to flush the
> > "buffer". I'd be surprised if a "dedicated" loop to see until where the string
> > last isn't faster. That then obviously could be SIMDified.
>
> A naive implementation (attached) of that gets me down to 706ms.

Taking this a step further, I modified json_lex and json_lex_string to
use a const end pointer instead of maintaining the length (0001). The
observed speedup is small enough that it might not be real, but the
code is simpler this way, and it makes 0002 and 0003 easier to reason
about. Then I modified your patch to do the same (0002). Hackish SSE2
support is in 0003.

To exercise the SIMD code a bit, I added a second test:

DROP TABLE IF EXISTS long_json_as_text;
CREATE TABLE long_json_as_text AS
with long as (
select repeat(description, 10) from pg_description pd
)
SELECT (select json_agg(row_to_json(long)) as t from long) from
generate_series(1, 100);
VACUUM FREEZE long_json_as_text;

SELECT 1 FROM long_json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';

With this, I get (turbo disabled, min of 3):

short test:
master: 769ms
0001: 751ms
0002: 703ms
0003: 701ms

long test;
master: 939ms
0001: 883ms
0002: 537ms
0003: 439ms

I think 0001/2 are mostly in committable shape.

With 0003, I'd want to make the portability check a bit nicer and more
centralized. I'm thinking of modifying the CRC check to report that
the host cpu/compiler understands SSE4.2 x86 intrinsics, and then the
compile time SSE2 check can piggyback on top of that without a runtime
check. This is conceptually easy but a bit of work to not look like a
hack (which probably means the ARM CRC check should look more generic
somehow...). The regression tests will likely need some work as well.

> Separately, it seems pretty awful efficiency / code density wise to have the
> NULL checks for ->strval all over. Might be worth forcing json_lex() and
> json_lex_string() to be inlined, with a constant parameter deciding whether
> ->strval is expected. That'd likely be enough to get the compiler specialize
> the code for us.

I had a look at this but it's a bit more invasive than I want to
devote time to at this point.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v3-0003-Use-vectorized-lookahead-in-json_lex_string-on-x8.patch text/x-patch 2.4 KB
v3-0002-Build-json-strings-in-larger-chunks-during-lexing.patch text/x-patch 1.5 KB
v3-0001-Simplify-json-lexing-state.patch text/x-patch 2.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-07-06 05:18:01 Re: [PATCH] Optimize json_lex_string by batching character copying
Previous Message Thomas Munro 2022-07-06 04:52:56 Re: AIX support - alignment issues