Re: [PATCH] Optimize json_lex_string by batching character copying

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: [PATCH] Optimize json_lex_string by batching character copying
Date: 2022-08-15 13:33:21
Message-ID: CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote

> On Mon, Jul 11, 2022 at 11:07 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > I wonder if we can add a somewhat more general function for scanning until
> > some characters are found using SIMD? There's plenty other places that could
> > be useful.
>
> In simple cases, we could possibly abstract the entire loop. With this particular case, I imagine the most approachable way to write the loop would be a bit more low-level:
>
> while (p < end - VECTOR_WIDTH &&
> !vector_has_byte(p, '\\') &&
> !vector_has_byte(p, '"') &&
> vector_min_byte(p, 0x20))
> p += VECTOR_WIDTH
>
> I wonder if we'd lose a bit of efficiency here by not accumulating set bits from the three conditions, but it's worth trying.

The attached implements the above, more or less, using new pg_lfind8()
and pg_lfind8_le(), which in turn are based on helper functions that
act on a single vector. The pg_lfind* functions have regression tests,
but I haven't done the same for json yet. I went the extra step to use
bit-twiddling for non-SSE builds using uint64 as a "vector", which
still gives a pretty good boost (test below, min of 3):

master:
356ms

v5:
259ms

v5 disable SSE:
288ms

It still needs a bit of polishing and testing, but I think it's a good
workout for abstracting SIMD out of the way.

-------------
test:

DROP TABLE IF EXISTS long_json_as_text;
CREATE TABLE long_json_as_text AS
with long as (
select repeat(description, 11)
from pg_description
)
select (select json_agg(row_to_json(long))::text as t from long) from
generate_series(1, 100);
VACUUM FREEZE long_json_as_text;

select 1 from long_json_as_text where t::json is null; -- from Andrew upthread

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v5-json-lex-string-simd-ops.patch text/x-patch 11.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-08-15 13:39:27 Re: [PoC] Improve dead tuple storage for lazy vacuum
Previous Message Damir Belyalov 2022-08-15 13:23:26 Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)