| From: | Andres Freund <andres(at)anarazel(dot)de> | 
|---|---|
| To: | Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com> | 
| Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> | 
| Subject: | Re: [PATCH] Optimize json_lex_string by batching character copying | 
| Date: | 2022-06-25 01:05:51 | 
| Message-ID: | 20220625010551.r5qrk5n2euuwqos3@alap3.anarazel.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi,
On 2022-06-24 17:18:10 -0700, Andres Freund wrote:
> On 2022-06-24 08:47:09 +0000, Jelte Fennema wrote:
> > To test performance of this change I used COPY BINARY from a JSONB table
> > into another, containing fairly JSONB values of ~15kB.
> 
> This will have a lot of other costs included (DML is expensive). I'd suggest
> storing the json in a text column and casting it to json[b], with a filter
> ontop of the json[b] result that cheaply filters it away. That should end up
> spending nearly all the time somewhere around json parsing.
> 
> It's useful for things like this to include a way for others to use the same
> benchmark...
> 
> I tried your patch with:
> 
> DROP TABLE IF EXISTS json_as_text;
> CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t FROM pg_description pd) FROM generate_series(1, 100);
> VACUUM FREEZE json_as_text;
> 
> SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';
> 
> Which the patch improves from 846ms to 754ms (best of three). A bit smaller
> than your improvement, but still nice.
> 
> 
> I think your patch doesn't quite go far enough - we still end up looping for
> each character, have the added complication of needing to flush the
> "buffer". I'd be surprised if a "dedicated" loop to see until where the string
> last isn't faster.  That then obviously could be SIMDified.
A naive implementation (attached) of that gets me down to 706ms.
Greetings,
Andres Freund
| Attachment | Content-Type | Size | 
|---|---|---|
| json-lex-string-lookahead-speed.diff | text/x-diff | 706 bytes | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2022-06-25 01:09:27 | Re: Hardening PostgreSQL via (optional) ban on local file system access | 
| Previous Message | Justin Pryzby | 2022-06-25 00:36:16 | Re: Add non-blocking version of PQcancel |