WIP Patch: Add a function that returns binary JSONB as a bytea

From: Kevin Van <kevinvan(at)shift(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: WIP Patch: Add a function that returns binary JSONB as a bytea
Date: 2018-10-31 05:02:06
Message-ID: CAJArn6QFs2EZsX0NyTkBPpdmPc28o-8ze_ZvjwBLGNv+WmrgHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This patch adds a new function that allows callers to receive binary jsonb.
This change was proposed in the discussion here:
https://www.postgresql.org/message-id/CAOsiKEL7%2BKkV0C_hAJWxqwTg%2BPYVfiGPQ0yjFww7ECtqwBjb%2BQ%40mail.gmail.com
and the primary motivation is to reduce database load by skipping jsonb to
string conversion (with the expectation that the application can parse
binary jsonb).

This patch is still a WIP and I am looking for any guidance on the
approach. If it is valid, I'd also appreciate any guidance on what kind of
test coverage would be appropriate for a change like this.

Note: I prepended the version number "1" to the byte array -- this is
similar to what jsonb_recv does in the same file.

I have been testing on a x86-64 processor running MacOSX and do not know
what the behavior is on other platforms.

This patch is off of master and compiles successfully. The following is
some example output using psql:

database=# SELECT jsonb_raw_bytes('{"b":12345}'::jsonb);

jsonb_raw_bytes

----------------------------------------------------------

\x0101000020010000800d0000106200000028000000018001002909

(1 row)

database=# SELECT jsonb_raw_bytes('{"a":{"b":12345}}'::jsonb->'a');

jsonb_raw_bytes

----------------------------------------------------------

\x0101000020010000800d0000106200000028000000018001002909

(1 row)

Some preliminary testing on my own machine shows me that this change has a
significant impact on performance.

I used psql to select a jsonb column from all the rows in a table (about 1
million rows) where the json data was roughly 400-500 bytes per record.

database=# \timing

Timing is on.

database=# \o /tmp/raw_bytes_out.txt;

database=# SELECT jsonb_raw_bytes(data) FROM datatable;

Time: 2582.545 ms (00:02.583)

database=# \o /tmp/json_out.txt;

database=# SELECT data FROM datatable;

Time: 5653.235 ms (00:05.653)

Of note is that the size of raw_bytes_out.txt in the example is roughly
twice that of json_out.txt so the timing difference is not due to less data
being transferred.

Attachment Content-Type Size
jsonb_raw_bytes_v1.patch application/octet-stream 1.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2018-10-31 06:30:45 Re: ToDo: show size of partitioned table
Previous Message Amit Langote 2018-10-31 04:38:29 Re: FDW Parallel Append