Re: WIP Patch: Add a function that returns binary JSONB as a bytea

From: Christian Ohler <ohler(at)shift(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: sfrost(at)snowman(dot)net, Kevin Van <kevinvan(at)shift(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch: Add a function that returns binary JSONB as a bytea
Date: 2018-11-02 02:48:59
Message-ID: CAOsiKEKAJn0TEVg=_nCOw=vdCDRyMrJGBrPKs4uZuerczj2KfA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 31, 2018 at 7:22 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> If we're going to expose the
> internal format, let's just change the definition of the type's binary
> I/O format, thereby getting a win for purposes like COPY BINARY as well.
> We'd need to ensure that jsonb_recv could tell whether it was seeing the
> old or new format, but at worst that'd require prepending a header of
> some sort. (In practice, I suspect we'd end up with a wire-format
> definition that isn't exactly the bits-on-disk, but something easily
> convertible to/from that and more easily verifiable by jsonb_recv.
> Numeric subfields, for instance, ought to match the numeric wire
> format, which IIRC isn't exactly the bits-on-disk either.)
>
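For concreteness: the current jsonb binary send format already leads with a one-byte version number (1) followed by JSON text, so the detection Tom describes could look roughly like this. This is a hedged sketch in Python rather than backend C, and version 2 and its payload are hypothetical:

```python
import json

def decode_jsonb_wire(msg: bytes):
    """Sketch of version detection for the jsonb wire format.

    Version 1 is today's format: a version byte followed by JSON text.
    Version 2 is hypothetical: it could carry the tree encoding.
    """
    version, payload = msg[0], msg[1:]
    if version == 1:
        return json.loads(payload)       # existing path: parse JSON text
    if version == 2:
        raise NotImplementedError("hypothetical tree format not shown")
    raise ValueError(f"unrecognized jsonb wire-format version {version}")

print(decode_jsonb_wire(b"\x01" + b'{"a": 1}'))  # -> {'a': 1}
```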

How would this work from the driver's and application's perspective? What
does the driver do when receiving JSONB data?

Applications currently receive JSON strings when reading JSONB data, and
the driver presumably has to stay compatible with that. Does that mean the
driver transparently converts JSONB to JSON before handing it to the
application? That scales better than doing it in Postgres itself, but is
still the kind of inefficiency we're trying to avoid.

We want to convert JSONB directly to language-native objects. This
shouldn't be the responsibility of the Postgres driver, since the
conversion is complex and can be done in different ways: instantiating
objects from a class hierarchy vs. populating generic containers, or
converting eagerly vs. lazily. Applications that are sensitive to JSONB
performance likely want control over these choices. Postgres drivers
aren't coupled to specific JSON parsers; they shouldn't be coupled to
specific JSONB parsers either.
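To illustrate the eager-vs-lazy point (an application-side sketch, not anything from the patch):

```python
import json

def eager_convert(raw: bytes):
    """Eager: parse the whole document up front into generic containers."""
    return json.loads(raw)

class LazyDoc:
    """Lazy: keep the raw bytes; parse only on first access, so code
    that rarely touches the value pays no conversion cost."""

    def __init__(self, raw: bytes):
        self._raw = raw
        self._obj = None

    def get(self):
        if self._obj is None:
            self._obj = json.loads(self._raw)
        return self._obj

doc = LazyDoc(b'{"a": [1, 2]}')
print(doc.get()["a"])  # -> [1, 2]
```

Which strategy wins depends on the workload, which is exactly why the application, not the driver, should choose.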

So, AFAICS, when the application requests JSONB data, the driver has to
hand it the raw JSONB bytes. But that's incompatible with what currently
happens. To preserve compatibility, does the application have to opt in by
setting some out-of-band per-query per-result-column flags to tell the
driver how it wants the JSONB data returned? That's workable in principle
but bloats every driver's API with some rarely-used performance feature.
Seems much simpler to put this into the query.

The idea behind the proposal is to improve efficiency by avoiding
conversions, and the most straightforward way to do that is for every layer
to pass through the raw bytes. With an explicit conversion to BYTEA in the
query, this is automatic without any changes to drivers, since every layer
already knows to leave BYTEAs untouched.
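A toy model of driver-side result decoding shows why this works. The type OIDs are PostgreSQL's well-known values for bytea (17) and jsonb (3802); the decoder table is a simplification of what real drivers do:

```python
import json

BYTEAOID, JSONBOID = 17, 3802

# Per-type result decoders, as drivers typically keep them: the bytea
# decoder is the identity function, so bytea values reach the
# application untouched, while jsonb values get parsed.
DECODERS = {
    BYTEAOID: lambda raw: raw,
    JSONBOID: lambda raw: json.loads(raw),
}

def decode_column(type_oid: int, raw: bytes):
    return DECODERS[type_oid](raw)

raw_doc = b'{"a": 1}'
assert decode_column(JSONBOID, raw_doc) == {"a": 1}  # driver converts
assert decode_column(BYTEAOID, raw_doc) is raw_doc   # passed through as-is
```

So a query that hands back the raw JSONB bytes as a bytea column needs no driver changes at all.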

I don't have an argument against _also_ adding a binary format version 2
for JSONB once we define a portable JSONB format, but I'm not sure that
alone solves the problem we have.
