Re: Binary support for pgoutput plugin

From: Dave Cramer <davecramer(at)gmail(dot)com>
To: Andres Freund <andres(dot)freund(at)enterprisedb(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary support for pgoutput plugin
Date: 2019-06-04 20:39:32
Message-ID: CADK3HHKF+TQRHoUYqvyq9s9Et-yzeSqP32C+v7bJKJDqF7AATg@mail.gmail.com
Lists: pgsql-hackers

Dave Cramer

On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres(dot)freund(at)enterprisedb(dot)com> wrote:

> Hi,
>
> On 2019-06-04 15:47:04 -0400, Dave Cramer wrote:
> > On Mon, 3 Jun 2019 at 20:54, David Fetter <david(at)fetter(dot)org> wrote:
> >
> > > On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> > > > Is there a reason why pgoutput sends data in text format? Seems to
> > > > me that sending data in binary would provide a considerable
> > > > performance improvement.
> > >
> > > Are you seeing something that suggests that the text output is taking
> > > a lot of time or other resources?
> > >
> > Actually, the improvement is on the other end. Parsing text
> > takes much longer for almost everything except, ironically, text.
>
> It's on both sides, I'd say. E.g. float (until v12), timestamp, bytea
> are all much more expensive to convert from binary to text.
>
>
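A minimal C sketch of the conversion asymmetry, using float8 as the example. It assumes the binary form is what float8send produces (an 8-byte IEEE 754 double in network byte order); the function names are hypothetical and this is not PostgreSQL code:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Binary path: 8 bytes of IEEE 754 data, most significant byte first,
 * so decoding is a byte reassembly plus a bit-pattern copy. */
static double decode_float8_binary(const unsigned char buf[8])
{
    uint64_t bits = 0;
    double value;

    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | buf[i];      /* big-endian to host order */
    memcpy(&value, &bits, sizeof value);  /* reinterpret the bits as a double */
    return value;
}

/* Text path: the same value has to be scanned character by character. */
static double decode_float8_text(const char *text)
{
    return strtod(text, NULL);
}
```

The binary path is a fixed eight iterations and a copy, whereas `strtod()` has to handle digits, sign, decimal point and exponent. The `memcpy` assumes doubles and 64-bit integers share byte order, which holds on mainstream platforms.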
> > To be more transparent, there is some desire to use pgoutput for something
> > other than logical replication. Change Data Capture clients such as
> > Debezium require a stable plugin that ships with core, as such a plugin
> > is always available in cloud providers' offerings. There's no reason
> > that I am aware of that they cannot use pgoutput for this.
>
> Except that that's not pgoutput's purpose, and we shouldn't make it
> meaningfully more complicated or slower to achieve this. Don't think
> there's a conflict in this case though.
>

Agreed, my intent was to slightly bend it to my will :)

>
>
> > There's also no reason that I am aware of that binary output can't be
> > supported.
>
> Well, it *does* increase version dependencies, and does make replication
> more complicated, because type OIDs etc. cannot be relied upon to be the same
> on the source and target side.
>
I was about to agree with this, but if the type OIDs change from source to
target, you still can't decode the text version properly, unless I
misunderstand something here?
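To make the OID concern concrete, a minimal sketch assuming the usual PostgreSQL convention: hand-assigned built-in type OIDs (such as 701 for float8) are fixed in the catalog data, while types created after initdb get OIDs at or above FirstNormalObjectId (16384) and so may differ between source and target. The helper names and the 'b' format code are hypothetical, and this does not address the separate point that binary formats are version-dependent:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;  /* PostgreSQL OIDs are unsigned 32-bit integers */

/* Mirrors FirstNormalObjectId: objects created after initdb
 * (CREATE TYPE, extensions, ...) get OIDs at or above this value. */
#define FIRST_NORMAL_OBJECT_ID ((Oid) 16384)

/* A few hand-assigned built-in type OIDs from the system catalogs. */
#define BYTEAOID       ((Oid) 17)
#define FLOAT8OID      ((Oid) 701)
#define TIMESTAMPTZOID ((Oid) 1184)

/* Hypothetical helper: a locally assigned OID may name a different type
 * (or nothing) on the target, so it would have to be resolved there by
 * schema and type name instead. */
static bool type_oid_may_differ(Oid typid)
{
    return typid >= FIRST_NORMAL_OBJECT_ID;
}

/* Hypothetical policy: binary ('b') only when both sides agree on the OID,
 * text ('t') otherwise. */
static char choose_column_format(Oid typid)
{
    return type_oid_may_differ(typid) ? 't' : 'b';
}
```

Under this policy a user-defined type simply falls back to text, which is compatible with sending only selected types in binary.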

>
>
> > The protocol would have to change slightly and I am working
> > on a POC patch.
>
> Hm, what would have to be changed protocol-wise? IIRC that'd just be a
> different datum type? Or is that what you mean?
> pq_sendbyte(out, 't'); /* 'text' data follows */
>
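For reference, a sketch of how a column value is framed in pgoutput's TupleData: a kind byte ('n' for NULL, 'u' for an unchanged TOASTed value, 't' for text), followed for 't' by an Int32 length and the value bytes. Under the "different datum type" reading, binary would just be another kind byte; the 'b' used below is an assumption, and `OutBuf` is a hypothetical stand-in for StringInfo:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for PostgreSQL's StringInfo. */
typedef struct
{
    unsigned char data[256];
    size_t        len;
} OutBuf;

static void put_bytes(OutBuf *out, const void *src, size_t n)
{
    assert(out->len + n <= sizeof out->data);  /* no growth in this sketch */
    memcpy(out->data + out->len, src, n);
    out->len += n;
}

static void put_byte(OutBuf *out, unsigned char b)
{
    put_bytes(out, &b, 1);
}

/* Int32 fields in the replication protocol are sent in network byte order. */
static void put_int32(OutBuf *out, int32_t v)
{
    uint32_t u = (uint32_t) v;
    unsigned char be[4];

    be[0] = (unsigned char) (u >> 24);
    be[1] = (unsigned char) (u >> 16);
    be[2] = (unsigned char) (u >> 8);
    be[3] = (unsigned char) u;
    put_bytes(out, be, sizeof be);
}

/* Frame one column value: kind byte, then Int32 length and data for kinds
 * that carry a payload. 'n', 'u' and 't' are the existing kinds; 'b' for
 * binary is assumed here. */
static void write_column(OutBuf *out, char kind, const void *value, int32_t len)
{
    put_byte(out, (unsigned char) kind);
    if (kind == 'n' || kind == 'u')
        return;                  /* NULL or unchanged TOAST: no payload */
    assert(kind == 't' || kind == 'b');
    put_int32(out, len);
    put_bytes(out, value, (size_t) len);
}
```

Framed this way, a binary value costs the same length prefix as a text one, so the change would mostly be on the producing and consuming ends rather than in the message layout.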
I haven't really thought this through completely, but one place JDBC has
problems with binary is timestamps with time zone, as we don't know which
time zone to use. Is it safe to assume everything is in UTC, since the
server stores in UTC? Then there are UDFs. My original thought was to use
options to send in the types that I wanted in binary; everything else could
be sent as text.
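On the UTC question: binary timestamptz is an int64 count of microseconds since 2000-01-01 00:00:00 UTC (integer datetimes have been the only build option since PostgreSQL 10), so a client can decode it without knowing any time zone; only the text form is rendered in the session's TimeZone. A minimal sketch with hypothetical names:

```c
#include <stdint.h>

/* Seconds from 1970-01-01 to 2000-01-01 (10957 days * 86400). */
#define POSTGRES_EPOCH_UNIX_SECONDS INT64_C(946684800)

/* Reassemble a big-endian 8-byte integer as sent on the wire. */
static int64_t read_int64_be(const unsigned char buf[8])
{
    uint64_t u = 0;

    for (int i = 0; i < 8; i++)
        u = (u << 8) | buf[i];
    return (int64_t) u;  /* two's complement on all mainstream compilers */
}

/* Binary timestamptz (microseconds since 2000-01-01 00:00:00 UTC) to
 * microseconds since the Unix epoch, still UTC: no time zone involved. */
static int64_t timestamptz_to_unix_micros(const unsigned char buf[8])
{
    return read_int64_be(buf) + POSTGRES_EPOCH_UNIX_SECONDS * INT64_C(1000000);
}
```

Plain timestamp uses the same encoding but carries no zone at all, so for that type the client would still have to decide how to interpret the value.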

> IIRC there was code for the binary protocol in a predecessor of
> pgoutput.
>

Hmmm, that might be a good place to start. I will do some digging through
the git history.

>
> I think if we were to add binary output - and I think we should - we
> ought to only accept a patch if it's also used in core.
>

Certainly, as not doing so would make my work completely irrelevant for
my purpose.

Thanks,

Dave
