Re: JSON decoding plugin

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Euler Taveira <euler(at)timbira(dot)com(dot)br>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON decoding plugin
Date: 2013-12-09 16:12:22
Message-ID: CAHyXU0zE-kcofYtnoQqu7i2odH+L8zhfQz9-Ozbdga4NUXgBJg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 9, 2013 at 7:03 AM, Euler Taveira <euler(at)timbira(dot)com(dot)br> wrote:
> Hi,
>
> A few months ago, it was proposed [1] that it would be interesting to
> have a JSON output plugin for logical decoding. Here it is.
>
> Each transaction is a JSON object that can contain xid (optional),
> timestamp (optional), and a change array. Each element of the change
> array is a decoded command and can contain: kind (I/U/D), schema
> (optional), table, columnnames, columntypes (optional), columnvalues,
> and oldkeys (only for U/D). columnnames, columntypes, and columnvalues
> are arrays. oldkeys is an object that contains the following arrays:
> keynames, keytypes (optional), and keyvalues.
>
> The JSON objects are emitted one after another if you are decoding a
> series of transactions. Here is an output example:
>
> {
>   "xid": 702,
>   "change": [
>     {
>       "kind": "insert",
>       "schema": "public",
>       "table": "foo",
>       "columnnames": ["a", "b", "c"],
>       "columntypes": ["int4", "int4", "text"],
>       "columnvalues": [1, 2, "test"]
>     },
>     {
>       "kind": "update",
>       "schema": "public",
>       "table": "foo",
>       "columnnames": ["a", "b", "c"],
>       "columntypes": ["int4", "int4", "text"],
>       "columnvalues": [1, 2, "test2"],
>       "oldkeys": {
>         "keynames": ["a", "b"],
>         "keytypes": ["int4", "int4"],
>         "keyvalues": [1, 2]
>       }
>     }
>   ]
> }
> {
>   "xid": 703,
>   "change": [
>     {
>       "kind": "update",
>       "schema": "public",
>       "table": "foo",
>       "columnnames": ["a", "b", "c"],
>       "columntypes": ["int4", "int4", "text"],
>       "columnvalues": [1, 3, "test2"],
>       "oldkeys": {
>         "keynames": ["a", "b"],
>         "keytypes": ["int4", "int4"],
>         "keyvalues": [1, 2]
>       }
>     }
>   ]
> }
> {
>   "xid": 704,
>   "change": [
>     {
>       "kind": "delete",
>       "schema": "public",
>       "table": "foo",
>       "oldkeys": {
>         "keynames": ["a", "b"],
>         "keytypes": ["int4", "int4"],
>         "keyvalues": [1, 3]
>       }
>     }
>   ]
> }
>
>
> Some data types were adapted to conform to the JSON spec. NaN and
> Infinity are not valid JSON symbols, so they are represented as null
> (as in some JSON implementations). Due to the simplicity of the JSON
> type system, I represent the vast majority of Postgres data types as
> strings (however, I admit that we could mimic the json datatype
> conversion rules).
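>
> For instance, a minimal sketch of the expected behavior, assuming a
> hypothetical table t with a float8 column:
>
> CREATE TABLE t (a int4 PRIMARY KEY, b float8);
> INSERT INTO t VALUES (1, 'NaN');
> -- expected change element (sketch): "columnvalues": [1, null],
> -- since NaN cannot be represented in JSON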
>
> The oldkeys treatment follows what was defined in commit [2] (see the
> example after this list). It uses:
>
> (i) primary key (default behavior);
> (ii) unique index (if REPLICA IDENTITY USING INDEX is defined for table);
> (iii) full tuple (if REPLICA IDENTITY FULL is defined for table);
> (iv) nothing means an error (if REPLICA IDENTITY NOTHING is defined for
> table).
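>
> For illustration, the identity used for oldkeys can be chosen per
> table with standard DDL (table and index names here are hypothetical):
>
> ALTER TABLE foo REPLICA IDENTITY DEFAULT;              -- (i) primary key
> ALTER TABLE foo REPLICA IDENTITY USING INDEX foo_a_b;  -- (ii) unique index
> ALTER TABLE foo REPLICA IDENTITY FULL;                 -- (iii) full tuple
> ALTER TABLE foo REPLICA IDENTITY NOTHING;              -- (iv) no old key; U/D errors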
>
> TOAST columns get special treatment for UPDATEs. If a tuple that
> contains a TOAST field is updated, the TOAST field is included only if
> it was changed too. That is, unchanged TOAST fields are omitted from
> the columns* arrays, which means less overhead while transmitting,
> processing, and applying changes.
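>
> To see this, update only the non-TOAST columns of a row whose text
> column holds an out-of-line value (a hypothetical table bar):
>
> CREATE TABLE bar (id int4 PRIMARY KEY, n int4, payload text);
> -- disable compression so the payload is stored out of line
> ALTER TABLE bar ALTER COLUMN payload SET STORAGE EXTERNAL;
> INSERT INTO bar VALUES (1, 0, repeat('x', 100000));
> UPDATE bar SET n = n + 1 WHERE id = 1;  -- payload untouched
> -- expected: the change element for this UPDATE lists only "id" and "n"
> -- in columnnames/columnvalues; "payload" is omitted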
>
> By design, (i) the output plugin doesn't know about aborted
> transactions, and (ii) subtransactions are reordered into a toplevel
> transaction and only the committed pieces are passed to the plugin.
>
> You can test it by running the regression tests (e.g. 'make test') or
> using the following steps:
>
> postgresql.conf:
> wal_level = logical
> max_wal_senders = 2
> max_logical_slots = 2
>
> start collecting WAL records:
>
> $ pg_recvlogical --slot=foo -d euler -f /dev/stdout
> --plugin=json_decoding_plugin --init
>
> [execute some transactions]
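>
> For example, the sample output above could be produced by something
> like this (a sketch; each standalone statement is its own transaction,
> and the xids will of course differ):
>
> CREATE TABLE foo (a int4, b int4, c text, PRIMARY KEY (a, b));
> BEGIN;
> INSERT INTO foo VALUES (1, 2, 'test');
> UPDATE foo SET c = 'test2';
> COMMIT;
> UPDATE foo SET b = 3;
> DELETE FROM foo;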
>
> start printing decoded transactions:
>
> $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --start
>
> stop collecting WAL records:
>
> $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --stop
>
>
> Comments?

This is pretty neat. A couple of minor questions:
*) Aren't you *en*coding data into json, not the other way around (decoding?)
*) Consider generating a long bytea instead of explicitly writing 32kB
of SQL into the patch.
*) You've built your own json serializer here. Maybe some code can be
shared with the json type?
*) Consider removing 'plugin' from the name of the plugin, i.e.
--plugin=json_decoding etc.

merlin
