Re: logical changeset generation v6.2

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6.2
Date: 2013-10-15 14:48:48
Message-ID: 20131015144848.GC8001@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-10-15 10:15:14 -0400, Robert Haas wrote:
> On Tue, Oct 15, 2013 at 9:47 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-10-15 15:17:58 +0200, Andres Freund wrote:
> >> If we go for CSV I think we should put the entire primary key as one
> >> column (containing all the columns) and the entire row another.
> >
> > What about columns like:
> > * action B|I|U|D|C
>
> BEGIN and COMMIT?

That's B and C, yes. You'd rather not have them? When would you replay
the commit without an explicit message telling you to?

> Repeating the column names for every row strikes me as a nonstarter.
> [...]
> Sure, some people may want JSON or XML
> output that reiterates the labels every time, but for a lot of people
> that's going to greatly increase the size of the output and be
> undesirable for that reason.

But I'd argue that most of the simpler users - exactly the ones a
generic output plugin is aimed at - will want all column names, since
that makes replay far easier.

> If the plugin interface isn't rich enough to provide a convenient way
> to avoid that, then it needs to be fixed so that it is, because it
> will be a common requirement.

Oh, it surely is possible to avoid repeating it. The output plugin
interface simply gives you a relcache entry that contains everything
necessary.
The output plugin would need to keep track of whether it has output data
for a specific relation and it would need to check whether the table
definition has changed, but I don't see how we could avoid that?
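
To illustrate the bookkeeping described above, here is a minimal sketch. It
is written in Python for brevity, not in C against the real output plugin
interface, and everything in it is an assumption made for illustration: the
HeaderTracker class, the "H|"/"R|" line prefixes, and the use of a plain
integer as the relation identifier are all hypothetical. It also ignores
escaping entirely.

```python
from typing import Dict, List, Tuple


class HeaderTracker:
    """Emit column names only when a relation is new or its columns changed.

    Hypothetical helper: a real plugin would derive this from the relcache
    entry it is handed, not from a list of strings.
    """

    def __init__(self) -> None:
        # relation id -> column names last announced to the consumer
        self._seen: Dict[int, Tuple[str, ...]] = {}

    def emit(self, relid: int, columns: List[str], values: List[str]) -> List[str]:
        out: List[str] = []
        cols = tuple(columns)
        if self._seen.get(relid) != cols:
            # First change seen for this relation, or its definition changed:
            # announce the column names once before the row itself.
            self._seen[relid] = cols
            out.append("H|%d|%s" % (relid, ",".join(cols)))
        # The row carries values only; names come from the last header.
        out.append("R|%d|%s" % (relid, ",".join(values)))
        return out
```

The cost is exactly the state mentioned above: one remembered definition per
relation, plus a comparison on every change.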

> > What still need to be determined is:
> > * how do we separate and escape multiple values in one CSV column
> > * how do we represent NULLs
>
> I consider the escaping a key design decision. Ideally, it should be
> something that's easy to reverse from a scripting language; ideally
> also, it should be something similar to how we handle COPY. These
> goals may be in conflict; we'll have to pick something.

Note that parsing COPYs is a major PITA from most languages...

Perhaps we should make the default output JSON instead? With every
action terminated by a null byte?
That's probably easier to parse from various scripting languages than
anything else.
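
A rough sketch of what consuming such a stream could look like from a
scripting language, here Python. The message layout ("action", "table",
"row" keys) and the function names are assumptions made up for this
example, not a proposed format. The sketch relies on the fact that JSON
requires control characters inside strings to be escaped, so a raw null
byte never appears inside an encoded action and is safe as a terminator.

```python
import json
from typing import Iterable, Iterator


def encode_actions(actions: Iterable[dict]) -> bytes:
    """Serialize each action as JSON followed by a single null byte."""
    return b"".join(json.dumps(a).encode("utf-8") + b"\x00" for a in actions)


def decode_actions(stream: bytes) -> Iterator[dict]:
    """Split on null bytes and parse each non-empty chunk as JSON."""
    for chunk in stream.split(b"\x00"):
        if chunk:  # skip the empty piece after the final terminator
            yield json.loads(chunk.decode("utf-8"))


# Hypothetical transaction: BEGIN, one INSERT, COMMIT.
example = [
    {"action": "B"},
    {"action": "I", "table": "t", "row": {"id": 1, "note": None}},
    {"action": "C"},
]
decoded = list(decode_actions(encode_actions(example)))
```

NULLs map onto JSON null here, which sidesteps one of the open questions for
the CSV variant, at the price of repeating the column names in every row.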

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
