Re: Tee for COPY

From: David Fetter <david(at)fetter(dot)org>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tee for COPY
Date: 2015-12-13 13:43:24
Message-ID: 20151213134324.GB28490@fetter.org
Lists: pgsql-hackers

On Sun, Dec 13, 2015 at 11:29:23AM +0300, Konstantin Knizhnik wrote:
> Hi,
>
> I am trying to create a version of the COPY command that can scatter/replicate data to different nodes based on some distribution method.
> There is a master process, which has the information about data distribution and to which all clients are connected.
> This master process should receive the copied data from the client and scatter the tuples to the nodes.
> Maybe somebody can recommend the best way to implement such a COPY agent?
>
> The obvious plan is the following:
>
> 1. Register utility callback
> 2. Handle T_CopyStmt in this callback
> 3. Use BeginCopyFrom/NextCopyFrom to receive tuples from the client
> 4. Calculate the distribution function for the received tuple
> 5. Establish a connection with the corresponding node (if not yet established) and start the same COPY command on this node (if not started yet).
> 6. Send data to this node using PQputCopyData.
>
> The problem is with step 6: I do not see any way to copy the received data to the destination node.
> NextCopyFrom returns an array of values (Datums) of the tuple's columns, but there are no public methods to send a tuple to the copy stream.
> All this logic is implemented in src/backend/commands/copy.c and is not available outside this module.
>
> It is more or less clear how to do it in text or CSV mode: I can use NextCopyFromRawFields and then construct a line with a comma-separated list of values.
> But how do I handle binary mode? Also, I suspect that COPY in text mode is significantly slower than in binary mode, isn't it?
>
> The dirty solution is just to cut and paste the copy.c code. But maybe there is a more elegant way?

A slightly cleaner solution is to make public methods to send tuples
to the copy stream and have COPY call those.
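
For the text-mode route mentioned in the quoted message, here is a
minimal sketch of steps 4 and 6 in plain C. It assumes the raw fields
arrive already de-escaped, with a NULL pointer standing for SQL NULL.
The names pick_node and build_copy_text_line are hypothetical and are
not part of PostgreSQL or libpq, the FNV-1a hash is only a stand-in for
a real distribution function, and only the common escapes of the COPY
text format are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical distribution function (step 4): FNV-1a hash of the key
 * field, reduced modulo the number of nodes. nnodes must be > 0.
 */
static unsigned pick_node(const char *key, unsigned nnodes)
{
    uint32_t h = 2166136261u;

    for (const unsigned char *p = (const unsigned char *) key; *p; p++)
    {
        h ^= *p;
        h *= 16777619u;
    }
    return (unsigned) (h % nnodes);
}

/*
 * Hypothetical helper (step 6, text mode): build one COPY text-format
 * line from raw fields. A NULL field is written as \N. Backslash,
 * newline, carriage return and tab are backslash-escaped, and any other
 * delimiter character is preceded by a backslash.
 * Returns a malloc'ed string ending in '\n', or NULL if malloc fails.
 */
static char *build_copy_text_line(char *const *fields, int nfields, char delim)
{
    size_t cap = 2;             /* trailing newline + terminator */

    /* Worst case: every byte is escaped, plus one delimiter per field. */
    for (int i = 0; i < nfields; i++)
        cap += (fields[i] ? 2 * strlen(fields[i]) : 2) + 1;

    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    char *out = line;
    for (int i = 0; i < nfields; i++)
    {
        if (i > 0)
            *out++ = delim;
        if (fields[i] == NULL)
        {
            *out++ = '\\';
            *out++ = 'N';
            continue;
        }
        for (const char *p = fields[i]; *p; p++)
        {
            switch (*p)
            {
                case '\\': *out++ = '\\'; *out++ = '\\'; break;
                case '\n': *out++ = '\\'; *out++ = 'n';  break;
                case '\r': *out++ = '\\'; *out++ = 'r';  break;
                case '\t': *out++ = '\\'; *out++ = 't';  break;
                default:
                    if (*p == delim)
                        *out++ = '\\';
                    *out++ = *p;
            }
        }
    }
    *out++ = '\n';
    *out = '\0';
    return line;
}
```

Each returned line could then be handed to libpq with
PQputCopyData(conn, line, (int) strlen(line)) on the connection chosen
by pick_node, and freed afterwards. This does not cover binary mode,
which is the case that motivates exposing the copy.c routines.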

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
