Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

From: Daniel Farina <drfarina(at)acm(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 02:56:16
Message-ID: 7b97c5a40912291856k21b8a2x6da061aad0ea9089@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 6:48 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think there's clear support for a version of COPY that returns rows
> like a SELECT statement, particularly for use with CTEs.  There seems
> to be support both for a mode that returns text[] or something like it
> and also for a mode that returns a defined record type.  But that all
> seems separate from what you're proposing here, which is a
> considerably lower-level facility which seems like it would not be of
> much use to ordinary users, but might be of some value to tool
> implementors - or perhaps you'd disagree with that characterization?
>

This is in the other direction: freeing COPY from the restriction that
it can only put bytes into two places:

* A network socket (e.g. stdout)
* A file (as superuser)

Instead, it can hand off bytes to an arbitrary UDF that can handle
them in any way. A clean design should be able to subsume at least the
existing simple behaviors and enable more. It may also suggest how to
decouple a few components of COPY, which could help the long-term
cleanup effort there.
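
To make the shape of the idea concrete, here is a rough sketch in
Python rather than the C of the actual patch. Everything in it is an
assumption made for illustration: the names copy_to and Sink are
hypothetical, and the tab-separated encoding only loosely imitates
COPY's text format. The point is that the row producer knows nothing
about the destination; it just hands each chunk of bytes to a callable.

```python
import io
from typing import Callable, Iterable, Sequence

# Hypothetical type: anything that accepts a chunk of bytes.
# (The return value of the callable, if any, is ignored.)
Sink = Callable[[bytes], object]


def copy_to(rows: Iterable[Sequence[object]], sink: Sink) -> int:
    """Serialize rows as tab-separated lines and hand each line to sink.

    Illustration only: NULLs become \\N, and no escaping is attempted.
    Returns the number of rows handed off.
    """
    count = 0
    for row in rows:
        fields = ("\\N" if value is None else str(value) for value in row)
        sink(("\t".join(fields) + "\n").encode("utf-8"))
        count += 1
    return count


rows = [(1, "alpha"), (2, None)]

# An existing "simple behavior": a file-like destination is just one sink.
buf = io.BytesIO()
copy_to(rows, buf.write)

# An arbitrary handler: here the bytes are only collected, but this could
# equally forward them to another connection or transform them.
chunks: list[bytes] = []
copy_to(rows, chunks.append)
```

Under this reading, the socket and file destinations become two
ordinary sinks among many, rather than special cases wired into COPY.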

> Anyway, my specific reaction to your suggestions in the email that I
> quoted is that it seems a bit baroque and that I'm not really sure
> what it's useful for in practice.  I'm certainly not saying it ISN'T
> useful, because I can't believe that you would have gone to the
> trouble to work through all of this unless you had some ideas about
> nifty things that could be done with it, but I think maybe we need to
> back up and start by talking about the problems you're trying to
> solve, before we get too far down into a discussion of implementation
> details.

At Truviso this is used as a piece of our replication solution. The
submitted patches show how enhancing dblink lets Postgres copy
directly from one node to another: we write bytes straight to a libpq
connection that is in the open COPY state (see the dblink patch) to
get direct cross-node bulk loading for replication.

One could imagine a lot of ETL or data warehouse offloading
applications that would be enabled by letting arbitrary code handle
the bytes. That said, this patch achieves nothing that application
middleware could not also accomplish: it is just convenient to have,
and likely more efficient than routing the data through middleware.

fdr
