From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY-able csv log outputs
Date: 2007-05-28 18:25:48
Message-ID: Pine.GSO.4.64.0705281356260.6944@westnet.com

On Sun, 20 May 2007, Andrew Dunstan wrote:

>> I've had a preference for INSERT from the beginning here that this
>> reinforces.
> COPY is our standard bulk insert mechanism. I think arguing against it would
> be a very hard sell.

Let me say my final piece on this subject: if I considered this data to be
strictly a bulk-insert problem, then I'd completely agree here. Most of the
really interesting applications I was planning to build on top of this
mechanism are more interactive than that, though. Here's a sample:

-Write a daemon that lives on the server, connects to a logging database,
and pops into an idle loop based on LISTEN.
-A client app wants to see the recent log files. It uses NOTIFY to ask
the daemon for them and LISTENs for a response.
-The daemon wakes up, reads all the log files since it last did something,
and appends those records to the log file table. It sends out a NOTIFY to
say the log file table is current.

That enables remote clients to grab the log files from the server whenever
they please, so they can actually monitor themselves. Benchmarking is the
initial app I expect to call this, and with some types of tests I expect
the daemon to be importing every 10 minutes or so.
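The daemon loop described above might be sketched like this. This is only an illustration, not anything from the patch itself: it assumes the psycopg2 driver, and the channel names (`logs_request`, `logs_ready`), the `import_batch` callback, and the position-tracking helper are all hypothetical placeholders.

```python
# Sketch of the server-side daemon: sleep on LISTEN, import new log
# records when a client asks, answer with NOTIFY. Channel names and the
# import_batch callback are illustrative assumptions, not from the patch.
import select

def unread_lines(log_lines, last_imported):
    """Return only the lines past the position we already imported,
    so a restarted daemon never re-reads what it has processed."""
    return log_lines[last_imported:]

def run_daemon(conn, import_batch, poll_seconds=60):
    conn.autocommit = True  # psycopg2 needs autocommit for LISTEN/NOTIFY
    cur = conn.cursor()
    cur.execute("LISTEN logs_request;")
    last_imported = 0
    while True:
        # Block until the backend socket is readable or the timeout expires.
        if select.select([conn], [], [], poll_seconds) == ([], [], []):
            continue  # timeout: nothing asked for logs
        conn.poll()
        if conn.notifies:
            del conn.notifies[:]
            # Append everything since the last import to the log table,
            # then tell listening clients the table is current.
            last_imported += import_batch(last_imported)
            cur.execute("NOTIFY logs_ready;")
```

The bookkeeping in `unread_lines` is where the duplicate-avoidance burden discussed below actually lives.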

Assuming a unique index on the data to prevent duplication is a required
feature, I can build this using the COPY-format logs as well, but that
requires that I either a) be 100% perfect in making sure I never pass over the
same data twice, which is particularly annoying when the daemon gets
restarted, or b) break the COPY into single lines and insert them one at a
time, at which point I'm not bulk loading at all. If these were INSERT
statements instead, I'd have a lot more tolerance for error, because the
worst problem I'd ever run into is spewing some unique key violation
errors into the logs if I accidentally imported too much. With COPY, any
mistake or synchronization issue means I lose the whole import.
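To make the trade-off concrete, here is a toy model of the two failure modes, with an in-memory set standing in for the unique index. It is not the patch's behavior or PostgreSQL code, just a sketch of why per-row INSERTs tolerate an overlapping import while a single COPY does not.

```python
# Toy model: `index` plays the role of a unique index on the log table.
class UniqueViolation(Exception):
    pass

def insert_rows(table, index, rows):
    """INSERT one row at a time: each duplicate fails individually,
    so an overlapping import just skips rows it already has."""
    loaded = 0
    for row in rows:
        if row in index:
            continue  # spew a unique-key violation into the logs, move on
        index.add(row)
        table.append(row)
        loaded += 1
    return loaded

def copy_rows(table, index, rows):
    """COPY the whole batch: one duplicate anywhere aborts everything,
    including the rows that were genuinely new."""
    if any(row in index for row in rows):
        raise UniqueViolation("whole import is lost")
    index.update(rows)
    table.extend(rows)
    return len(rows)
```

Re-running an overlapping batch through `insert_rows` loads only the new rows; the same batch through `copy_rows` raises and loads nothing.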

I don't mean to try and stir this back up again as an argument
(particularly not on this list). There are plenty of other apps where
COPY is clearly the best approach, you can easily make a case that my app
is a fringe application rather than a mainstream one, and on the balance
this job is still far easier than my current approach of parsing the logs.
I just wanted to give a sample of how using COPY impacts the dynamics of
how downstream applications will have to work with this data, so you can
see that my contrary preference isn't completely random here.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
