Skip site navigation (1) Skip section navigation (2)

Re: COPY-able csv log outputs

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: FAST PostgreSQL <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY-able csv log outputs
Date: 2007-05-21 01:24:09
Message-ID: Pine.GSO.4.64.0705202041470.22672@westnet.com (view raw or flat)
Thread:
Lists: pgsql-patches
I got a chance to review this patch over the weekend.  Basic API seems 
good, met all my requirements, no surprises with how the GUC variable 
controlled the feature.

The most fundamental issue I have with the interface is that using COPY 
makes it difficult to put any unique index on the resulting table.  I like 
to have a unique index on my imported log table because it rejects the 
dupe records if you accidentally import the same section of log file 
twice.  COPY tosses the whole thing if there's an index violation, which 
is a problem during a regular import because you will occasionally come 
across lines with the same timestamp that are similar in every way except 
for their statment; putting an index on the timestamp+statement seems 
impractical.

I've had a preference for INSERT from the beginning here that this 
reinforces.  I'm planning to just work around this issue by doing the COPY 
into a temporary table and then INSERTing from there.  I didn't want to 
just let the concern pass by without mentioning it though.  It crosses my 
mind that inserting some sort of unique log file line ID number would 
prevent the dupe issue and make for better ordering (it's possible to have 
two lines with the same timestamp show up in the wrong order now), not 
sure that's a practical idea to consider.

The basic coding of the patch seemed OK to me, but someone who is much 
more familiar than myself with the mechanics of pipes should take a look 
at that part of the patch before committing; it's complicated code and I 
can't comment on it.  There are some small formatting issues that need to 
be fixed, particularly in the host+port mapping.  I can fix those myself 
and submit a slightly updated patch.  There's some documentation 
improvements I want to make before this goes in as well.

The patch is actually broken fairly hard right now because of the switch 
from INSERT to COPY FROM CSV as the output format at the last minute.  It 
outputs missing fields as NULL (fine for INSERT) that chokes the CSV 
import when the session_start timestamp is missing.  All of those NULL 
values need to be just replaced with nothing for proper CSV syntax; there 
should just the comma for the next field.  I worked around this with

copy pglog from '/opt/pgsql/testlog.csv' with CSV null as 'NULL';

I can fix that too when I'm revising.  I plan to have a version free of 
obvious bugs to re-submit ready by next weekend.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

In response to

Responses

pgsql-patches by date

Next:From: Andrew DunstanDate: 2007-05-21 03:22:53
Subject: Re: COPY-able csv log outputs
Previous:From: FAST PostgreSQLDate: 2007-05-21 01:22:21
Subject: Re: Updateable cursors patch

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group