From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: FAST PostgreSQL <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY-able csv log outputs
Date: 2007-05-21 03:22:53
Message-ID: 4651108D.9070806@dunslane.net
Lists: pgsql-patches
Greg Smith wrote:
> I got a chance to review this patch over the weekend. Basic API seems
> good, met all my requirements, no surprises with how the GUC variable
> controlled the feature.
>
> The most fundamental issue I have with the interface is that using
> COPY makes it difficult to put any unique index on the resulting
> table. I like to have a unique index on my imported log table because
> it rejects the dupe records if you accidentally import the same
> section of log file twice. COPY tosses the whole thing if there's an
> index violation, which is a problem during a regular import because
> you will occasionally come across lines with the same timestamp that
> are similar in every way except for their statement; putting an index
> on the timestamp+statement seems impractical.
Does the format not include the per-process line number? (I know I
briefly looked at this patch previously, but I forget the details.) One
reason I originally included line numbers in log_line_prefix was to
handle this sort of problem.
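
For reference, the line number I mean is the %l escape, which counts
lines per session and so gives each log line a natural key when combined
with the session identifier. Something like this in postgresql.conf:

```
# Sketch only: %l is the per-session log line number; together with
# %c (session id) it uniquely identifies a log line.
log_line_prefix = '%m [%p] %c %l '
```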
>
> I've had a preference for INSERT from the beginning here that this
> reinforces.
COPY is our standard bulk insert mechanism. I think arguing against it
would be a very hard sell.
> I'm planning to just work around this issue by doing the COPY into a
> temporary table and then INSERTing from there. I didn't want to just
> let the concern pass by without mentioning it though. It crosses my
> mind that inserting some sort of unique log file line ID number would
> prevent the dupe issue and make for better ordering (it's possible to
> have two lines with the same timestamp show up in the wrong order
> now), not sure that's a practical idea to consider.
I guess that answers my question. We should definitely provide a unique
line key.
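
For what it's worth, the staging-table workaround you describe could be
sketched roughly like this (the pglog table and its column names here
are made up for illustration, not taken from the patch):

```sql
-- Load into a throwaway copy of the target table first.
CREATE TEMP TABLE pglog_staging (LIKE pglog);

COPY pglog_staging FROM '/opt/pgsql/testlog.csv' WITH CSV;

-- Insert only rows not already present, so accidentally re-importing
-- an overlapping slice of log file is harmless.
INSERT INTO pglog
SELECT DISTINCT s.*
FROM pglog_staging s
WHERE NOT EXISTS (
    SELECT 1 FROM pglog p
    WHERE p.log_time = s.log_time
      AND p.statement IS NOT DISTINCT FROM s.statement
);
```

A real per-session line number in the CSV output would make all of this
unnecessary, of course.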
>
> The basic coding of the patch seemed OK to me, but someone who is much
> more familiar than myself with the mechanics of pipes should take a
> look at that part of the patch before committing; it's complicated
> code and I can't comment on it. There are some small formatting
> issues that need to be fixed, particularly in the host+port mapping.
> I can fix those myself and submit a slightly updated patch. There's
> some documentation improvements I want to make before this goes in as
> well.
>
> The patch is actually broken fairly hard right now because of the
> switch from INSERT to COPY FROM CSV as the output format at the last
> minute. It outputs missing fields as NULL (fine for INSERT), which
> chokes the CSV import when the session_start timestamp is missing.
> All of those NULL values need to be replaced with nothing for
> proper CSV syntax; there should just be a comma for the next field. I
> worked around this with
>
> copy pglog from '/opt/pgsql/testlog.csv' with CSV null as 'NULL';
>
>
I missed that before - yes it should be fixed.
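
To illustrate with made-up field values: for a missing session_start
the output currently emits the literal string NULL, where correct CSV
wants an empty slot between the commas:

```
wrong:   2007-05-21 03:22:53 EST,andrew,postgres,NULL,LOG,...
right:   2007-05-21 03:22:53 EST,andrew,postgres,,LOG,...
```

Until it's fixed, your "null as 'NULL'" COPY option is the right
workaround.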
cheers
andrew