Re: CSV Logging questions

From: David Fetter <david(at)fetter(dot)org>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CSV Logging questions
Date: 2017-09-04 16:31:57
Message-ID: 20170904163157.GI17009@fetter.org
Lists: pgsql-hackers

On Mon, Sep 04, 2017 at 05:27:40PM +0100, Greg Stark wrote:
> I was just looking over the CSV logging code and have a few questions
> about why things were done the way they were done.
>
> 1) Why do we gather a per-session log line number? Is it just to aid
> people importing to avoid duplicate entries from partial files? Is
> there some other purpose given that entries will already be sequential
> in the csv file?
>
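For what it's worth, my reading of elog.c is that the number comes from a
static counter in write_csvlog() that gets reset whenever the PID changes,
so it restarts per backend rather than per file. Paraphrasing from memory
(a sketch, not the actual code):

    /* per-backend counter; rough paraphrase of what write_csvlog() does */
    static long log_line_number = 0;
    static int  log_my_pid = 0;

    if (log_my_pid != MyProcPid)
    {
        /* new process: don't inherit the postmaster's counter value */
        log_line_number = 0;
        log_my_pid = MyProcPid;
    }
    log_line_number++;      /* becomes the session_line_num column */

I always assumed the point was exactly the import case you describe: IIRC
the sample import table in the docs declares
PRIMARY KEY (session_id, session_line_num), so re-loading an overlapping
partial file fails on the duplicates instead of double-inserting them.
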
> 2) Why is the file error location conditional on log_error_verbosity? Surely
> the whole point of a structured log is that you can log everything and
> choose what to display later -- i.e. why csv logging doesn't look at
> log_line_prefix to determine which other bits to display. There's no
> added cost to include this information unconditionally and they're far
> from the largest piece of data being logged either.
>
> 3) Similarly I wonder if the statement should always be included even
> when hide_stmt is set, so that users can write sensible queries against
> the data even if it means duplicating data.
>
> 4) Why the session start time? Is this just so that <process_id,
> session_start_time> uniquely identifies a session? Should we perhaps
> generate a unique session identifier instead?
>
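Isn't the session_id column effectively that identifier already? If memory
serves, csvlog derives it the same way log_line_prefix %c does, roughly:

    /* what both %c and the csvlog session_id column print, more or less */
    appendStringInfo(&buf, "%lx.%x", (long) MyStartTime, MyProcPid);

so <process_id, session_start_time> and session_id carry the same
information, just encoded differently.
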
> The real reason I'm looking at this is because I'm looking at the
> json_log plugin from Michael Paquier. It doesn't have the log line
> numbers, and I can't figure out whether this is something it should have
> because I can't quite figure out why they exist in CSV files. I think
> there are a few other fields that have been added in Postgres but are
> missing from the JSON log because of version skew.
>
> I'm wondering if we should abstract out the CSV format so instead of
> using emit_log_hook you would add a new format and it would specify an
> "add_log_attribute(key,val)" hook which would get called once per log
> format so you could have as many log formats as you want and be sure
> they would all have the same data. That would also mean that the
> timestamps would be in sync and we could probably eliminate the
> occurrences of the wrong format appearing in the wrong logs.

+1 for making the emitters all work off the same source.
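
To make sure I'm picturing the same thing you are: something with roughly
this shape? (Names invented on the spot; nothing below exists today, it is
just a sketch of the idea.)

    /* in elog.c; one callback per output format, registered at load time
     * much like emit_log_hook is today */
    typedef void (*add_log_attribute_hook_type) (const char *key,
                                                 const char *value);

    extern add_log_attribute_hook_type add_log_attribute_hook;

    /*
     * The one place that decides *what* gets logged.  csvlog, jsonlog,
     * etc. would only decide how to serialize the pairs they're handed.
     * The hook is expected to cope with NULL values.
     */
    static void
    emit_structured_log(ErrorData *edata, const char *formatted_log_time)
    {
        if (add_log_attribute_hook == NULL)
            return;

        add_log_attribute_hook("log_time", formatted_log_time);
        add_log_attribute_hook("message", edata->message);
        add_log_attribute_hook("detail", edata->detail);
        add_log_attribute_hook("hint", edata->hint);
        /* ... one call per column csvlog writes today ... */
    }

If it looks like that, then every format gets the same fields from the same
snapshot, and the version-skew problem you mention with json_log goes away
by construction.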

Any idea how much work we're talking about to do these things?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
