CSV Logging questions

From: Greg Stark <stark(at)mit(dot)edu>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: CSV Logging questions
Date: 2017-09-04 16:27:40
Message-ID: CAM-w4HNEAL1JjOM25gMvT38dEmmsEd3tRwhDFACAiH+xd6m5Ww@mail.gmail.com
Lists: pgsql-hackers

I was just looking over the CSV logging code and have a few questions
about why things were done the way they were done.

1) Why do we gather a per-session log line number? Is it just to aid
people importing to avoid duplicate entries from partial files? Is
there some other purpose given that entries will already be sequential
in the csv file?

2) Why is the file error location conditional on log_error_verbosity?
Surely the whole point of a structured log is that you can log
everything and choose what to display later -- which is why csv logging
doesn't look at log_line_prefix to determine which other bits to
display. There's no added cost to including this information
unconditionally and it's far from the largest piece of data being
logged either.

3) Similarly I wonder if the statement should always be included even
when hide_stmt is set so that users can write sensible queries against
the data even if it means duplicating data.

4) Why the session start time? Is this just so that <process_id,
session_start_time> uniquely identifies a session? Should we perhaps
generate a unique session identifier instead?
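
To make questions 1 and 4 concrete: the CSV log's session_id column is
already derived from the session start time and the process ID, and the
sample import table in the documentation uses (session_id,
session_line_num) as its primary key. Below is a minimal Python sketch of
how that pair lets an importer skip rows it has already seen. It assumes
the rows have been parsed into dicts carrying only the fields shown; the
helper names and sample values are made up for illustration.

```python
def session_id(start_epoch: int, pid: int) -> str:
    # Same shape as the "%lx.%x" session id: hex start time, a dot, hex PID.
    return f"{start_epoch:x}.{pid:x}"


def dedupe(rows):
    """Keep the first row seen for each (session_id, session_line_num)."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["session_id"], row["session_line_num"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data, not taken from a real log.
sid = session_id(1504542460, 4242)
first_import = [
    {"session_id": sid, "session_line_num": 1, "message": "connection authorized"},
    {"session_id": sid, "session_line_num": 2, "message": "statement: SELECT 1"},
]
# Re-importing a file that grew since a partial import repeats lines 1 and 2.
second_import = first_import + [
    {"session_id": sid, "session_line_num": 3, "message": "disconnection"},
]
rows = dedupe(first_import + second_import)
```

Without the per-session line number, an importer would have to fall back
on something like the timestamp and message text, and those are not
guaranteed to be unique within a session.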

The real reason I'm looking at this is because I'm looking at the
json_log plugin from Michael Paquier. It doesn't have the log line
numbers and I can't figure out whether this is something it should have
because I can't quite figure out why they exist in CSV files. I think
there are a few other fields that have been added in Postgres but are
missing from the JSON log because of version skew.

I'm wondering if we should abstract out the CSV format so instead of
using emit_log_hook you would add a new format and it would specify an
"add_log_attribute(key,val)" hook which would get called once per log
format so you could have as many log formats as you want and be sure
they would all have the same data. That would also mean that the
timestamps would be in sync and we could probably eliminate the
occurrences of the wrong format appearing in the wrong logs.
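
A rough sketch of that idea follows. It is a toy Python model, not
PostgreSQL code: the format classes, emit() and the attribute names are
hypothetical, and only the add_log_attribute(key, val) name comes from
the proposal above. The attributes of a log entry are gathered once and
the same (key, value) pairs are handed to every registered format.

```python
import csv
import io
import json


class CsvFormat:
    def __init__(self):
        self.values = []

    def add_log_attribute(self, key, val):
        # CSV is positional, so only the value is kept; None becomes an empty field.
        self.values.append("" if val is None else val)

    def finish(self):
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(self.values)
        self.values = []
        return buf.getvalue()


class JsonFormat:
    def __init__(self):
        self.fields = {}

    def add_log_attribute(self, key, val):
        # JSON is keyed, so absent values can simply be omitted.
        if val is not None:
            self.fields[key] = val

    def finish(self):
        line = json.dumps(self.fields) + "\n"
        self.fields = {}
        return line


def emit(entry, formats):
    """Feed one log entry's attributes to every format, in a fixed order."""
    for key, val in entry:
        for fmt in formats:
            fmt.add_log_attribute(key, val)
    return [fmt.finish() for fmt in formats]


# Hypothetical entry; a real one would carry many more attributes.
entry = [
    ("log_time", "2017-09-04 16:27:40.000 UTC"),
    ("process_id", 4242),
    ("session_line_num", 1),
    ("message", "duplicate key value"),
    ("query", None),
]
csv_line, json_line = emit(entry, [CsvFormat(), JsonFormat()])
```

Because both outputs are built from the same attribute list, they carry
the same timestamp and the same set of fields by construction, and a new
attribute added to the list shows up in every format at once.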

--
greg
