Re: How could we make it simple to access the log as a table?

From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Christopher Browne" <cbbrowne(at)gmail(dot)com>
Cc: "Dimitri Fontaine" <dimitri(at)2ndquadrant(dot)fr>, "Stephen Frost" <sfrost(at)snowman(dot)net>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How could we make it simple to access the log as a table?
Date: 2012-05-30 13:57:02
Message-ID: C4DAC901169B624F933534A26ED7DF310861B586@JENMAIL01.ad.intershop.net
Lists: pgsql-hackers

> The CSV format is both rich and
> machine-parseable (good start!) but it takes an unreasonable amount of
> work to make it usefully queryable. We need something that looks more
> like a big red button.

Hello,

The Pg logs are a rich soup of many different kinds of information.
Having all that stuff sequentially available is comfortable for reading,
but not as input for an ETL/reporting tool.

Moreover, there is a push to consolidate many different sources into a
single reporting tool, which may rule out Postgres as the solution in
many cases.

And everybody with more than one Postgres instance would like a single
point of access for all of them.

Having the logs available as a Postgres table is nice, but it is only a
tiny step toward reporting, and not that complicated to implement with
the current facilities, except maybe for real-time monitoring.
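For reference, a minimal sketch of those current facilities: the
file_fdw extension (the approach documented in the PostgreSQL manual)
can expose a csvlog file as a foreign table. The DSN and file path are
placeholders, and the column list follows the csvlog format of the 9.x
docs; it must match your server version exactly:

# Minimal sketch: expose a csvlog file as a table via file_fdw.
# DSN and file path are placeholders; the column list follows the
# csvlog format as documented for 9.0+ and must match your server
# version exactly, or file_fdw will error out at read time.
import psycopg2

CSVLOG = "/var/lib/postgresql/data/pg_log/pg-myhost_3332.csv"  # placeholder

DDL = """
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE pglog (
    log_time               timestamp(3) with time zone,
    user_name              text,
    database_name          text,
    process_id             integer,
    connection_from        text,
    session_id             text,
    session_line_num       bigint,
    command_tag            text,
    session_start_time     timestamp with time zone,
    virtual_transaction_id text,
    transaction_id         bigint,
    error_severity         text,
    sql_state_code         text,
    message                text,
    detail                 text,
    hint                   text,
    internal_query         text,
    internal_query_pos     integer,
    context                text,
    query                  text,
    query_pos              integer,
    location               text,
    application_name       text
) SERVER pglog OPTIONS ( filename '%s', format 'csv' );
""" % CSVLOG

conn = psycopg2.connect("dbname=postgres")  # placeholder DSN
conn.autocommit = True
cur = conn.cursor()
cur.execute(DDL)  # run once; the created objects persist afterwards
cur.execute("SELECT log_time, error_severity, message FROM pglog "
            "ORDER BY log_time DESC LIMIT 10")
for row in cur.fetchall():
    print(row)

Useful, but it only covers one file on one host, hence "only a tiny
step".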

I think more users would be satisfied by a utility that transforms the
log content into a more ETL-friendly format.
Otherwise every reporting or monitoring project will have to implement
the "same" data parsing again and again.

Here are a few loose ideas:

This transformation could take place on the fly (e.g.
log_destination='pg_etl.sh'), or later at some other place, using the
CSV output as source.

this "pg_etl" tool could possibly generate more than one csv logs,
e.g.:

- query_durations
- checkpoint_logs

Currently, getting that kind of data requires string matching against
the log messages (see the sketch below).
Alternatively, an additional "target" column in the log output would
make sense.
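To illustrate, a hedged sketch of that string matching, producing the
two hypothetical targets from the list above out of a csvlog file
("pg_etl" itself does not exist; the patterns assume default English
log messages):

# Sketch of the string matching a hypothetical "pg_etl" tool would hide:
# route csvlog rows into per-topic CSV files. Patterns assume the default
# (English) message texts; in the 9.x csvlog format the message is the
# 14th column (index 13).
import csv
import sys

MESSAGE = 13  # index of the "message" column in a csvlog row

def split_log(src_path):
    with open(src_path, newline="") as src, \
         open("query_durations.csv", "w", newline="") as dur_f, \
         open("checkpoint_logs.csv", "w", newline="") as ckpt_f:
        durations = csv.writer(dur_f)
        checkpoints = csv.writer(ckpt_f)
        for row in csv.reader(src):
            if len(row) <= MESSAGE:
                continue  # skip malformed lines
            msg = row[MESSAGE]
            if msg.startswith("duration:"):
                durations.writerow(row)
            elif msg.startswith("checkpoint"):  # "checkpoint starting/complete"
                checkpoints.writerow(row)

if __name__ == "__main__":
    split_log(sys.argv[1])

The fragility of those two startswith() patterns is exactly the
argument for a "target" column.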

Some enhancements of the current format would also help.

examples:

- a "normed" format of the logged queries

(http://postgresql.1045698.n5.nabble.com/pg-stat-statements-with-query-tree-based-normalization-td4989745.html)

- add placeholders for the host, IP and port in log_filename,
  e.g. log_filename = 'pg-$HOST_$PORT-%Y%m%d_%H%M.log'

- possibly in the log_line_prefix & CSV content too.


- use a constant log name for the active log and add the timestamp only
  when switching (see the sketch after this list), e.g.:

  pg-myhost_3332-20120530_1200.csv
  pg-myhost_3332-20120530_1300.csv
  pg-myhost_3332.csv  <= current
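
The last point can be approximated from the outside today: a sketch,
run from cron or a rotation hook, that leaves the timestamped files as
Postgres writes them and keeps a constant-named symlink pointing at the
newest one (directory and name pattern are assumptions matching the
example filenames above):

# Sketch: approximate a constant name for the active csvlog by pointing
# a stable symlink at the newest timestamped file. Directory and name
# pattern are assumptions matching the example filenames above.
import glob
import os

LOG_DIR = "/var/lib/postgresql/data/pg_log"            # assumed
PATTERN = "pg-myhost_3332-*.csv"                       # timestamped logs
CURRENT = os.path.join(LOG_DIR, "pg-myhost_3332.csv")  # stable name

def refresh_current_link():
    candidates = sorted(glob.glob(os.path.join(LOG_DIR, PATTERN)))
    if not candidates:
        return
    newest = candidates[-1]  # timestamps in the name sort lexicographically
    tmp = CURRENT + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(newest, tmp)  # build the link aside, then swap it in atomically
    os.replace(tmp, CURRENT)

if __name__ == "__main__":
    refresh_current_link()

Doing the rename inside the server at switch time would of course be
cleaner than polling from outside.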

best regards,

Marc Mamin
