Re: How could we make it simple to access the log as a table?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Stephen Frost <sfrost(at)snowman(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How could we make it simple to access the log as a table?
Date: 2012-05-28 19:55:16
Message-ID: CA+TgmoYA-HNjPL0unzW95NjYp9J-Kxs=g6ocvZsfbLDgo-Uufg@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 28, 2012 at 2:21 PM, Christopher Browne <cbbrowne(at)gmail(dot)com> wrote:
>> Yeah, I agree.  I think what is missing here is something that can be read (and maybe indexed?) like a table, but written by a pretty dumb process.  It's not terribly workable to have PG log to PG, because there are too many situations where the problem you're trying to report would frustrate your attempt to report it.  At the other end of the spectrum, our default log format is easy to generate but (a) impoverished, not even including a timestamp by default, and (b) hard to parse, especially because it's a rare nicety for two customers to use the same log_line_prefix.  The CSV format is both rich and machine-parseable (good start!), but it takes an unreasonable amount of work to make it usefully queryable.  We need something that looks more like a big red button.
>
> There's a case to be made for some lossier "NoSQL-y" thing here.  But
> I'm not sure what size fits enough.  I hate the idea of requiring the
> deployment of *another* DBMS (however "lite"), but reading from text
> files isn't particularly nice either.
>
> Perhaps push the logs into an unlogged table on an extra PG instance,
> where an FDW tries to make that accessible?  A fair bit of process
> needs to live behind that "big red button," and that's at least a
> plausible answer.
>
> What's needed is to figure out what restrictions are acceptable to
> impose to have something that's "button-worthy."

I am not fired up about needing a second instance of PG; it seems to
me that that requirement by itself makes it considerably more involved
than pushing a big red button. I agree with you that deploying
another DBMS, even a lightweight one, is also not a good solution.

As far as CSV goes, I think the biggest deficiency is that there's a
mismatch between the way that log files are typically named (e.g. one
per day, or one per hour) and the way that a CSV foreign table is
created (you've got to point it at one particular file). Maybe we
could have a CSV reader that understands PostgreSQL-format CSV logs,
but you point it at a directory, rather than a single file, and it
reads all the CSV files in the directory. And maybe it could also be
smart enough that if you've got a WHERE clause that filters by date, it
uses that to skip any files that can be proven irrelevant. So the
user can just turn on CSV logging, point the FDW at the log directory,
and away they go.
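The directory-scanning idea above could be sketched roughly as follows. This is just an illustration, not a proposal for the FDW itself: the filename pattern assumes the default log_filename setting ('postgresql-%Y-%m-%d_%H%M%S.log', with a .csv suffix for csvlog output), and the function name and parameters are hypothetical.

```python
import csv
import os
import re
from datetime import date

# Hypothetical sketch: scan a PostgreSQL log directory for CSV log files,
# skip any file whose name proves it falls outside the requested date
# range (the "prune files the WHERE clause rules out" idea), and yield
# the parsed rows from the rest.
NAME_RE = re.compile(r'postgresql-(\d{4})-(\d{2})-(\d{2})_\d{6}\.csv$')

def csv_log_rows(log_dir, since=None, until=None):
    for name in sorted(os.listdir(log_dir)):
        m = NAME_RE.match(name)
        if not m:
            continue
        file_date = date(*map(int, m.groups()))
        # Skip files provably outside the [since, until] date range.
        if since is not None and file_date < since:
            continue
        if until is not None and file_date > until:
            continue
        with open(os.path.join(log_dir, name), newline='') as f:
            for row in csv.reader(f):
                yield row
```

A real implementation would also have to cope with the file currently being written and with rotation mid-scan, which is part of why baking this into an FDW (rather than pointing file_fdw at one fixed file) is attractive.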

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
