Re: Proposal: More structured logging

From: Greg Stark <stark(at)mit(dot)edu>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: More structured logging
Date: 2022-01-31 22:46:29
Message-ID: CAM-w4HOG6JxdbkKJXjfGtAtrqizZYVdkrKTX_BKsHGkzQ+NYYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

1) I would like an interface which more or less guarantees that
*every* parameter of the log message is included in the structured
data. Ideally there should be no actual need to generate the formatted
messages for destinations like elastic search, just record the message
id and the parameters.

To that end I would thing something like:

ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
vacrel->relname, (long long) index, vacuumed_pages)));

Would end up like:

ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
errtag("relname", vacrel->relname),
errtag("dead_item_pointers", (long long) index),
errtag("vacuumed_pages", vacuumed_pages))));

And any untagged parameters could be included in the structured data
with unknown tag names. (Or perhaps some macro magic involving # but
I'm not sure that would be possible with va_args.)

2) I was kind of hoping we could have some sort of list of known tag
labels that could be used. So instead of an arbitrary string for the
label name you just referenced the known labels. Then we could
document that whenever "relid" appears as a tag id it's the oid of a
relation, for example.

This would help with things like "search elastic search for all log
entries for relid x, block p". Which wouldn't work if block is
sometimes "page" and sometimes "blocknum" and sometimes "block"....

So I was imagining something like

errtag(TAG_RELID, relid)

3) When it comes to things like queryid it would be helpful to be able
to inject tags from higher in the call stack so that invididual
ereport sites don't need to include it and we can decide to add new
things like it in the future. This will be useful for things like
distributed tracing tools who need to inject a span id into the every
log message.

I imagine it could also be useful more generally. Application
developers might want to inject things like a pipeline identifier from
their CI tools and then pull any logs with that identifier into their
CI reports.

Yes, they could be added as new columns in CSV and top level json
attributes but we may well have new tags that are not always there or
are even created on user request. Or we may have cases where there can
be more than one active at a time.

4) As far as code ... this is probably premature microoptimization but
I do find the tags all being individually palloced and all going
through string formatting seems like a lot of overhead. Like the
actual number of errtag() calls in the ereport is always going to be
the same and the values are probably always going to be a single
parameter, not really using the formatting functionality.

I don't necessarily have a better alternative though. The only thing I
can think of isn't really any better:

errtag_str(RELNAME, relname)
errtag_int(RELID, relid)

(possibly with some magic where there's an errtag() function that
looks up which data type to expect for a given tag id).

It looks strange to me that the errorTag struct has a "const char
*tagname" but a "char *tagvalue". I think this is a side effect of the
code and not actually a sensible way to define the struct. Surely they
should both be const?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-01-31 22:56:45 Re: GUC flags
Previous Message Justin Pryzby 2022-01-31 22:28:00 Re: pg_walinspect - a new extension to get raw WAL data and WAL stats