Re: Patch to implement pg_current_logfile() function

From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Gilles Darold <gilles(dot)darold(at)dalibo(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch to implement pg_current_logfile() function
Date: 2016-12-08 21:34:58
Message-ID: 20161208153458.7b07561b@slate.meme.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Thu, 8 Dec 2016 11:27:34 -0500
Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Dec 7, 2016 at 11:02 PM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
> > I read this and knew that I hadn't finished review, but didn't
> > immediately respond because I thought the patch had to be
> > marked "ready for committer" on commitfest.postgresql.org
> > before the committers would spend a lot of time on it.
>
> Generally that's true, but I was trying to be helpful and trying also
> to move this toward some kind of conclusion.

It has been very helpful, particularly your last reply. And I think
there's now a clear path forward.

(I'm not looking for any replies to the below, although of course would
welcome whatever you've got to say.)

> It is, of course, not my job to decide who is at fault in whatever may
> or may not have gone wrong during the reviewing process.

Understood. And I'm not looking for any sort of judgment or
attempting to pass judgment. It has been frustrating though
and your email provided an opportunity to vent, and to
provide some feedback, of whatever quality, to the review
process.

It's been that much more frustrating because I don't really
care one way or another about the feature. I was just trying
to build up credit reviewing somebody else's patch, and instead
probably did the opposite with all the thrashing. :)

> I do think that the blizzard of small patches that you've
> submitted in some of your emails may possibly be a bit overwhelming.
> We shouldn't really need a stack of a dozen or more patches to
> implement one small feature. Declarative partitioning only had 7.
> Why does this need more than twice that number?

I'm trying to break up changes into patches which are as small
as possible to make them more easily understood by providing
a guide for the reader/reviewer. So rather than
a single patch I'd make, say, 3. One for documentation to describe
the change. Another which does whatever refactoring is necessary
to the existing code, but which otherwise does not introduce any
functional changes. And another which adds the new code which makes
use of the refactoring. At each stage the code should continue
to work without bugs. The other party can then decide to incorporate
the patchs into the main patch, which is itself another attached
patch.

Do this for several unrelated changes to the main patch and the patches
add up.

Mostly I wouldn't expect various patches to be reviewed by the
committers, they'd get incorporated into the main patch.

This is how I break down the work when I look at code changes.
What's it do? What does it change/break in the existing code?
How does the new stuff work? But this process has not worked
here so I guess I'll stop.

But.... I expect you will see at least 3 patches submitted for
committer review. I see a number of hardcoded constants,
now that the main patch adds additional code, that can
be made into symbols to, IMO, improve code clarity.
Guiles points out that this is an issue of coding style
and might be considered unnecessary complication.
So they are not in the main patch. They are attached
(applied to v14 of the main patch; really, the first applies
to the master PG branch and the 2nd to v14 of the
pg_current_logfiles patch) if you want to look
at them now and provide some guidance as to whether they
should be dropped or included in the patch.

> > The extreme case is the attached "cleanup_rotate" patch.
> > (Which applies to v14 of this patch.) It's nothing but
> > a little bit of tiding up of the master branch,

> I took a look at that patch just now and I guess I don't really see
> the point.

The point would be to make the code easier to read. Saving cycles
is hardly ever worthwhile in my opinion. The whole point
of code is to be read by people and be understandable.
If it's not understood it's useless going forward.

Once I've gone to the effort to understand something
written, that I'm going to be responsible for maintaining, I like to
see it written as clearly as possible. In my experience
if you don't do this then little confusions multiply and
eventually the whole thing turns to mud.

The trade-off, as you say, is the overhead involved
in running minor changes through the production
process. I figure this can be minimized by bundling
such changes with related substantial changes.

Again, it's not working so I'll drop it.

> > FYI. The point of the "retry_current_logfiles"
> > patch series is to handle the case
> > where logfile_writename gets a ENFILE or EMFILE.
> > When this happens the current_logfiles file can
> > "skip" and not contain the name(s) of the current
> > log file for an entire log rotation. To miss, say,
> > a month of logs because the logs only rotate monthly
> > and your log processing is based on reading the
> > current_logfiles file sounds like a problem.
>
> I think if you are trying to enumerate the names of your logfiles by
> any sort of polling mechanism, rather than by seeing what files show
> up in your logging directory, you are doing it wrong. So, I don't see
> that this matters at all.

Ok. This simplifies things substantially.

It is worth noting that the PG pg_current_logfile() function
is dependent upon the content of the current_logfiles file. So
if the file is wrong the function will also give wrong answers.
But I don't care if you don't.

The logging code is quite anal about dropping information.
It's very helpful to know that (so long as limitations are
documented I presume) the related meta-information functionality
of the current_logfiles file and the pg_current_logfile() function
need only be best-effort.

> > The point of the code is to report not just the real error, but what
> > effect the real error has

> Generally speaking, I would say that it's the user's job to decide
> what the impact of an error is; our job is only to tell them what
> happened. There are occasional exceptions; for example:
> >
> > rhaas=# select ablance from pgbench_accounts;
> > ERROR: column "ablance" does not exist
> > LINE 1: select ablance from pgbench_accounts;
> > ^
> > HINT: Perhaps you meant to reference the column
> > "pgbench_accounts.abalance".
> >
> > The HINT -- which you'll note is part of the same ereport() rather
> > than being separately logged

Yeah. :)

> -- has been judged a helpful response
> > to a particularly common kind of mistake. But in general it's not
> > necessary to elaborate on possible reasons for the error; it
> > suffices to report it. One of the important reasons for this is
> > that our guesses about what probably caused the problem can easily
> > turn out to be WRONG, and a misleading HINT is worse than no hint
> > because it confuses some users and annoys others.

Ok. And agreed. And I'll let the error reporting in the patch
which started this thread drop.

I'm not attempting to argue here; but to provide some relief
for what comes into my brain.

Which is: I do think there is
a distinction between trying to guess what problem the user
needs to fix and reporting what part of PG is failing. In
other words just passing through the error gotten from the
OS regards some file or another does not give the user
who knows nothing about PG internals useful information.
There does need to be some interpretation of what the impact
is to the user.

A related thought is this. There are 3 abstraction levels of
concern. The level beneath PG, the PG level, and the level
of the user's application. When PG detects an error from either
"above" or "below" its job is to describe the problem from
its own perspective. HINTs are attempts to cross the boundary
into the application's perspective.

Thanks for the reply.

Regards,

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachment Content-Type Size
patch_pg_current_logfile-v14.diff.abstract_guc_part1 application/octet-stream 2.6 KB
patch_pg_current_logfile-v14.diff.abstract_guc_part2 application/octet-stream 2.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-12-08 21:40:29 Re: pg_dump vs. TRANSFORMs
Previous Message Andres Freund 2016-12-08 21:34:41 Time to drop old-style (V0) functions?