Re: RFC: Logging plan of the running query

From: James Coleman <jtc331(at)gmail(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Greg Stark <stark(at)mit(dot)edu>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, david(dot)christensen(at)crunchydata(dot)com, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: RFC: Logging plan of the running query
Date: 2023-06-13 15:53:39
Message-ID: CAAaqYe-5jKTuuZB5StBwa2YRiqpnN=q9buKADP8A_RPd09f_Ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 13, 2023 at 11:22 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2023-06-13 00:52, James Coleman wrote:
> >>
> >> > I've attached v27. The important change here in 0001 is that it
> >> > guarantees the interrupt handler is re-entrant, since that was a bug
> >> > exposed by my testing. I've also included 0002 which is only meant for
> >> > testing (it attempts to log in the plan in every
> >> > CHECK_FOR_INTERRUPTS() call).
> >>
> >> When SIGINT is sent during ProcessLogQueryPlanInterrupt(),
> >> ProcessLogQueryPlanInterruptActive can remain true.
> >> After that, pg_log_query_plan() does nothing but just returns.
> >>
> >> AFAIU, v26 logs plan for each pg_log_query_plan() even when another
> >> pg_log_query_plan() is running, but it doesn't cause errors or
> >> critical
> >> problem.
> >>
> >> Considering the problem solved and introduced by v27, I think v26
> >> might
> >> be fine.
> >> How do you think?
> >
> > The testing I did with calling this during every CFI is what uncovered
> > the re-entrancy problem. IIRC (without running that test again) the
> > problem was a stack overflow. Now, to be sure this is a particularly
> > degenerate case because in real-world usage it'd be impossible in
> > practice, I think, to trigger that many calls to this function (and by
> > extension the interrupt handler).
>
> Yeah.In addition, currently only superusers are allowed to execute
> pg_log_query_plan(), I think we don't need to think about cases
> where users are malicious.
>
> > If SIGINT is the only concern we could reset
> > ProcessLogQueryPlanInterruptActive in error handling code. I admit
> > that part of my thought process here is thinking ahead to an
> > additional patch I'd like to see on top of this, which is logging a
> > query plan before cleaning up when statement timeout occurs.
>
> I remember this is what you wanted do.[1]
>
> > The
> > re-entrancy issue becomes more interesting then, I think, since we
> > would then have automated calling of the logging code. BTW: I'd
> > thought that would make a nice follow-up patch for this, but if you'd
> > prefer I could add it as another patch in the series here.
> >
> > What do you think about resetting the flag versus just not having it?
>
> If I understand you correctly, adding the flag is not necessary for this
> proposal.
> To keep the patch simple, I prefer not having it.
>

I'm going to re-run tests with my patch version + resetting the flag
on SIGINT (and any other error condition) to be certain that the issue
you uncovered (where backends get stuck after a SIGINT not responding
to the requested plan logging) wasn't masking any other issues.

As long as that run is clean also then I believe the patch is safe
as-is even without the re-entrancy guard.

I'll report back with the results of that testing.

Regards,
James Coleman

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-06-13 16:20:42 Support TZ format code in to_timestamp()
Previous Message Tristan Partin 2023-06-13 15:51:48 Re: Use COPY for populating all pgbench tables