Re: POC: Extension for adding distributed tracing - pg_tracing

From: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To: Nikita Malakhov <hukutoc(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Extension for adding distributed tracing - pg_tracing
Date: 2023-08-01 13:19:41
Message-ID: CAO6_Xqp83Rz7CZcb=OqrMxC07VUw08=NrCnVvjWN7arQYc4xNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Here's a new patch with changes from the previous discussion:
- I'm now directly storing nanoseconds duration in the span instead of the
instr_time. Using the instr_time macros was a bit awkward as the
durations I generate don't necessarily have a starting and ending
instr_time.
Moving to straight nanoseconds made things clearer from my point of view.
- I've added an additional sample rate pg_tracing.sample_rate (on top of
the pg_tracing.caller_sample_rate). This one will allow queries to be
sampled even without trace information propagated from the caller.
Setting this sample rate to 1 will basically trace everything. For now,
this will only work when going through the post parse hook. I will add
support for prepared statements and cached plans for the next patch.
- I've improved how parse spans are created. It's a bit challenging to get
the start of a parse as there's no pre parse hook or instrumentation around
parse so it's only an estimation.

Regards,
Anthonin

On Fri, Jul 28, 2023 at 4:06 PM Anthonin Bonnefoy <
anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:

> > What do you think about using INSTR_TIME_SET_CURRENT,
> INSTR_TIME_SUBTRACT and INSTR_TIME_GET_MILLISEC
> > macros for timing calculations?
> If you're talking of the two instances where I'm modifying the
> instr_time's ticks, it's because I can't use the macros there.
> The first case is for the parse span. I only have the start timestamp
> using GetCurrentStatementStartTimestamp and don't
> have access to the start instr_time so I need to build the duration from 2
> timestamps.
> The second case is when building node spans from the planstate. I directly
> have the duration from Instrumentation.
>
> I guess one fix would be to use an int64 for the span duration to directly
> store nanoseconds instead of an instr_time
> but I do use the instrumentation macros outside of those two cases to get
> the duration of other spans.
>
> > Also, have you thought about a way to trace existing (running) queries
> without directly instrumenting them?
> That's a good point. I was focusing on leaving the sampling decision to
> the caller through the sampled flag and
> only recently added the pg_tracing_sample_rate parameter to give more
> control. It should be straightforward to
> add an option to create standalone traces based on sample rate alone. This
> way, setting the sample rate to 1
> would force the queries running in the session to be traced.
>
>
> On Fri, Jul 28, 2023 at 3:02 PM Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:
>
>> Hi!
>>
>> What do you think about using INSTR_TIME_SET_CURRENT, INSTR_TIME_SUBTRACT
>> and INSTR_TIME_GET_MILLISEC
>> macros for timing calculations?
>>
>> Also, have you thought about a way to trace existing (running) queries
>> without directly instrumenting them? It would
>> be much more usable for maintenance and support personnel, because in
>> production environments you rarely could
>> change query text directly. For the current state the most simple
>> solution is switch tracing on and off by calling SQL
>> function, and possibly switch tracing for prepared statement the same
>> way, but not for any random query.
>>
>> I'll check the patch for the race conditions.
>>
>> --
>> Regards,
>> Nikita Malakhov
>> Postgres Professional
>> The Russian Postgres Company
>> https://postgrespro.ru/
>>
>

Attachment Content-Type Size
pg-tracing-v2.patch application/octet-stream 109.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-08-01 13:20:14 Re: Oversight in reparameterize_path_by_child leading to executor crash
Previous Message Aleksander Alekseev 2023-08-01 13:14:54 Re: Incorrect handling of OOM in WAL replay leading to data loss