Re: Proposal: Progressive explain

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Rafael Thofehrn Castro" <rafaelthca(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: torikoshia(at)oss(dot)nttdata(dot)com
Subject: Re: Proposal: Progressive explain
Date: 2025-03-12 14:25:27
Message-ID: d61f6093-5210-42c3-9f82-33f21ac6e2db@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 7, 2025, at 5:34 PM, Rafael Thofehrn Castro wrote:
> Did additional benchmarks and found issues with the patch that doesn't do execProcNode
> wrapping. There are sporadic crashes with **double free or corruption (top)**
> ****
> So making the patch that uses the wrapper the current one. Again, giving the credits to
> torikoshia as being the owner of that section of the code.
>

Rafael, thanks for working on it. It is a step forward in observability. I
started with some performance tests and the last improvements seem to fix the
overhead imposed in the initial patch version. I didn't notice any of these new
function in the perf report while executing fast queries.

I found a crash. It is simple to reproduce.

Session A:

select * from pg_stat_progress_explain;
\watch 2

Session B:

explain select pg_sleep(30);
\watch 2

8<--------------------------------------------------------------------8<

Backtrace:

Core was generated by `postgres: euler postgres [lo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401
5401 if (ps->ExecProcNodeOriginal != NULL)
#0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401
#1 0x00005624173829aa in handle_sig_alarm (postgres_signal_arg=<optimized out>) at timeout.c:414
#2 0x00005624173ba02c in wrapper_handler (postgres_signal_arg=14) at pqsignal.c:110
#3 <signal handler called>
#4 0x00007f20fa529e63 in epoll_wait (epfd=6, events=0x56244ef37e58, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#5 0x00005624171fb02f in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffdd9e62080, cur_timeout=-1, set=0x56244ef37dd8) at waiteventset.c:1190
#6 WaitEventSetWait (set=0x56244ef37dd8, timeout=timeout(at)entry=-1, occurred_events=occurred_events(at)entry=0x7ffdd9e62080, nevents=nevents(at)entry=1, wait_event_info=wait_event_info(at)entry=100663296) at waiteventset.c:1138
#7 0x000056241709513c in secure_read (port=0x56244eeb90e0, ptr=0x56241775a9a0 <PqRecvBuffer>, len=8192) at be-secure.c:218
#8 0x000056241709bf2e in pq_recvbuf () at pqcomm.c:924
#9 0x000056241709ceb5 in pq_getbyte () at pqcomm.c:970
#10 0x000056241721b617 in SocketBackend (inBuf=0x7ffdd9e622a0) at postgres.c:361
#11 ReadCommand (inBuf=0x7ffdd9e622a0) at postgres.c:484
#12 PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4625
#13 0x00005624172167ed in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at backend_startup.c:107
#14 0x000056241717519b in postmaster_child_launch (child_type=<optimized out>, child_slot=2, startup_data=startup_data(at)entry=0x7ffdd9e6253c, startup_data_len=startup_data_len(at)entry=4, client_sock=client_sock(at)entry=0x7ffdd9e62540) at launch_backend.c:274
#15 0x0000562417178c32 in BackendStartup (client_sock=0x7ffdd9e62540) at postmaster.c:3519
#16 ServerLoop () at postmaster.c:1688
#17 0x000056241717a6da in PostmasterMain (argc=argc(at)entry=1, argv=argv(at)entry=0x56244eeb81b0) at postmaster.c:1386
#18 0x0000562416e64f9a in main (argc=1, argv=0x56244eeb81b0) at main.c:230

8<--------------------------------------------------------------------8<

You call this feature "progressive explain". My first impression is that it
will only provide the execution plans for EXPLAIN commands. Instead of
"progressive explain", I would suggest "query progress" that is a general
database terminology. It seems natural to use "progressive explain" since you
are using the explain infrastructure (including the same options -- format,
settings, wal, ...) -- to print the execution plan.

+CREATE VIEW pg_stat_progress_explain AS
+ SELECT
+ *
+ FROM pg_stat_progress_explain(true);
+

There is no use for the function argument. If you decide to keep this function,
remove it.

Why don't you use the pgstat_progress_XXX() API? Since you are using a
pg_stat_progress_XXX view name I would expect using the command progress
reporting infrastructure (see backend_progress.c).

Maybe you could include datid and datname as the other progress reporting
views. It would avoid a join to figure out what the database is.

+static const struct config_enum_entry explain_format_options[] = {
+ {"text", EXPLAIN_FORMAT_TEXT, false},
+ {"xml", EXPLAIN_FORMAT_XML, false},
+ {"json", EXPLAIN_FORMAT_JSON, false},
+ {"yaml", EXPLAIN_FORMAT_YAML, false},
+ {NULL, 0, false}
+};

Isn't it the same definition as in auto_explain.c? Use only one definition for
auto_explain and this feature. You can put this struct into explain.c, use an
extern definition for guc_tables.c and put a extern PGDLLIMPORT defintion into
guc.h. See wal_level_options, for an example.

+static const struct config_enum_entry progressive_explain_options[] = {
+ {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+ {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+ {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+ {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+ {NULL, 0, false}
+};

The "analyze" is a separate option in auto_explain. Should we have 2 options?
One that enable/disable this feature and another one that enable/disable
analyze option.

Don't the other EXPLAIN options make sense here? Like serialize and summary.

TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ struct ExplainState *pe_es; /* progressive explain state if enabled */

Should you use the same name pattern here? pestate, for example.

PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)

Could you use a specific name? Even if you keep the proposed name, you should
use ProgressiveExplainHash, ProgressiveExplain or QueryProgress.

+$node->init;
+# Configure progressive explain to be logged immediatelly
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;

s/immediatelly/immediately/

+typedef struct progressiveExplainHashKey
+{
+ int pid; /* PID */
+} progressiveExplainHashKey;
+
+typedef struct progressiveExplainHashEntry
+{
+ progressiveExplainHashKey key; /* hash key of entry - MUST BE FIRST */
+ dsa_handle h;
+ dsa_pointer p;
+} progressiveExplainHashEntry;

You don't need progressiveExplainHashKey. Use pid as key directly.

+ /* Update shared memory with new data */
+ strcpy(pe_data->plan, es->str->data);
+ pe_data->last_print = GetCurrentTimestamp();

I don't think last_print is accurate because it is not the time the execution plan
is printed but the time it was updated. I suggest last_updated_time.

+/* Flag set by timeouts to control when to print progressive explains */
+bool ProgressiveExplainPending = false;

s/print/update/

There are other point that you use "print" but it is better to replace it with
"update".

+ progressiveExplainArray = ShmemInitHash("progressive explain hash",
+ 50, 50,
+ &info,
+ HASH_ELEM | HASH_BLOBS);

I'm wondering why you use "array" in the name. ProgressiveExplainHash is a
better name.

+ entry->p = dsa_allocate(es->pe_a,
+ add_size(sizeof(progressiveExplainData),
+ add_size(strlen(es->str->data),
+ PROGRESSIVE_EXPLAIN_ALLOC_SIZE)));

I think you need a better name for PROGRESSIVE_EXPLAIN_ALLOC_SIZE because it
doesn't reflect what it is. PROGRESSIVE_EXPLAIN_FREE_SIZE or
PROGRESSIVE_EXPLAIN_AVAIL_SIZE?

Maybe you can use dshash.

There are some comments that still refers to the wrong function name.

+/*
+ * ExecProcNodeWithExplain
+ * ExecProcNode wrapper that initializes progressive explain
+ * and uwraps ExecProdNode to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)

and

+/*
+ * ExecProcNodeWithExplain
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *

+ /* state related to progressive explains */
+ struct PlanState *pe_curr_node;
+ struct Instrumentation *pe_local_instr;
+ dsa_area *pe_a;

Could you add some comments saying what each of these variables are for?

I didn't experiment but I was wondering if there is a way to avoid the
duplicates that you added to avoid the overhead.

--
Euler Taveira
EDB https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yura Sokolov 2025-03-12 14:44:28 Re: Implement waiting for wal lsn replay: reloaded
Previous Message Tom Lane 2025-03-12 14:25:17 Re: Index AM API cleanup