Re: 120.memory_leak_extended_memqcache fails on master

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: 120.memory_leak_extended_memqcache fails on master
Date: 2026-04-30 23:35:29
Message-ID: 20260501.083529.324263108638250692.ishii@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgpool-hackers

> Summary:
>
> - Recent 120.memory_leak_extended_memqcache regression test failures
> on master branch do not mean we have memory leaks with query cache
> and/or extended query processing. It's false positive because the
> test script is not prepared for child process going away by FATAL
> error.
>
> - The FATAL error is caused by an oversight in commit a350afd70
> "Feature: send sync message only to necessary backends.".
>
> - Proposed fix attached.
>
> Recently 120.memory_leak_extended_memqcache regression test
> occasionally fails on master. So I dug into the buildfarm log and
> found following error. Note that in this case the load balance node is
> 1. If it is 0, the test succeeds.
>
> The test runs pgbench and the log below emitted at the end of
> bench marking. When pgbench finishes, pgpool executes queries
> registered in reset_query_list. "DISCARD ALL" comes from it.
>
> 2026-04-29 01:00:58.287: pgbench pid 62605: LOG: DB node id: 1 backend pid: 62625 statement: Sync
> 2026-04-29 01:00:58.287: pgbench pid 62605: LOG: DB node id: 0 backend pid: 62624 statement: DISCARD ALL
> 2026-04-29 01:00:58.287: pgbench pid 62605: LOG: DB node id: 1 backend pid: 62625 statement: DISCARD ALL
> 2026-04-29 01:00:58.288: pgbench pid 62605: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 0 backend pid: 62624 statement: "DISCARD ALL" message: "DISCARD ALL cannot run inside a transaction block"
> 2026-04-29 01:00:58.288: pgbench pid 62605: WARNING: packet kind of backend 1 ['C'] does not match with main/majority nodes packet kind ['E']
> 2026-04-29 01:00:58.288: pgbench pid 62605: FATAL: failed to read kind from backend
> 2026-04-29 01:00:58.288: pgbench pid 62605: DETAIL: kind mismatch among backends. Possible last query was: "DISCARD ALL" kind details are:0[E: DISCARD ALL cannot run inside a transaction block] 1[C]
> 2026-04-29 01:00:58.288: pgbench pid 62605: HINT: check data consistency among db nodes
>
> The FATAL error terminated child process 62605, caues the test
> failure. From 129 test's test.sh:
>
> # run pgbench for a while
> $PGBENCH -M extended -S -T 30 test
>
> after_size=`ps l $pid|tail -1|awk '{print $7}'`
> delta=`expr $after_size - $init_size`
>
> echo "initial process size: $init_size after size: $after_size delta: $delta"
>
> test $delta -eq 0
>
> if [ $? != 0 ];then
> echo "memory leak in $delta KB in mode:$mode"
> ./shutdownall
> exit 1
> fi
>
> and from src/test/regression/log/120.memory_leak_extended_memqcache:
>
> expr: non-integer argument
> initial process size: 156900 after size: VSZ delta:
> ./test.sh: line 59: test: -eq: unary operator expected
> memory leak in KB in mode:s
>
> Since ps command does not see the data of process 62605 any more, the
> "test $delta -eq 0" command failed and it makes the script believe that
> there's a memory leak. Probably we should fix test.sh here but I think
> FATAL error should be fixed in the first place. Once we fix it,
> there's no need to fix test.sh. So I think we can leave test.sh as it
> is now.
>
> The reason why DISCARD ALL failed on node 0 PostgreSQL is, DISCARD ALL
> executes inside a transaction block according to the log. However the
> queries executed by pgbench do not use an explicit transactions. I was
> puzzled. So I look into PostgreSQL log on node 0. Grepping related
> process (62624) from the PostgreSQL log shows this:
>
> 62624 2026-04-29 01:00:28.288 UTC LOG: execute pgpool62605/pgpool62605: SELECT count(*) FROM pg_catalog.pg_class AS c, pg_catalog.pg_namespace AS n WHERE c.relname = 'pgbench_accounts' AND c.relnamespace = n.oid AND n.nspname ~ '^pg_temp_'
> 62624 2026-04-29 01:00:29.068 UTC LOG: execute pgpool62605/pgpool62605: SELECT COALESCE(pg_catalog.to_regclass('"pgbench_accounts"')::oid, 0)
> 62624 2026-04-29 01:00:58.287 UTC LOG: statement: DISCARD ALL
> 62624 2026-04-29 01:00:58.287 UTC ERROR: DISCARD ALL cannot run inside a transaction block
> 62624 2026-04-29 01:00:58.287 UTC STATEMENT: DISCARD ALL
>
> Those two execute commands were issued by do_query, which is
> responsible for getting information for the relation cache
> module. When pgpool runs under extended query mode and outside an
> explicit transaction, do_query does not issue a sync message because
> it could break unnamed portal. So in process 62624 on node 0 continues
> to be in an extended query mode without sending sync. In this case
> PostgreSQL keeps on opening an implicit transaction until sync message
> is sent. Why pgpool did not send sync to node 0?
>
> Commit a350afd70 allows to send sync messages only to necessary
> backend nodes. Any queries comes from clients are tracked for this
> purpose. However it forgot to the case when do_query generates
> internal queries. I think we should track such that queries, so that a
> sync message is issued to the server which do_query sent to, when the
> client sends a sync message.
>
> For this purpose I add a new member "bool
> pending_sync_map[MAX_NUM_BACKENDS]" to session_context, which a node
> id is set to when do_query sends a extended query to backend. Note
> that it is set only when do_query is called outside an explicit
> transaction. In add_sync_pending_message, which is called when client
> sends a sync message, we check the map and set sync_map so that
> subsequent call to SimpleForwardToBackend will send sync message to
> the backend node.
>
> Attached patch implements this.

Pushed to master branch only, since the commit a350afd70 had not been
back-patched to stable branches (it's a new feature).

Note that we haven't seen failure with 119.memory_leak_extended, which
is pretty similar test to 120. The difference is, 119 uses only one DB
node, which masks the bug of the commit.

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Browse pgpool-hackers by date

  From Date Subject
Previous Message Tatsuo Ishii 2026-04-30 10:15:09 120.memory_leak_extended_memqcache fails on master