Re: Proposal: Recent mutated table tracking in memory

From: Nadav Shatz <nadav(at)tailorbrands(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Proposal: Recent mutated table tracking in memory
Date: 2026-04-23 14:16:24
Message-ID: CACeKOO3OYtS8dezQn4beE_A1aqc2-Uqqb08pQ+oZt_brk7aByQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgpool-hackers

Hi Tatsuo,

Good catch on the 006.memqcache timeout. My previous fix had
wrong side effects -- setting writing_transaction for
dml_adaptive_global also changed routing behavior (it forced the
whole transaction to primary, effectively reducing the feature to
'transaction' mode). That's what caused the hang.

Fixed properly in v3: instead of touching writing_transaction,
added a memqcache-specific guard that checks whether the current
dml_adaptive* session has tracked writes in the current
transaction, and skips the cache fetch if so.

Attached: v3-0001-Feature-load-balancing-control-by-table-tracking.patch

Changes in v3 vs v2:

- pool_set_writing_transaction() reverted to original behavior
(dml_adaptive_global no longer sets writing_transaction, so
routing stays per-table as intended).

- Added new helper pool_has_dml_adaptive_write_in_transaction()
in pool_session_context.c. Returns true when the current session
is in dml_adaptive* mode, is inside an explicit transaction, and
has already tracked at least one write (via
transaction_temp_write_list).

- The two memqcache fetch guards in pool_proto_modules.c
(simple query at line 270, extended query at line 1028) now
also call !pool_has_dml_adaptive_write_in_transaction().
Autocommit writes in dml_adaptive_global are still handled by
the existing pool_invalidate_query_cache() at COMMIT time --
no change needed there.

Verified locally by mutating 006.memqcache with
disable_load_balance_on_write = 'dml_adaptive_global' in the
streaming replication mode (the only mode where dml_adaptive
applies) and the jdbctest now correctly returns "2" instead of
the stale cached "1". Both 006.memqcache and 043.track_table_mutation
pass.

Thanks!

On Thu, Apr 23, 2026 at 11:14 AM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:

> Hi Nadav,
>
> Unfortunately the mutated 006.memqcache failed (timeout).
>
> >> > memqcache bug fix
> >> > -----------------
> >> >
> >> > Good catch. The root cause: pool_set_writing_transaction() was
> >> > explicitly skipping dml_adaptive_global, so
> >> > pool_is_writing_transaction() always returned false in this mode.
> >> > The query cache fetch guard at pool_proto_modules.c:270
> >> > (!pool_is_writing_transaction()) then served stale cached results
> >> > after DML in the same transaction.
> >> >
> >> > Fix: pool_set_writing_transaction() now sets the flag for
> >> > dml_adaptive_global (only 'off' and 'dml_adaptive' skip it). This
> >> > ensures the query cache is properly bypassed after writes within
> >> > the same transaction.
>
> Regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>
> > Hi Tatsuo,
> >
> > Rebased onto current master, renumbered the regression tests
> > (043/044 to avoid collision with the new 042.ssl_reload), and
> > combined everything into a single commit.
> >
> > Attached: v2-0001-Feature-load-balancing-control-by-table-tracking.patch
> >
> > Looking forward to your review.
> >
> >
> > On Sun, Apr 19, 2026 at 10:25 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
> wrote:
> >
> >> > Hi Tatsuo,
> >> >
> >> > hank you for the detailed review. Attached patch addresses all items.
> >>
> >> I guess the attached patch is on top of
> >> v1-0001-Feature-load-balancing-control-by-table-tracking.patch. To
> >> apply v2-0001-address-review.patch, we need to apply
> >> v1-0001-Feature-load-balancing-control-by-table-tracking.patch first.
> >> Unfortunately due to recent commit, it does not apply anymore. Can you
> >> please provide v1 + v2 that are rebased against latest master branch?
> >> Also 042 regression test is already used by recent commit. Can you
> >> renumber 042.track_table_mutation and
> >> 043.track_table_mutation_watchdog to 043.track_table_mutation and
> >> 044.track_table_mutation_watchdog accordingly?
> >>
> >> Looking forward to seeing new patch.
> >>
> >> Regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS K.K.
> >> English: http://www.sraoss.co.jp/index_en/
> >> Japanese:http://www.sraoss.co.jp
> >>
> >>
> >> > memqcache bug fix
> >> > -----------------
> >> >
> >> > Good catch. The root cause: pool_set_writing_transaction() was
> >> > explicitly skipping dml_adaptive_global, so
> >> > pool_is_writing_transaction() always returned false in this mode.
> >> > The query cache fetch guard at pool_proto_modules.c:270
> >> > (!pool_is_writing_transaction()) then served stale cached results
> >> > after DML in the same transaction.
> >> >
> >> > Fix: pool_set_writing_transaction() now sets the flag for
> >> > dml_adaptive_global (only 'off' and 'dml_adaptive' skip it). This
> >> > ensures the query cache is properly bypassed after writes within
> >> > the same transaction.
> >> >
> >> > Removed dead query parse cache code (~700 lines)
> >> > -------------------------------------------------
> >> >
> >> > You're right -- pool_track_table_mutation_get_cached_parse,
> >> > pool_track_table_mutation_cache_parse, and
> >> > pool_track_table_mutation_normalize_and_hash were never called.
> >> > These were leftover from an earlier design where we planned to
> >> > cache SQL parse results in shared memory. The feature ended up
> >> > using pgpool's existing parser directly, and this code was never
> >> > wired up.
> >> >
> >> > Removed: QueryParseCache and QueryParseEntry structs, all related
> >> > static functions, the TRACK_TABLE_MUTATION_QUERY_SEM semaphore,
> >> > and the track_table_mutation_query_buckets /
> >> > track_table_mutation_query_parse_cache_size configuration
> >> > parameters. This also reduces shared memory usage from ~6.4 MB
> >> > to ~80 KB with default settings.
> >> >
> >> > check_object_relationship_list scope
> >> > -------------------------------------
> >> >
> >> > You're correct -- dml_adaptive_global does not use
> >> > dml_adaptive_object_relationship_list. Changed
> >> > check_object_relationship_list() to check for DLBOW_DML_ADAPTIVE
> >> > only, not DLBOW_IS_DML_ADAPTIVE (which includes global).
> >> >
> >> > Documentation fixes
> >> > -------------------
> >> >
> >> > - Removed "(Lagless Replica Reads)" from section title and
> >> > "lagless" language from description.
> >> >
> >> > - Described fallback behavior when neither
> >> > replication_delay_source_cmd nor delay_threshold_by_time is
> >> > configured (TTL stays at 100ms default minimum).
> >> >
> >> > - "query cache" references removed (the query parse cache is gone).
> >> >
> >> > - Added 128-table-per-SELECT limit to Limitations section
> >> > (uses POOL_MAX_SELECT_OIDS).
> >> >
> >> > Code style fixes
> >> > ----------------
> >> >
> >> > - DLBOW_IS_DML_ADAPTIVE() calls no longer split across lines.
> >> >
> >> > - Split the long errmsg line in
> >> > is_select_object_in_temp_write_list.
> >> >
> >> > - Removed redundant is_adaptive variable in
> >> > is_select_object_in_temp_write_list (the check at function
> >> > entry already guarantees it).
> >> >
> >> > Thanks!
> >> >
> >> > On Wed, Apr 15, 2026 at 1:43 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
> >> wrote:
> >> >
> >> >> Hi Nadav,
> >> >>
> >> >> > Hi Tatsuo,
> >> >> >
> >> >> > Looks good to me thanks!
> >> >> >
> >> >> > Please go ahead with your review. waiting to hear back from you.
> >> >>
> >> >> Here are the code review results.
> >> >>
> >> >> diff --git a/doc/src/sgml/loadbalance.sgml
> >> b/doc/src/sgml/loadbalance.sgml
> >> >> index 9e1e7b39b..7384ce81a 100644
> >> >> --- a/doc/src/sgml/loadbalance.sgml
> >> >> +++ b/doc/src/sgml/loadbalance.sgml
> >> >> :
> >> >> + <sect2 id="runtime-config-table-mutation-map">
> >> >> + <title>Table Mutation Map Configuration (Lagless Replica
> >> Reads)</title>
> >> >>
> >> >> "(Lagless Replica Reads)" sounds like an advertisement to me. It
> >> >> should be removed.
> >> >>
> >> >> + <para>
> >> >> + These parameters configure the track table mutation feature,
> which
> >> is
> >> >> activated by setting
> >> >> + <xref linkend="guc-disable-load-balance-on-write"> to
> >> >> <literal>dml_adaptive_global</literal>.
> >> >> + The feature tracks recently written tables to prevent stale reads
> >> from
> >> >> replica nodes during
> >> >> + replication lag, implementing the "lagless" architecture pattern
> for
> >> >> distributed systems
> >> >> + with read replicas.
> >> >>
> >> >> I think the feature does not guarantee "lagless" anytime, in all
> cases.
> >> >>
> >> >> + <para>
> >> >> + This feature requires time-based replication delay monitoring.
> This
> >> >> can be provided by either
> >> >> + <xref linkend="guc-replication-delay-source-cmd"> (external
> command
> >> >> mode) or by setting
> >> >> + <xref linkend="guc-delay-threshold-by-time"> (which uses
> >> >> <literal>pg_stat_replication.replay_lag</literal>
> >> >> + from PostgreSQL 10+). At least one of these must be configured
> for
> >> the
> >> >> TTL calculation to work.
> >> >>
> >> >> If one of these is not set, what happens? Error? Need to describe it.
> >> >>
> >> >> + </para>
> >> >> +
> >> >> + <warning>
> >> >> + <para>
> >> >> + Enabling <literal>dml_adaptive_global</literal> increases shared
> >> >> memory consumption. With default settings,
> >> >> + the feature requires approximately 6.4 MB of shared memory (0.1
> MB
> >> >> for table tracking + 6.3 MB for query cache).
> >> >>
> >> >> "query cache" should be "query parse cache".
> >> >>
> >> >> + Memory usage scales with configuration parameters:
> >> >> + </para>
> >> >> + <itemizedlist>
> >> >> + <listitem>
> >> >> + <para>
> >> >> + Table tracking: <literal>track_table_mutation_table_size * 40
> >> >> bytes</literal> (default: 2048 * 40 = ~80 KB)
> >> >> + </para>
> >> >> + </listitem>
> >> >> + <listitem>
> >> >> + <para>
> >> >> + Query cache:
> >> <literal>track_table_mutation_query_parse_cache_size *
> >> >> 640 bytes</literal> (default: 10000 * 640 = ~6.3 MB)
> >> >>
> >> >> "query cache" should be "query parse cache".
> >> >>
> >> >> + <title>Limitations</title>
> >> >>
> >> >> I think number of tables tacked in a SELECT is limited to 8. It
> should
> >> >> be mentioned.
> >> >>
> >> >> diff --git a/src/context/pool_query_context.c
> >> >> b/src/context/pool_query_context.c
> >> >> index a056ac596..0190d3673 100644
> >> >> --- a/src/context/pool_query_context.c
> >> >> +++ b/src/context/pool_query_context.c
> >> >> @@ -1828,15 +1829,23 @@ is_in_list(char *name, List *list)
> >> >> static bool
> >> >> is_select_object_in_temp_write_list(Node *node, void *context)
> >> >> {
> >> >> - if (node == NULL ||
> pool_config->disable_load_balance_on_write
> >> !=
> >> >> DLBOW_DML_ADAPTIVE)
> >> >> + if (node == NULL ||
> >> >> + !DLBOW_IS_DML_ADAPTIVE(
> >> >> +
> >> >> pool_config->disable_load_balance_on_write))
> >> >>
> >> >> You don't need to split the line.
> >> >>
> >> >> + is_adaptive = DLBOW_IS_DML_ADAPTIVE(
> >> >> +
> >> >> pool_config->disable_load_balance_on_write);
> >> >>
> >> >> You don't need to split the line.
> >> >>
> >> >> - if (pool_config->disable_load_balance_on_write ==
> >> >> DLBOW_DML_ADAPTIVE && session_context->is_in_transaction)
> >> >> + if (is_adaptive &&
> >> >> + session_context->is_in_transaction)
> >> >> {
> >> >> ereport(DEBUG1,
> >> >>
> >> >> (errmsg("is_select_object_in_temp_write_list: \"%s\", found relation
> >> >> \"%s\"", (char *) context, rgv->relname)));
> >> >> This line is too long. Please split.
> >> >>
> >> >> @@ -1880,7 +1889,13 @@ static char
> >> >> *get_associated_object_from_dml_adaptive_relations
> >> >> void
> >> >> check_object_relationship_list(char *name, bool is_func_name)
> >> >> {
> >> >> - if (pool_config->disable_load_balance_on_write ==
> >> >> DLBOW_DML_ADAPTIVE &&
> >> >> pool_config->parsed_dml_adaptive_object_relationship_list)
> >> >> + bool is_adaptive;
> >> >> +
> >> >> + is_adaptive = DLBOW_IS_DML_ADAPTIVE(
> >> >> +
> >> >> pool_config->disable_load_balance_on_write);
> >> >>
> >> >> I wrote in the commit message:
> >> >>
> >> >> modifications are only detected in the same transaction). Note,
> >> >> however, you cannot use dml_adaptive_object_relationship_list to
> track
> >> >> dependency among table and other objects.
> >> >>
> >> >> In my understanding the feature does not use
> >> >> dml_adaptive_object_relationship_list. If this is correct, why
> >> >> check_object_relationship_list() is called here in case
> >> >> dml_adaptive_global? If the feature uses
> >> >> dml_adaptive_object_relationship_list, test cases should be included.
> >> >>
> >> >> diff --git a/src/utils/pool_track_table_mutation.c
> >> >> b/src/utils/pool_track_table_mutation.c
> >> >> new file mode 100644
> >> >> index 000000000..9be46b28f
> >> >> --- /dev/null
> >> >> +++ b/src/utils/pool_track_table_mutation.c
> >> >>
> >> >> It seems following functions are not used anywhere. I wonder if this
> >> >> feature actually use "query parse cache".
> >> >>
> >> >> pool_track_table_mutation_get_cached_parse
> >> >> pool_track_table_mutation_cache_parse
> >> >> pool_track_table_mutation_normalize_and_hash
> >> >>
> >> >> Besides the code review, I mutated one of regression tests to check
> >> >> whether the feature co exists with in the existing memory query cache
> >> >> feature. After attached patch applied, I ran 006.memqcache and got
> the
> >> >> following result.
> >> >>
> >> >> cd src/test/regression
> >> >> ./regress.sh 006
> >> >> creating pgpool-II temporary installation ...
> >> >> moving pgpool_setup to temporary installation path ...
> >> >> moving watchdog_setup to temporary installation path ...
> >> >> using pgpool-II at
> >> >>
> >>
> /home/t-ishii/work/Pgpool-II/current/pgpool2/src/test/regression/temp/installed
> >> >> *************************
> >> >> REGRESSION MODE : install
> >> >> Pgpool-II version : pgpool-II version 4.8devel
> (mitsukakeboshi)
> >> >> Pgpool-II install path :
> >> >>
> >>
> /home/t-ishii/work/Pgpool-II/current/pgpool2/src/test/regression/temp/installed
> >> >> PostgreSQL bin : /usr/local/pgsql/bin
> >> >> PostgreSQL Major version : 18
> >> >> pgbench : /usr/local/pgsql/bin/pgbench
> >> >> PostgreSQL jdbc :
> >> >> /usr/local/pgsql/share/postgresql-9.2-1003.jdbc4.jar
> >> >> *************************
> >> >> testing 006.memqcache...failed.
> >> >> out of 1 ok:0 failed:1 timeout:0
> >> >>
> >> >> log/006.memqcache shows:
> >> >>
> >> >> ../expected.txt result.txt differ: char 1, line 1
> >> >>
> >> >> So I checked the test script and found the error was generated by a
> >> >> Java program test.
> >> >>
> >> >> java jdbctest > result.txt 2>&1
> >> >> cmp ../expected.txt result.txt
> >> >> if [ $? != 0 ];then
> >> >> ./shutdownall
> >> >> exit 1
> >> >> fi
> >> >>
> >> >> In jdbctest.java:
> >> >>
> >> >> /*
> >> >> * Cache test in an explicit transaction
> >> >> */
> >> >> conn.setAutoCommit(false);
> >> >> // execute DML. This should prevent SELECTs from
> using
> >> >> query cache in the transaction.
> >> >> sql = "UPDATE t1 SET i = 2;";
> >> >> pst = conn.createStatement();
> >> >> pst.executeUpdate(sql);
> >> >> pst.close();
> >> >> // should not use the cache and should return "2",
> >> rather
> >> >> than "1"
> >> >> prest = conn.prepareStatement("SELECT * FROM t1");
> >> >> rs = prest.executeQuery();
> >> >>
> >> >> The expected file (expected.txt) has "2" but the result file
> >> >> (testdir/result.txt) was "1". This is the reason why the test
> >> >> failed. I wonder if there's something wrong with the feature when the
> >> >> query cache is enabled. Can you look into this?
> >> >>
> >> >> Regards,
> >> >> --
> >> >> Tatsuo Ishii
> >> >> SRA OSS K.K.
> >> >> English: http://www.sraoss.co.jp/index_en/
> >> >> Japanese:http://www.sraoss.co.jp
> >> >>
> >> >
> >> >
> >> > --
> >> > Nadav Shatz
> >> > Tailor Brands | CTO
> >>
> >
> >
> > --
> > Nadav Shatz
> > Tailor Brands | CTO
>

--
Nadav Shatz
Tailor Brands | CTO

Attachment Content-Type Size
v3-0001-Feature-load-balancing-control-by-table-tracking.patch application/octet-stream 91.9 KB

In response to

Browse pgpool-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2026-04-25 13:30:51 Problem with pcp process
Previous Message Tatsuo Ishii 2026-04-23 08:14:08 Re: Proposal: Recent mutated table tracking in memory