Re: Proposal: recent access based routing for primary-replica setups

From: Nadav Shatz <nadav(at)tailorbrands(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Proposal: recent access based routing for primary-replica setups
Date: 2025-12-29 09:31:12
Message-ID: CACeKOO2dOpTECY95pdHDZkeGOcW6srNYPqw+Kqs1=Qq2xYaHMQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgpool-hackers

Thanks for the help! please find attached the latest version with all
changes and test passing.

On Mon, Dec 29, 2025 at 1:58 AM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:

> >> thank you! works for me. should we merge both into master or do you
> want me
> >> to send a combined one?
> >
> >> do you want me to send a combined one?
> >
> > Yes, please send a combined one.
> >
> > I will do more tests and detailed code review.
>
> Also when combing the patches, please correspond followings.
>
> >>> > I also notice that something like:
> >>> >
> >>> > /* Count tokens in output for validation */
> >>> > char *line_copy = pstrdup(line);
> >>> > char *temp_token = strtok(line_copy, " \t\n");
> >>> >
> >>> > You should declare line_copy and temp_token in the begging of the
> code
> >>> > block (or in the outer block). The forward declaration is
> recommended
> >>> > coding style in Pgpool-II (and PostgreSQL). Same thing can be said to
> >>> > some other variables.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>
> >> On Fri, Dec 26, 2025 at 12:03 PM Tatsuo Ishii <ishii(at)postgresql(dot)org>
> wrote:
> >>
> >>> Hi Nadav,
> >>>
> >>> I just want to make it clear. The patch should be applied on top of
> >>> your latest.patch.
> >>>
> >>> > (Please disregard previous mail. I seem to have mangled the message).
> >>> >
> >>> > I think I found a cause of the problem. On Linux, if SIGCHLD is
> >>> > ignored (set to SIG_IGN), waitpid() cannot get proper child status.
> >>> > Because the kernel relcaims the resource for the child process to not
> >>> > make the child process a zombie. And this makes waitpid() to fail
> with
> >>> > ECHLD. Since the return of waitpid() is not checked, I did not notice
> >>> > the waitpid() failure (I recommend to check the return value of
> >>> > waitpid()).
> >>> >
> >>> > /* set up signal handlers */
> >>> > signal(SIGALRM, SIG_DFL);
> >>> > signal(SIGTERM, my_signal_handler);
> >>> > signal(SIGINT, my_signal_handler);
> >>> > signal(SIGHUP, reload_config_handler);
> >>> > signal(SIGQUIT, my_signal_handler);
> >>> > signal(SIGCHLD, SIG_IGN); <--- SIGCHLD is ignored
> >>> > signal(SIGUSR1, my_signal_handler);
> >>> > signal(SIGUSR2, SIG_IGN);
> >>> > signal(SIGPIPE, SIG_IGN);
> >>> >
> >>> > To fix this, either change the line above to:
> >>> >
> >>> > signal(SIGCHLD, SIG_DFL);
> >>> > or
> >>> > signal(SIGCHLD, my_signal_handler);
> >>> > and modify my_signal_handler.
> >>> >
> >>> > I recommend the latter, because it does not depend on the default
> >>> > behavior of SIGCHLD, which might be different per platform.
> >>> > Attached is the patch to do this. (and run pgindent).
> >>> > I also notice that something like:
> >>> >
> >>> > /* Count tokens in output for validation */
> >>> > char *line_copy = pstrdup(line);
> >>> > char *temp_token = strtok(line_copy, " \t\n");
> >>> >
> >>> > You should declare line_copy and temp_token in the begging of the
> code
> >>> > block (or in the outer block). The forward declaration is
> recommended
> >>> > coding style in Pgpool-II (and PostgreSQL). Same thing can be said to
> >>> > some other variables.
> >>> >
> >>> > Best regards,
> >>> > --
> >>> > Tatsuo Ishii
> >>> > SRA OSS K.K.
> >>> > English: http://www.sraoss.co.jp/index_en/
> >>> > Japanese:http://www.sraoss.co.jp
> >>> >
> >>> >> Hi Tatsuo,
> >>> >>
> >>> >> Thank you for the note.
> >>> >>
> >>> >> I've removed the docker stuff. started working in an ubuntu 24 VM to
> >>> match
> >>> >> the setup. hopefully the results will be better, had so many issues
> >>> >> compiling and testing before that stuff wasn't properly formulated.
> >>> >>
> >>> >> Attaching the latest patch.
> >>> >>
> >>> >> this is what i'm seeing:
> >>> >> adav(at)lima-dev:/src/pgpool2/src/test/regression$ PGHOST=/tmp
> >>> ./regress.sh -p
> >>> >> /usr/bin 041.external_replication_delay
> >>> >> creating pgpool-II temporary installation ...
> >>> >> moving pgpool_setup to temporary installation path ...
> >>> >> moving watchdog_setup to temporary installation path ...
> >>> >> using pgpool-II at /src/pgpool2/src/test/regression/temp/installed
> >>> >> *************************
> >>> >> REGRESSION MODE : install
> >>> >> Pgpool-II version : pgpool-II version 4.8devel
> (mitsukakeboshi)
> >>> >> Pgpool-II install path :
> >>> /src/pgpool2/src/test/regression/temp/installed
> >>> >> PostgreSQL bin : /usr/lib/postgresql/16/bin
> >>> >> PostgreSQL Major version : 16
> >>> >> pgbench : /usr/lib/postgresql/16/bin/pgbench
> >>> >> PostgreSQL jdbc :
> >>> >> /usr/local/pgsql/share/postgresql-9.2-1003.jdbc4.jar
> >>> >> *************************
> >>> >> testing 041.external_replication_delay...ok.
> >>> >> out of 1 ok:1 failed:0 timeout:0
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Tue, Dec 23, 2025 at 10:46 AM Tatsuo Ishii <ishii(at)postgresql(dot)org
> >
> >>> wrote:
> >>> >>
> >>> >>> > Hi Tatsuo,
> >>> >>> >
> >>> >>> > I'km running into issues testing this and have created a full
> docker
> >>> >>> > compose setup - can you please point me to up to date guides on
> the
> >>> best
> >>> >>> > way to run the tests so i know we're doing it the same way?
> >>> >>> >
> >>> >>> > Thank you for all your help!
> >>> >>>
> >>> >>> I have run the regression test on the Pgpool-II master branch on my
> >>> >>> Ubuntu 24 box.
> >>> >>>
> >>> >>> cd pgpool2/src/test/regression
> >>> >>> ./regress.sh 041
> >>> >>>
> >>> >>> This time I noticed:
> >>> >>>
> >>> >>> - The patch does not named with version number
> >>> >>> - The patch creates .dockerignore and docker/ directory.
> >>> >>>
> >>> >>> Are they intended? I am asking because they are different from the
> >>> >>> previous version.
> >>> >>>
> >>> >>> > On Tue, Dec 23, 2025 at 2:13 AM Tatsuo Ishii <
> ishii(at)postgresql(dot)org>
> >>> >>> wrote:
> >>> >>> >
> >>> >>> >> > I think everything is passing now. new version attached.
> >>> >>> >>
> >>> >>> >> Unfortunately Test1 did not pass.
> >>> >>> >>
> >>> >>> >> === Test1: Basic external command with integer millisecond
> values
> >>> ===
> >>> >>> >> waiting for server to start....1438600 2025-12-23 09:09:48.337
> JST
> >>> LOG:
> >>> >>> >> redirecting log output to logging collector process
> >>> >>> >> 1438600 2025-12-23 09:09:48.337 JST HINT: Future log output
> will
> >>> appear
> >>> >>> >> in directory "log".
> >>> >>> >> done
> >>> >>> >> server started
> >>> >>> >> waiting for server to start....1438617 2025-12-23 09:09:48.443
> JST
> >>> LOG:
> >>> >>> >> redirecting log output to logging collector process
> >>> >>> >> 1438617 2025-12-23 09:09:48.443 JST HINT: Future log output
> will
> >>> appear
> >>> >>> >> in directory "log".
> >>> >>> >> done
> >>> >>> >> server started
> >>> >>> >> waiting for server to start....1438634 2025-12-23 09:09:48.561
> JST
> >>> LOG:
> >>> >>> >> redirecting log output to logging collector process
> >>> >>> >> 1438634 2025-12-23 09:09:48.561 JST HINT: Future log output
> will
> >>> appear
> >>> >>> >> in directory "log".
> >>> >>> >> done
> >>> >>> >> server started
> >>> >>> >> CREATE TABLE
> >>> >>> >> Waiting for sr_check to run...
> >>> >>> >> Command executed after 1 seconds
> >>> >>> >> node_id | hostname | port | status | pg_status | lb_weight |
> >>> role
> >>> >>> |
> >>> >>> >> pg_role | select_cnt | load_balance_node | replication_delay |
> >>> >>> >> replication_state | replication_sync_state | last_status_change
> >>> >>> >>
> >>> >>> >>
> >>> >>>
> >>>
> ---------+-----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >>> >>> >> 0 | localhost | 11002 | up | up | 0.333333 |
> >>> primary
> >>> >>> |
> >>> >>> >> primary | 0 | true | 0 |
> >>> >>> >> | | 2025-12-23 09:09:49
> >>> >>> >> 1 | localhost | 11003 | up | up | 0.333333 |
> >>> standby
> >>> >>> |
> >>> >>> >> standby | 0 | false | 0 |
> >>> >>> >> | | 2025-12-23 09:09:49
> >>> >>> >> 2 | localhost | 11004 | up | up | 0.333333 |
> >>> standby
> >>> >>> |
> >>> >>> >> standby | 0 | false | 0 |
> >>> >>> >> | | 2025-12-23 09:09:49
> >>> >>> >> (3 rows)
> >>> >>> >>
> >>> >>> >> fail: external command delay logging not found
> >>> >>> >>
> >>> >>> >> > On Mon, Nov 24, 2025 at 9:41 AM Tatsuo Ishii <
> >>> ishii(at)postgresql(dot)org>
> >>> >>> >> wrote:
> >>> >>> >> >
> >>> >>> >> >> Thank you for updating the patch! This time the patch applies
> >>> without
> >>> >>> >> >> any issue and compiles fine. Unfortunately regression test
> >>> failed.
> >>> >>> >> >>
> >>> >>> >> >> testing 041.external_replication_delay...failed.
> >>> >>> >> >>
> >>> >>> >> >> From the regression log, it seems Test7 failed.
> >>> >>> >> >>
> >>> >>> >> >>
> >>> >>> >>
> >>> >>>
> >>>
> ------------------------------------------------------------------------------
> >>> >>> >> >> === Test7: Command timeout handling ===
> >>> >>> >> >> waiting for server to start....411181 2025-11-24
> 16:31:05.244 JST
> >>> >>> LOG:
> >>> >>> >> >> redirecting log output to logging collector process
> >>> >>> >> >> 411181 2025-11-24 16:31:05.244 JST HINT: Future log output
> will
> >>> >>> appear
> >>> >>> >> in
> >>> >>> >> >> directory "log".
> >>> >>> >> >> done
> >>> >>> >> >> server started
> >>> >>> >> >> waiting for server to start....411196 2025-11-24
> 16:31:05.352 JST
> >>> >>> LOG:
> >>> >>> >> >> redirecting log output to logging collector process
> >>> >>> >> >> 411196 2025-11-24 16:31:05.352 JST HINT: Future log output
> will
> >>> >>> appear
> >>> >>> >> in
> >>> >>> >> >> directory "log".
> >>> >>> >> >> done
> >>> >>> >> >> server started
> >>> >>> >> >> waiting for server to start....411213 2025-11-24
> 16:31:05.461 JST
> >>> >>> LOG:
> >>> >>> >> >> redirecting log output to logging collector process
> >>> >>> >> >> 411213 2025-11-24 16:31:05.461 JST HINT: Future log output
> will
> >>> >>> appear
> >>> >>> >> in
> >>> >>> >> >> directory "log".
> >>> >>> >> >> done
> >>> >>> >> >> server started
> >>> >>> >> >> Waiting for command timeout...
> >>> >>> >> >> fail: command timeout not detected
> >>> >>> >> >>
> >>> >>> >> >>
> >>> >>> >>
> >>> >>>
> >>>
> ------------------------------------------------------------------------------
> >>> >>> >> >>
> >>> >>> >> >> Attached is the pgpool.log. If you need more info, please
> let me
> >>> >>> know.
> >>> >>> >> >>
> >>> >>> >> >> Best regards,
> >>> >>> >> >> --
> >>> >>> >> >> Tatsuo Ishii
> >>> >>> >> >> SRA OSS K.K.
> >>> >>> >> >> English: http://www.sraoss.co.jp/index_en/
> >>> >>> >> >> Japanese:http://www.sraoss.co.jp
> >>> >>> >> >>
> >>> >>> >> >>
> >>> >>> >> >> > Hi Tatsuo,
> >>> >>> >> >> >
> >>> >>> >> >> > Sorry again, this was due to the separation of 2 patches
> and i
> >>> only
> >>> >>> >> sent
> >>> >>> >> >> > the one.
> >>> >>> >> >> >
> >>> >>> >> >> > I've merged it into 1 commit and 1 patch and rebased over
> >>> master to
> >>> >>> >> avoid
> >>> >>> >> >> > these issues moving forward.
> >>> >>> >> >> >
> >>> >>> >> >> > PFA latest version
> >>> >>> >> >> >
> >>> >>> >> >> > On Thu, Nov 20, 2025 at 1:09 AM Tatsuo Ishii <
> >>> ishii(at)postgresql(dot)org
> >>> >>> >
> >>> >>> >> >> wrote:
> >>> >>> >> >> >
> >>> >>> >> >> >> Hi Nadav,
> >>> >>> >> >> >>
> >>> >>> >> >> >> Thank you for new patch.
> >>> >>> >> >> >> Unfortunately the patch did not apply to current master.
> >>> >>> >> >> >>
> >>> >>> >> >> >> $ git apply
> >>> >>> >> >> >>
> >>> ~/0001-Fix-multiple-issues-in-external-replication-delay-fe.patch
> >>> >>> >> >> >> error: patch failed:
> >>> >>> >> src/streaming_replication/pool_worker_child.c:694
> >>> >>> >> >> >> error: src/streaming_replication/pool_worker_child.c:
> patch
> >>> does
> >>> >>> not
> >>> >>> >> >> apply
> >>> >>> >> >> >>
> >>> >>> >> >> >> Maybe the patch is on top of your previous patch?
> >>> >>> >> >> >>
> >>> >>> >> >> >> Also I suggest to use "-v" option of "git format-patch" to
> >>> add the
> >>> >>> >> >> >> patch version number so that we can easily know which
> patch
> >>> is the
> >>> >>> >> >> >> latest.
> >>> >>> >> >> >>
> >>> >>> >> >> >> Best regards,
> >>> >>> >> >> >> --
> >>> >>> >> >> >> Tatsuo Ishii
> >>> >>> >> >> >> SRA OSS K.K.
> >>> >>> >> >> >> English: http://www.sraoss.co.jp/index_en/
> >>> >>> >> >> >> Japanese:http://www.sraoss.co.jp
> >>> >>> >> >> >>
> >>> >>> >> >> >> > Hi Tatsuo,
> >>> >>> >> >> >> >
> >>> >>> >> >> >> > Please see attached an updated version.
> >>> >>> >> >> >> >
> >>> >>> >> >> >> > thank you
> >>> >>> >> >> >> >
> >>> >>> >> >> >> > On Fri, Nov 7, 2025 at 2:07 AM Tatsuo Ishii <
> >>> >>> ishii(at)postgresql(dot)org>
> >>> >>> >> >> >> wrote:
> >>> >>> >> >> >> >
> >>> >>> >> >> >> >> > Sorry for that - thanks for the patch.
> >>> >>> >> >> >> >> >
> >>> >>> >> >> >> >> > Please find attached a new version
> >>> >>> >> >> >> >>
> >>> >>> >> >> >> >> Thanks for the new version. Unfortunately this time
> >>> regression
> >>> >>> >> test
> >>> >>> >> >> >> >> fails at:
> >>> >>> >> >> >> >>
> >>> >>> >> >> >> >> > Waiting for command timeout...
> >>> >>> >> >> >> >> > fail: command timeout not detected
> >>> >>> >> >> >> >>
> >>> >>> >> >> >> >> Attached is the pgpool.log.
> >>> >>> >> >> >> >>
> >>> >>> >> >> >> >> Best regards,
> >>> >>> >> >> >> >> --
> >>> >>> >> >> >> >> Tatsuo Ishii
> >>> >>> >> >> >> >> SRA OSS K.K.
> >>> >>> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
> >>> >>> >> >> >> >> Japanese:http://www.sraoss.co.jp
> >>> >>> >> >> >> >>
> >>> >>> >> >> >> >> > On Mon, Nov 3, 2025 at 9:05 AM Tatsuo Ishii <
> >>> >>> >> ishii(at)postgresql(dot)org>
> >>> >>> >> >> >> >> wrote:
> >>> >>> >> >> >> >> >
> >>> >>> >> >> >> >> >> > thanks and sorry for the issues, please find
> attached
> >>> >>> updated
> >>> >>> >> >> >> version.
> >>> >>> >> >> >> >> >>
> >>> >>> >> >> >> >> >> No problem.
> >>> >>> >> >> >> >> >>
> >>> >>> >> >> >> >> >> This time the patch applies fine, no compiler
> warnings.
> >>> >>> >> However,
> >>> >>> >> >> >> >> >> regression test did not passed here (on Ubuntu 24
> LTS if
> >>> >>> this
> >>> >>> >> >> >> >> >> matters). So I looked into
> >>> >>> >> >> >> >> >>
> >>> >>> >>
> src/test/regression/tests/041.external_replication_delay/test.sh a
> >>> >>> >> >> >> >> >> little bit and apply attached patch
> (test.sh.patch). It
> >>> >>> moved
> >>> >>> >> >> forward
> >>> >>> >> >> >> >> >> partially but failed at:
> >>> >>> >> >> >> >> >>
> >>> >>> >> >> >> >> >> fail: command execution failure not detected
> >>> >>> >> >> >> >> >>
> >>> >>> >> >> >> >> >> Please find attached
> >>> >>> >> >> >> >> >>
> >>> >>> >> >> >> >>
> >>> >>> >> >> >>
> >>> >>> >> >>
> >>> >>> >>
> >>> >>>
> >>>
> src/test/regression/tests/041.external_replication_delay/testdir/pgpool.log
> >>> >>> >> >> >> >> >> and
> >>> src/test/regression/log/041.external_replication_delay.
> >>> >>> >> >> >> >> >>
> >>> >>> >> >> >> >> >> Best regards,
> >>> >>> >> >> >> >> >> --
> >>> >>> >> >> >> >> >> Tatsuo Ishii
> >>> >>> >> >> >> >> >> SRA OSS K.K.
> >>> >>> >> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
> >>> >>> >> >> >> >> >> Japanese:http://www.sraoss.co.jp
> >>> >>> >> >> >> >> >>
> >>> >>> >> >> >> >> >
> >>> >>> >> >> >> >> >
> >>> >>> >> >> >> >> > --
> >>> >>> >> >> >> >> > Nadav Shatz
> >>> >>> >> >> >> >> > Tailor Brands | CTO
> >>> >>> >> >> >> >>
> >>> >>> >> >> >> >
> >>> >>> >> >> >> >
> >>> >>> >> >> >> > --
> >>> >>> >> >> >> > Nadav Shatz
> >>> >>> >> >> >> > Tailor Brands | CTO
> >>> >>> >> >> >>
> >>> >>> >> >> >
> >>> >>> >> >> >
> >>> >>> >> >> > --
> >>> >>> >> >> > Nadav Shatz
> >>> >>> >> >> > Tailor Brands | CTO
> >>> >>> >> >>
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> > --
> >>> >>> >> > Nadav Shatz
> >>> >>> >> > Tailor Brands | CTO
> >>> >>> >>
> >>> >>> >
> >>> >>> >
> >>> >>> > --
> >>> >>> > Nadav Shatz
> >>> >>> > Tailor Brands | CTO
> >>> >>>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Nadav Shatz
> >>> >> Tailor Brands | CTO
> >>>
> >>
> >>
> >> --
> >> Nadav Shatz
> >> Tailor Brands | CTO
> > <
>

--
Nadav Shatz
Tailor Brands | CTO

Attachment Content-Type Size
latest.patch application/x-patch 52.0 KB

In response to

Responses

Browse pgpool-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2026-01-05 23:52:57 Re: Proposal: recent access based routing for primary-replica setups
Previous Message Tatsuo Ishii 2025-12-28 23:58:03 Re: Proposal: recent access based routing for primary-replica setups