| From: | Nadav Shatz <nadav(at)tailorbrands(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Cc: | pgpool-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Proposal: recent access based routing for primary-replica setups |
| Date: | 2025-12-28 12:21:15 |
| Message-ID: | CACeKOO1TB-+w9VzFQ1O7tjUVaU2Fc+KJdvihJ=_jZBPAs2wg2w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgpool-hackers |
thank you! works for me. should we merge both into master or do you want me
to send a combined one?
On Fri, Dec 26, 2025 at 12:03 PM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> Hi Nadav,
>
> I just want to make it clear. The patch should be applied on top of
> your latest.patch.
>
> > (Please disregard previous mail. I seem to have mangled the message).
> >
> > I think I found a cause of the problem. On Linux, if SIGCHLD is
> > ignored (set to SIG_IGN), waitpid() cannot get proper child status.
> > Because the kernel relcaims the resource for the child process to not
> > make the child process a zombie. And this makes waitpid() to fail with
> > ECHLD. Since the return of waitpid() is not checked, I did not notice
> > the waitpid() failure (I recommend to check the return value of
> > waitpid()).
> >
> > /* set up signal handlers */
> > signal(SIGALRM, SIG_DFL);
> > signal(SIGTERM, my_signal_handler);
> > signal(SIGINT, my_signal_handler);
> > signal(SIGHUP, reload_config_handler);
> > signal(SIGQUIT, my_signal_handler);
> > signal(SIGCHLD, SIG_IGN); <--- SIGCHLD is ignored
> > signal(SIGUSR1, my_signal_handler);
> > signal(SIGUSR2, SIG_IGN);
> > signal(SIGPIPE, SIG_IGN);
> >
> > To fix this, either change the line above to:
> >
> > signal(SIGCHLD, SIG_DFL);
> > or
> > signal(SIGCHLD, my_signal_handler);
> > and modify my_signal_handler.
> >
> > I recommend the latter, because it does not depend on the default
> > behavior of SIGCHLD, which might be different per platform.
> > Attached is the patch to do this. (and run pgindent).
> > I also notice that something like:
> >
> > /* Count tokens in output for validation */
> > char *line_copy = pstrdup(line);
> > char *temp_token = strtok(line_copy, " \t\n");
> >
> > You should declare line_copy and temp_token in the begging of the code
> > block (or in the outer block). The forward declaration is recommended
> > coding style in Pgpool-II (and PostgreSQL). Same thing can be said to
> > some other variables.
> >
> > Best regards,
> > --
> > Tatsuo Ishii
> > SRA OSS K.K.
> > English: http://www.sraoss.co.jp/index_en/
> > Japanese:http://www.sraoss.co.jp
> >
> >> Hi Tatsuo,
> >>
> >> Thank you for the note.
> >>
> >> I've removed the docker stuff. started working in an ubuntu 24 VM to
> match
> >> the setup. hopefully the results will be better, had so many issues
> >> compiling and testing before that stuff wasn't properly formulated.
> >>
> >> Attaching the latest patch.
> >>
> >> this is what i'm seeing:
> >> adav(at)lima-dev:/src/pgpool2/src/test/regression$ PGHOST=/tmp
> ./regress.sh -p
> >> /usr/bin 041.external_replication_delay
> >> creating pgpool-II temporary installation ...
> >> moving pgpool_setup to temporary installation path ...
> >> moving watchdog_setup to temporary installation path ...
> >> using pgpool-II at /src/pgpool2/src/test/regression/temp/installed
> >> *************************
> >> REGRESSION MODE : install
> >> Pgpool-II version : pgpool-II version 4.8devel (mitsukakeboshi)
> >> Pgpool-II install path :
> /src/pgpool2/src/test/regression/temp/installed
> >> PostgreSQL bin : /usr/lib/postgresql/16/bin
> >> PostgreSQL Major version : 16
> >> pgbench : /usr/lib/postgresql/16/bin/pgbench
> >> PostgreSQL jdbc :
> >> /usr/local/pgsql/share/postgresql-9.2-1003.jdbc4.jar
> >> *************************
> >> testing 041.external_replication_delay...ok.
> >> out of 1 ok:1 failed:0 timeout:0
> >>
> >>
> >>
> >> On Tue, Dec 23, 2025 at 10:46 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
> wrote:
> >>
> >>> > Hi Tatsuo,
> >>> >
> >>> > I'km running into issues testing this and have created a full docker
> >>> > compose setup - can you please point me to up to date guides on the
> best
> >>> > way to run the tests so i know we're doing it the same way?
> >>> >
> >>> > Thank you for all your help!
> >>>
> >>> I have run the regression test on the Pgpool-II master branch on my
> >>> Ubuntu 24 box.
> >>>
> >>> cd pgpool2/src/test/regression
> >>> ./regress.sh 041
> >>>
> >>> This time I noticed:
> >>>
> >>> - The patch does not named with version number
> >>> - The patch creates .dockerignore and docker/ directory.
> >>>
> >>> Are they intended? I am asking because they are different from the
> >>> previous version.
> >>>
> >>> > On Tue, Dec 23, 2025 at 2:13 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
> >>> wrote:
> >>> >
> >>> >> > I think everything is passing now. new version attached.
> >>> >>
> >>> >> Unfortunately Test1 did not pass.
> >>> >>
> >>> >> === Test1: Basic external command with integer millisecond values
> ===
> >>> >> waiting for server to start....1438600 2025-12-23 09:09:48.337 JST
> LOG:
> >>> >> redirecting log output to logging collector process
> >>> >> 1438600 2025-12-23 09:09:48.337 JST HINT: Future log output will
> appear
> >>> >> in directory "log".
> >>> >> done
> >>> >> server started
> >>> >> waiting for server to start....1438617 2025-12-23 09:09:48.443 JST
> LOG:
> >>> >> redirecting log output to logging collector process
> >>> >> 1438617 2025-12-23 09:09:48.443 JST HINT: Future log output will
> appear
> >>> >> in directory "log".
> >>> >> done
> >>> >> server started
> >>> >> waiting for server to start....1438634 2025-12-23 09:09:48.561 JST
> LOG:
> >>> >> redirecting log output to logging collector process
> >>> >> 1438634 2025-12-23 09:09:48.561 JST HINT: Future log output will
> appear
> >>> >> in directory "log".
> >>> >> done
> >>> >> server started
> >>> >> CREATE TABLE
> >>> >> Waiting for sr_check to run...
> >>> >> Command executed after 1 seconds
> >>> >> node_id | hostname | port | status | pg_status | lb_weight |
> role
> >>> |
> >>> >> pg_role | select_cnt | load_balance_node | replication_delay |
> >>> >> replication_state | replication_sync_state | last_status_change
> >>> >>
> >>> >>
> >>>
> ---------+-----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >>> >> 0 | localhost | 11002 | up | up | 0.333333 |
> primary
> >>> |
> >>> >> primary | 0 | true | 0 |
> >>> >> | | 2025-12-23 09:09:49
> >>> >> 1 | localhost | 11003 | up | up | 0.333333 |
> standby
> >>> |
> >>> >> standby | 0 | false | 0 |
> >>> >> | | 2025-12-23 09:09:49
> >>> >> 2 | localhost | 11004 | up | up | 0.333333 |
> standby
> >>> |
> >>> >> standby | 0 | false | 0 |
> >>> >> | | 2025-12-23 09:09:49
> >>> >> (3 rows)
> >>> >>
> >>> >> fail: external command delay logging not found
> >>> >>
> >>> >> > On Mon, Nov 24, 2025 at 9:41 AM Tatsuo Ishii <
> ishii(at)postgresql(dot)org>
> >>> >> wrote:
> >>> >> >
> >>> >> >> Thank you for updating the patch! This time the patch applies
> without
> >>> >> >> any issue and compiles fine. Unfortunately regression test
> failed.
> >>> >> >>
> >>> >> >> testing 041.external_replication_delay...failed.
> >>> >> >>
> >>> >> >> From the regression log, it seems Test7 failed.
> >>> >> >>
> >>> >> >>
> >>> >>
> >>>
> ------------------------------------------------------------------------------
> >>> >> >> === Test7: Command timeout handling ===
> >>> >> >> waiting for server to start....411181 2025-11-24 16:31:05.244 JST
> >>> LOG:
> >>> >> >> redirecting log output to logging collector process
> >>> >> >> 411181 2025-11-24 16:31:05.244 JST HINT: Future log output will
> >>> appear
> >>> >> in
> >>> >> >> directory "log".
> >>> >> >> done
> >>> >> >> server started
> >>> >> >> waiting for server to start....411196 2025-11-24 16:31:05.352 JST
> >>> LOG:
> >>> >> >> redirecting log output to logging collector process
> >>> >> >> 411196 2025-11-24 16:31:05.352 JST HINT: Future log output will
> >>> appear
> >>> >> in
> >>> >> >> directory "log".
> >>> >> >> done
> >>> >> >> server started
> >>> >> >> waiting for server to start....411213 2025-11-24 16:31:05.461 JST
> >>> LOG:
> >>> >> >> redirecting log output to logging collector process
> >>> >> >> 411213 2025-11-24 16:31:05.461 JST HINT: Future log output will
> >>> appear
> >>> >> in
> >>> >> >> directory "log".
> >>> >> >> done
> >>> >> >> server started
> >>> >> >> Waiting for command timeout...
> >>> >> >> fail: command timeout not detected
> >>> >> >>
> >>> >> >>
> >>> >>
> >>>
> ------------------------------------------------------------------------------
> >>> >> >>
> >>> >> >> Attached is the pgpool.log. If you need more info, please let me
> >>> know.
> >>> >> >>
> >>> >> >> Best regards,
> >>> >> >> --
> >>> >> >> Tatsuo Ishii
> >>> >> >> SRA OSS K.K.
> >>> >> >> English: http://www.sraoss.co.jp/index_en/
> >>> >> >> Japanese:http://www.sraoss.co.jp
> >>> >> >>
> >>> >> >>
> >>> >> >> > Hi Tatsuo,
> >>> >> >> >
> >>> >> >> > Sorry again, this was due to the separation of 2 patches and i
> only
> >>> >> sent
> >>> >> >> > the one.
> >>> >> >> >
> >>> >> >> > I've merged it into 1 commit and 1 patch and rebased over
> master to
> >>> >> avoid
> >>> >> >> > these issues moving forward.
> >>> >> >> >
> >>> >> >> > PFA latest version
> >>> >> >> >
> >>> >> >> > On Thu, Nov 20, 2025 at 1:09 AM Tatsuo Ishii <
> ishii(at)postgresql(dot)org
> >>> >
> >>> >> >> wrote:
> >>> >> >> >
> >>> >> >> >> Hi Nadav,
> >>> >> >> >>
> >>> >> >> >> Thank you for new patch.
> >>> >> >> >> Unfortunately the patch did not apply to current master.
> >>> >> >> >>
> >>> >> >> >> $ git apply
> >>> >> >> >>
> ~/0001-Fix-multiple-issues-in-external-replication-delay-fe.patch
> >>> >> >> >> error: patch failed:
> >>> >> src/streaming_replication/pool_worker_child.c:694
> >>> >> >> >> error: src/streaming_replication/pool_worker_child.c: patch
> does
> >>> not
> >>> >> >> apply
> >>> >> >> >>
> >>> >> >> >> Maybe the patch is on top of your previous patch?
> >>> >> >> >>
> >>> >> >> >> Also I suggest to use "-v" option of "git format-patch" to
> add the
> >>> >> >> >> patch version number so that we can easily know which patch
> is the
> >>> >> >> >> latest.
> >>> >> >> >>
> >>> >> >> >> Best regards,
> >>> >> >> >> --
> >>> >> >> >> Tatsuo Ishii
> >>> >> >> >> SRA OSS K.K.
> >>> >> >> >> English: http://www.sraoss.co.jp/index_en/
> >>> >> >> >> Japanese:http://www.sraoss.co.jp
> >>> >> >> >>
> >>> >> >> >> > Hi Tatsuo,
> >>> >> >> >> >
> >>> >> >> >> > Please see attached an updated version.
> >>> >> >> >> >
> >>> >> >> >> > thank you
> >>> >> >> >> >
> >>> >> >> >> > On Fri, Nov 7, 2025 at 2:07 AM Tatsuo Ishii <
> >>> ishii(at)postgresql(dot)org>
> >>> >> >> >> wrote:
> >>> >> >> >> >
> >>> >> >> >> >> > Sorry for that - thanks for the patch.
> >>> >> >> >> >> >
> >>> >> >> >> >> > Please find attached a new version
> >>> >> >> >> >>
> >>> >> >> >> >> Thanks for the new version. Unfortunately this time
> regression
> >>> >> test
> >>> >> >> >> >> fails at:
> >>> >> >> >> >>
> >>> >> >> >> >> > Waiting for command timeout...
> >>> >> >> >> >> > fail: command timeout not detected
> >>> >> >> >> >>
> >>> >> >> >> >> Attached is the pgpool.log.
> >>> >> >> >> >>
> >>> >> >> >> >> Best regards,
> >>> >> >> >> >> --
> >>> >> >> >> >> Tatsuo Ishii
> >>> >> >> >> >> SRA OSS K.K.
> >>> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
> >>> >> >> >> >> Japanese:http://www.sraoss.co.jp
> >>> >> >> >> >>
> >>> >> >> >> >> > On Mon, Nov 3, 2025 at 9:05 AM Tatsuo Ishii <
> >>> >> ishii(at)postgresql(dot)org>
> >>> >> >> >> >> wrote:
> >>> >> >> >> >> >
> >>> >> >> >> >> >> > thanks and sorry for the issues, please find attached
> >>> updated
> >>> >> >> >> version.
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> No problem.
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> This time the patch applies fine, no compiler warnings.
> >>> >> However,
> >>> >> >> >> >> >> regression test did not passed here (on Ubuntu 24 LTS if
> >>> this
> >>> >> >> >> >> >> matters). So I looked into
> >>> >> >> >> >> >>
> >>> >> src/test/regression/tests/041.external_replication_delay/test.sh a
> >>> >> >> >> >> >> little bit and apply attached patch (test.sh.patch). It
> >>> moved
> >>> >> >> forward
> >>> >> >> >> >> >> partially but failed at:
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> fail: command execution failure not detected
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> Please find attached
> >>> >> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> src/test/regression/tests/041.external_replication_delay/testdir/pgpool.log
> >>> >> >> >> >> >> and
> src/test/regression/log/041.external_replication_delay.
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> Best regards,
> >>> >> >> >> >> >> --
> >>> >> >> >> >> >> Tatsuo Ishii
> >>> >> >> >> >> >> SRA OSS K.K.
> >>> >> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
> >>> >> >> >> >> >> Japanese:http://www.sraoss.co.jp
> >>> >> >> >> >> >>
> >>> >> >> >> >> >
> >>> >> >> >> >> >
> >>> >> >> >> >> > --
> >>> >> >> >> >> > Nadav Shatz
> >>> >> >> >> >> > Tailor Brands | CTO
> >>> >> >> >> >>
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> > --
> >>> >> >> >> > Nadav Shatz
> >>> >> >> >> > Tailor Brands | CTO
> >>> >> >> >>
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > --
> >>> >> >> > Nadav Shatz
> >>> >> >> > Tailor Brands | CTO
> >>> >> >>
> >>> >> >
> >>> >> >
> >>> >> > --
> >>> >> > Nadav Shatz
> >>> >> > Tailor Brands | CTO
> >>> >>
> >>> >
> >>> >
> >>> > --
> >>> > Nadav Shatz
> >>> > Tailor Brands | CTO
> >>>
> >>
> >>
> >> --
> >> Nadav Shatz
> >> Tailor Brands | CTO
>
--
Nadav Shatz
Tailor Brands | CTO
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tatsuo Ishii | 2025-12-28 23:48:44 | Re: Proposal: recent access based routing for primary-replica setups |
| Previous Message | Tatsuo Ishii | 2025-12-26 10:03:49 | Re: Proposal: recent access based routing for primary-replica setups |