| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
|---|---|
| To: | nadav(at)tailorbrands(dot)com |
| Cc: | pgpool-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Proposal: recent access based routing for primary-replica setups |
| Date: | 2025-12-28 23:58:03 |
| Message-ID: | 20251229.085803.2270812288334644179.ishii@postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgpool-hackers |
>> thank you! works for me. should we merge both into master or do you want me
>> to send a combined one?
>
>> do you want me to send a combined one?
>
> Yes, please send a combined one.
>
> I will do more tests and detailed code review.
Also when combing the patches, please correspond followings.
>>> > I also notice that something like:
>>> >
>>> > /* Count tokens in output for validation */
>>> > char *line_copy = pstrdup(line);
>>> > char *temp_token = strtok(line_copy, " \t\n");
>>> >
>>> > You should declare line_copy and temp_token in the begging of the code
>>> > block (or in the outer block). The forward declaration is recommended
>>> > coding style in Pgpool-II (and PostgreSQL). Same thing can be said to
>>> > some other variables.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>> On Fri, Dec 26, 2025 at 12:03 PM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>
>>> Hi Nadav,
>>>
>>> I just want to make it clear. The patch should be applied on top of
>>> your latest.patch.
>>>
>>> > (Please disregard previous mail. I seem to have mangled the message).
>>> >
>>> > I think I found a cause of the problem. On Linux, if SIGCHLD is
>>> > ignored (set to SIG_IGN), waitpid() cannot get proper child status.
>>> > Because the kernel relcaims the resource for the child process to not
>>> > make the child process a zombie. And this makes waitpid() to fail with
>>> > ECHLD. Since the return of waitpid() is not checked, I did not notice
>>> > the waitpid() failure (I recommend to check the return value of
>>> > waitpid()).
>>> >
>>> > /* set up signal handlers */
>>> > signal(SIGALRM, SIG_DFL);
>>> > signal(SIGTERM, my_signal_handler);
>>> > signal(SIGINT, my_signal_handler);
>>> > signal(SIGHUP, reload_config_handler);
>>> > signal(SIGQUIT, my_signal_handler);
>>> > signal(SIGCHLD, SIG_IGN); <--- SIGCHLD is ignored
>>> > signal(SIGUSR1, my_signal_handler);
>>> > signal(SIGUSR2, SIG_IGN);
>>> > signal(SIGPIPE, SIG_IGN);
>>> >
>>> > To fix this, either change the line above to:
>>> >
>>> > signal(SIGCHLD, SIG_DFL);
>>> > or
>>> > signal(SIGCHLD, my_signal_handler);
>>> > and modify my_signal_handler.
>>> >
>>> > I recommend the latter, because it does not depend on the default
>>> > behavior of SIGCHLD, which might be different per platform.
>>> > Attached is the patch to do this. (and run pgindent).
>>> > I also notice that something like:
>>> >
>>> > /* Count tokens in output for validation */
>>> > char *line_copy = pstrdup(line);
>>> > char *temp_token = strtok(line_copy, " \t\n");
>>> >
>>> > You should declare line_copy and temp_token in the begging of the code
>>> > block (or in the outer block). The forward declaration is recommended
>>> > coding style in Pgpool-II (and PostgreSQL). Same thing can be said to
>>> > some other variables.
>>> >
>>> > Best regards,
>>> > --
>>> > Tatsuo Ishii
>>> > SRA OSS K.K.
>>> > English: http://www.sraoss.co.jp/index_en/
>>> > Japanese:http://www.sraoss.co.jp
>>> >
>>> >> Hi Tatsuo,
>>> >>
>>> >> Thank you for the note.
>>> >>
>>> >> I've removed the docker stuff. started working in an ubuntu 24 VM to
>>> match
>>> >> the setup. hopefully the results will be better, had so many issues
>>> >> compiling and testing before that stuff wasn't properly formulated.
>>> >>
>>> >> Attaching the latest patch.
>>> >>
>>> >> this is what i'm seeing:
>>> >> adav(at)lima-dev:/src/pgpool2/src/test/regression$ PGHOST=/tmp
>>> ./regress.sh -p
>>> >> /usr/bin 041.external_replication_delay
>>> >> creating pgpool-II temporary installation ...
>>> >> moving pgpool_setup to temporary installation path ...
>>> >> moving watchdog_setup to temporary installation path ...
>>> >> using pgpool-II at /src/pgpool2/src/test/regression/temp/installed
>>> >> *************************
>>> >> REGRESSION MODE : install
>>> >> Pgpool-II version : pgpool-II version 4.8devel (mitsukakeboshi)
>>> >> Pgpool-II install path :
>>> /src/pgpool2/src/test/regression/temp/installed
>>> >> PostgreSQL bin : /usr/lib/postgresql/16/bin
>>> >> PostgreSQL Major version : 16
>>> >> pgbench : /usr/lib/postgresql/16/bin/pgbench
>>> >> PostgreSQL jdbc :
>>> >> /usr/local/pgsql/share/postgresql-9.2-1003.jdbc4.jar
>>> >> *************************
>>> >> testing 041.external_replication_delay...ok.
>>> >> out of 1 ok:1 failed:0 timeout:0
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Dec 23, 2025 at 10:46 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
>>> wrote:
>>> >>
>>> >>> > Hi Tatsuo,
>>> >>> >
>>> >>> > I'km running into issues testing this and have created a full docker
>>> >>> > compose setup - can you please point me to up to date guides on the
>>> best
>>> >>> > way to run the tests so i know we're doing it the same way?
>>> >>> >
>>> >>> > Thank you for all your help!
>>> >>>
>>> >>> I have run the regression test on the Pgpool-II master branch on my
>>> >>> Ubuntu 24 box.
>>> >>>
>>> >>> cd pgpool2/src/test/regression
>>> >>> ./regress.sh 041
>>> >>>
>>> >>> This time I noticed:
>>> >>>
>>> >>> - The patch does not named with version number
>>> >>> - The patch creates .dockerignore and docker/ directory.
>>> >>>
>>> >>> Are they intended? I am asking because they are different from the
>>> >>> previous version.
>>> >>>
>>> >>> > On Tue, Dec 23, 2025 at 2:13 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
>>> >>> wrote:
>>> >>> >
>>> >>> >> > I think everything is passing now. new version attached.
>>> >>> >>
>>> >>> >> Unfortunately Test1 did not pass.
>>> >>> >>
>>> >>> >> === Test1: Basic external command with integer millisecond values
>>> ===
>>> >>> >> waiting for server to start....1438600 2025-12-23 09:09:48.337 JST
>>> LOG:
>>> >>> >> redirecting log output to logging collector process
>>> >>> >> 1438600 2025-12-23 09:09:48.337 JST HINT: Future log output will
>>> appear
>>> >>> >> in directory "log".
>>> >>> >> done
>>> >>> >> server started
>>> >>> >> waiting for server to start....1438617 2025-12-23 09:09:48.443 JST
>>> LOG:
>>> >>> >> redirecting log output to logging collector process
>>> >>> >> 1438617 2025-12-23 09:09:48.443 JST HINT: Future log output will
>>> appear
>>> >>> >> in directory "log".
>>> >>> >> done
>>> >>> >> server started
>>> >>> >> waiting for server to start....1438634 2025-12-23 09:09:48.561 JST
>>> LOG:
>>> >>> >> redirecting log output to logging collector process
>>> >>> >> 1438634 2025-12-23 09:09:48.561 JST HINT: Future log output will
>>> appear
>>> >>> >> in directory "log".
>>> >>> >> done
>>> >>> >> server started
>>> >>> >> CREATE TABLE
>>> >>> >> Waiting for sr_check to run...
>>> >>> >> Command executed after 1 seconds
>>> >>> >> node_id | hostname | port | status | pg_status | lb_weight |
>>> role
>>> >>> |
>>> >>> >> pg_role | select_cnt | load_balance_node | replication_delay |
>>> >>> >> replication_state | replication_sync_state | last_status_change
>>> >>> >>
>>> >>> >>
>>> >>>
>>> ---------+-----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>>> >>> >> 0 | localhost | 11002 | up | up | 0.333333 |
>>> primary
>>> >>> |
>>> >>> >> primary | 0 | true | 0 |
>>> >>> >> | | 2025-12-23 09:09:49
>>> >>> >> 1 | localhost | 11003 | up | up | 0.333333 |
>>> standby
>>> >>> |
>>> >>> >> standby | 0 | false | 0 |
>>> >>> >> | | 2025-12-23 09:09:49
>>> >>> >> 2 | localhost | 11004 | up | up | 0.333333 |
>>> standby
>>> >>> |
>>> >>> >> standby | 0 | false | 0 |
>>> >>> >> | | 2025-12-23 09:09:49
>>> >>> >> (3 rows)
>>> >>> >>
>>> >>> >> fail: external command delay logging not found
>>> >>> >>
>>> >>> >> > On Mon, Nov 24, 2025 at 9:41 AM Tatsuo Ishii <
>>> ishii(at)postgresql(dot)org>
>>> >>> >> wrote:
>>> >>> >> >
>>> >>> >> >> Thank you for updating the patch! This time the patch applies
>>> without
>>> >>> >> >> any issue and compiles fine. Unfortunately regression test
>>> failed.
>>> >>> >> >>
>>> >>> >> >> testing 041.external_replication_delay...failed.
>>> >>> >> >>
>>> >>> >> >> From the regression log, it seems Test7 failed.
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> ------------------------------------------------------------------------------
>>> >>> >> >> === Test7: Command timeout handling ===
>>> >>> >> >> waiting for server to start....411181 2025-11-24 16:31:05.244 JST
>>> >>> LOG:
>>> >>> >> >> redirecting log output to logging collector process
>>> >>> >> >> 411181 2025-11-24 16:31:05.244 JST HINT: Future log output will
>>> >>> appear
>>> >>> >> in
>>> >>> >> >> directory "log".
>>> >>> >> >> done
>>> >>> >> >> server started
>>> >>> >> >> waiting for server to start....411196 2025-11-24 16:31:05.352 JST
>>> >>> LOG:
>>> >>> >> >> redirecting log output to logging collector process
>>> >>> >> >> 411196 2025-11-24 16:31:05.352 JST HINT: Future log output will
>>> >>> appear
>>> >>> >> in
>>> >>> >> >> directory "log".
>>> >>> >> >> done
>>> >>> >> >> server started
>>> >>> >> >> waiting for server to start....411213 2025-11-24 16:31:05.461 JST
>>> >>> LOG:
>>> >>> >> >> redirecting log output to logging collector process
>>> >>> >> >> 411213 2025-11-24 16:31:05.461 JST HINT: Future log output will
>>> >>> appear
>>> >>> >> in
>>> >>> >> >> directory "log".
>>> >>> >> >> done
>>> >>> >> >> server started
>>> >>> >> >> Waiting for command timeout...
>>> >>> >> >> fail: command timeout not detected
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> ------------------------------------------------------------------------------
>>> >>> >> >>
>>> >>> >> >> Attached is the pgpool.log. If you need more info, please let me
>>> >>> know.
>>> >>> >> >>
>>> >>> >> >> Best regards,
>>> >>> >> >> --
>>> >>> >> >> Tatsuo Ishii
>>> >>> >> >> SRA OSS K.K.
>>> >>> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >>> >> >> Japanese:http://www.sraoss.co.jp
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> > Hi Tatsuo,
>>> >>> >> >> >
>>> >>> >> >> > Sorry again, this was due to the separation of 2 patches and i
>>> only
>>> >>> >> sent
>>> >>> >> >> > the one.
>>> >>> >> >> >
>>> >>> >> >> > I've merged it into 1 commit and 1 patch and rebased over
>>> master to
>>> >>> >> avoid
>>> >>> >> >> > these issues moving forward.
>>> >>> >> >> >
>>> >>> >> >> > PFA latest version
>>> >>> >> >> >
>>> >>> >> >> > On Thu, Nov 20, 2025 at 1:09 AM Tatsuo Ishii <
>>> ishii(at)postgresql(dot)org
>>> >>> >
>>> >>> >> >> wrote:
>>> >>> >> >> >
>>> >>> >> >> >> Hi Nadav,
>>> >>> >> >> >>
>>> >>> >> >> >> Thank you for new patch.
>>> >>> >> >> >> Unfortunately the patch did not apply to current master.
>>> >>> >> >> >>
>>> >>> >> >> >> $ git apply
>>> >>> >> >> >>
>>> ~/0001-Fix-multiple-issues-in-external-replication-delay-fe.patch
>>> >>> >> >> >> error: patch failed:
>>> >>> >> src/streaming_replication/pool_worker_child.c:694
>>> >>> >> >> >> error: src/streaming_replication/pool_worker_child.c: patch
>>> does
>>> >>> not
>>> >>> >> >> apply
>>> >>> >> >> >>
>>> >>> >> >> >> Maybe the patch is on top of your previous patch?
>>> >>> >> >> >>
>>> >>> >> >> >> Also I suggest to use "-v" option of "git format-patch" to
>>> add the
>>> >>> >> >> >> patch version number so that we can easily know which patch
>>> is the
>>> >>> >> >> >> latest.
>>> >>> >> >> >>
>>> >>> >> >> >> Best regards,
>>> >>> >> >> >> --
>>> >>> >> >> >> Tatsuo Ishii
>>> >>> >> >> >> SRA OSS K.K.
>>> >>> >> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >>> >> >> >> Japanese:http://www.sraoss.co.jp
>>> >>> >> >> >>
>>> >>> >> >> >> > Hi Tatsuo,
>>> >>> >> >> >> >
>>> >>> >> >> >> > Please see attached an updated version.
>>> >>> >> >> >> >
>>> >>> >> >> >> > thank you
>>> >>> >> >> >> >
>>> >>> >> >> >> > On Fri, Nov 7, 2025 at 2:07 AM Tatsuo Ishii <
>>> >>> ishii(at)postgresql(dot)org>
>>> >>> >> >> >> wrote:
>>> >>> >> >> >> >
>>> >>> >> >> >> >> > Sorry for that - thanks for the patch.
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> > Please find attached a new version
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> Thanks for the new version. Unfortunately this time
>>> regression
>>> >>> >> test
>>> >>> >> >> >> >> fails at:
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> > Waiting for command timeout...
>>> >>> >> >> >> >> > fail: command timeout not detected
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> Attached is the pgpool.log.
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> Best regards,
>>> >>> >> >> >> >> --
>>> >>> >> >> >> >> Tatsuo Ishii
>>> >>> >> >> >> >> SRA OSS K.K.
>>> >>> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >>> >> >> >> >> Japanese:http://www.sraoss.co.jp
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> > On Mon, Nov 3, 2025 at 9:05 AM Tatsuo Ishii <
>>> >>> >> ishii(at)postgresql(dot)org>
>>> >>> >> >> >> >> wrote:
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> >> > thanks and sorry for the issues, please find attached
>>> >>> updated
>>> >>> >> >> >> version.
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> No problem.
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> This time the patch applies fine, no compiler warnings.
>>> >>> >> However,
>>> >>> >> >> >> >> >> regression test did not passed here (on Ubuntu 24 LTS if
>>> >>> this
>>> >>> >> >> >> >> >> matters). So I looked into
>>> >>> >> >> >> >> >>
>>> >>> >> src/test/regression/tests/041.external_replication_delay/test.sh a
>>> >>> >> >> >> >> >> little bit and apply attached patch (test.sh.patch). It
>>> >>> moved
>>> >>> >> >> forward
>>> >>> >> >> >> >> >> partially but failed at:
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> fail: command execution failure not detected
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> Please find attached
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> src/test/regression/tests/041.external_replication_delay/testdir/pgpool.log
>>> >>> >> >> >> >> >> and
>>> src/test/regression/log/041.external_replication_delay.
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> Best regards,
>>> >>> >> >> >> >> >> --
>>> >>> >> >> >> >> >> Tatsuo Ishii
>>> >>> >> >> >> >> >> SRA OSS K.K.
>>> >>> >> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >>> >> >> >> >> >> Japanese:http://www.sraoss.co.jp
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> > --
>>> >>> >> >> >> >> > Nadav Shatz
>>> >>> >> >> >> >> > Tailor Brands | CTO
>>> >>> >> >> >> >>
>>> >>> >> >> >> >
>>> >>> >> >> >> >
>>> >>> >> >> >> > --
>>> >>> >> >> >> > Nadav Shatz
>>> >>> >> >> >> > Tailor Brands | CTO
>>> >>> >> >> >>
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > --
>>> >>> >> >> > Nadav Shatz
>>> >>> >> >> > Tailor Brands | CTO
>>> >>> >> >>
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > --
>>> >>> >> > Nadav Shatz
>>> >>> >> > Tailor Brands | CTO
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > Nadav Shatz
>>> >>> > Tailor Brands | CTO
>>> >>>
>>> >>
>>> >>
>>> >> --
>>> >> Nadav Shatz
>>> >> Tailor Brands | CTO
>>>
>>
>>
>> --
>> Nadav Shatz
>> Tailor Brands | CTO
> <
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nadav Shatz | 2025-12-29 09:31:12 | Re: Proposal: recent access based routing for primary-replica setups |
| Previous Message | Tatsuo Ishii | 2025-12-28 23:48:44 | Re: Proposal: recent access based routing for primary-replica setups |