Re: Proposal: recent access based routing for primary-replica setups

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: nadav(at)tailorbrands(dot)com
Cc: pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Proposal: recent access based routing for primary-replica setups
Date: 2025-12-26 10:03:49
Message-ID: 20251226.190349.1321105398044197651.ishii@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgpool-hackers

Hi Nadav,

I just want to make it clear. The patch should be applied on top of
your latest.patch.

> (Please disregard previous mail. I seem to have mangled the message).
>
> I think I found a cause of the problem. On Linux, if SIGCHLD is
> ignored (set to SIG_IGN), waitpid() cannot get proper child status.
> Because the kernel relcaims the resource for the child process to not
> make the child process a zombie. And this makes waitpid() to fail with
> ECHLD. Since the return of waitpid() is not checked, I did not notice
> the waitpid() failure (I recommend to check the return value of
> waitpid()).
>
> /* set up signal handlers */
> signal(SIGALRM, SIG_DFL);
> signal(SIGTERM, my_signal_handler);
> signal(SIGINT, my_signal_handler);
> signal(SIGHUP, reload_config_handler);
> signal(SIGQUIT, my_signal_handler);
> signal(SIGCHLD, SIG_IGN); <--- SIGCHLD is ignored
> signal(SIGUSR1, my_signal_handler);
> signal(SIGUSR2, SIG_IGN);
> signal(SIGPIPE, SIG_IGN);
>
> To fix this, either change the line above to:
>
> signal(SIGCHLD, SIG_DFL);
> or
> signal(SIGCHLD, my_signal_handler);
> and modify my_signal_handler.
>
> I recommend the latter, because it does not depend on the default
> behavior of SIGCHLD, which might be different per platform.
> Attached is the patch to do this. (and run pgindent).
> I also notice that something like:
>
> /* Count tokens in output for validation */
> char *line_copy = pstrdup(line);
> char *temp_token = strtok(line_copy, " \t\n");
>
> You should declare line_copy and temp_token in the begging of the code
> block (or in the outer block). The forward declaration is recommended
> coding style in Pgpool-II (and PostgreSQL). Same thing can be said to
> some other variables.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>
>> Hi Tatsuo,
>>
>> Thank you for the note.
>>
>> I've removed the docker stuff. started working in an ubuntu 24 VM to match
>> the setup. hopefully the results will be better, had so many issues
>> compiling and testing before that stuff wasn't properly formulated.
>>
>> Attaching the latest patch.
>>
>> this is what i'm seeing:
>> adav(at)lima-dev:/src/pgpool2/src/test/regression$ PGHOST=/tmp ./regress.sh -p
>> /usr/bin 041.external_replication_delay
>> creating pgpool-II temporary installation ...
>> moving pgpool_setup to temporary installation path ...
>> moving watchdog_setup to temporary installation path ...
>> using pgpool-II at /src/pgpool2/src/test/regression/temp/installed
>> *************************
>> REGRESSION MODE : install
>> Pgpool-II version : pgpool-II version 4.8devel (mitsukakeboshi)
>> Pgpool-II install path : /src/pgpool2/src/test/regression/temp/installed
>> PostgreSQL bin : /usr/lib/postgresql/16/bin
>> PostgreSQL Major version : 16
>> pgbench : /usr/lib/postgresql/16/bin/pgbench
>> PostgreSQL jdbc :
>> /usr/local/pgsql/share/postgresql-9.2-1003.jdbc4.jar
>> *************************
>> testing 041.external_replication_delay...ok.
>> out of 1 ok:1 failed:0 timeout:0
>>
>>
>>
>> On Tue, Dec 23, 2025 at 10:46 AM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>
>>> > Hi Tatsuo,
>>> >
>>> > I'km running into issues testing this and have created a full docker
>>> > compose setup - can you please point me to up to date guides on the best
>>> > way to run the tests so i know we're doing it the same way?
>>> >
>>> > Thank you for all your help!
>>>
>>> I have run the regression test on the Pgpool-II master branch on my
>>> Ubuntu 24 box.
>>>
>>> cd pgpool2/src/test/regression
>>> ./regress.sh 041
>>>
>>> This time I noticed:
>>>
>>> - The patch does not named with version number
>>> - The patch creates .dockerignore and docker/ directory.
>>>
>>> Are they intended? I am asking because they are different from the
>>> previous version.
>>>
>>> > On Tue, Dec 23, 2025 at 2:13 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
>>> wrote:
>>> >
>>> >> > I think everything is passing now. new version attached.
>>> >>
>>> >> Unfortunately Test1 did not pass.
>>> >>
>>> >> === Test1: Basic external command with integer millisecond values ===
>>> >> waiting for server to start....1438600 2025-12-23 09:09:48.337 JST LOG:
>>> >> redirecting log output to logging collector process
>>> >> 1438600 2025-12-23 09:09:48.337 JST HINT: Future log output will appear
>>> >> in directory "log".
>>> >> done
>>> >> server started
>>> >> waiting for server to start....1438617 2025-12-23 09:09:48.443 JST LOG:
>>> >> redirecting log output to logging collector process
>>> >> 1438617 2025-12-23 09:09:48.443 JST HINT: Future log output will appear
>>> >> in directory "log".
>>> >> done
>>> >> server started
>>> >> waiting for server to start....1438634 2025-12-23 09:09:48.561 JST LOG:
>>> >> redirecting log output to logging collector process
>>> >> 1438634 2025-12-23 09:09:48.561 JST HINT: Future log output will appear
>>> >> in directory "log".
>>> >> done
>>> >> server started
>>> >> CREATE TABLE
>>> >> Waiting for sr_check to run...
>>> >> Command executed after 1 seconds
>>> >> node_id | hostname | port | status | pg_status | lb_weight | role
>>> |
>>> >> pg_role | select_cnt | load_balance_node | replication_delay |
>>> >> replication_state | replication_sync_state | last_status_change
>>> >>
>>> >>
>>> ---------+-----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>>> >> 0 | localhost | 11002 | up | up | 0.333333 | primary
>>> |
>>> >> primary | 0 | true | 0 |
>>> >> | | 2025-12-23 09:09:49
>>> >> 1 | localhost | 11003 | up | up | 0.333333 | standby
>>> |
>>> >> standby | 0 | false | 0 |
>>> >> | | 2025-12-23 09:09:49
>>> >> 2 | localhost | 11004 | up | up | 0.333333 | standby
>>> |
>>> >> standby | 0 | false | 0 |
>>> >> | | 2025-12-23 09:09:49
>>> >> (3 rows)
>>> >>
>>> >> fail: external command delay logging not found
>>> >>
>>> >> > On Mon, Nov 24, 2025 at 9:41 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
>>> >> wrote:
>>> >> >
>>> >> >> Thank you for updating the patch! This time the patch applies without
>>> >> >> any issue and compiles fine. Unfortunately regression test failed.
>>> >> >>
>>> >> >> testing 041.external_replication_delay...failed.
>>> >> >>
>>> >> >> From the regression log, it seems Test7 failed.
>>> >> >>
>>> >> >>
>>> >>
>>> ------------------------------------------------------------------------------
>>> >> >> === Test7: Command timeout handling ===
>>> >> >> waiting for server to start....411181 2025-11-24 16:31:05.244 JST
>>> LOG:
>>> >> >> redirecting log output to logging collector process
>>> >> >> 411181 2025-11-24 16:31:05.244 JST HINT: Future log output will
>>> appear
>>> >> in
>>> >> >> directory "log".
>>> >> >> done
>>> >> >> server started
>>> >> >> waiting for server to start....411196 2025-11-24 16:31:05.352 JST
>>> LOG:
>>> >> >> redirecting log output to logging collector process
>>> >> >> 411196 2025-11-24 16:31:05.352 JST HINT: Future log output will
>>> appear
>>> >> in
>>> >> >> directory "log".
>>> >> >> done
>>> >> >> server started
>>> >> >> waiting for server to start....411213 2025-11-24 16:31:05.461 JST
>>> LOG:
>>> >> >> redirecting log output to logging collector process
>>> >> >> 411213 2025-11-24 16:31:05.461 JST HINT: Future log output will
>>> appear
>>> >> in
>>> >> >> directory "log".
>>> >> >> done
>>> >> >> server started
>>> >> >> Waiting for command timeout...
>>> >> >> fail: command timeout not detected
>>> >> >>
>>> >> >>
>>> >>
>>> ------------------------------------------------------------------------------
>>> >> >>
>>> >> >> Attached is the pgpool.log. If you need more info, please let me
>>> know.
>>> >> >>
>>> >> >> Best regards,
>>> >> >> --
>>> >> >> Tatsuo Ishii
>>> >> >> SRA OSS K.K.
>>> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >> >> Japanese:http://www.sraoss.co.jp
>>> >> >>
>>> >> >>
>>> >> >> > Hi Tatsuo,
>>> >> >> >
>>> >> >> > Sorry again, this was due to the separation of 2 patches and i only
>>> >> sent
>>> >> >> > the one.
>>> >> >> >
>>> >> >> > I've merged it into 1 commit and 1 patch and rebased over master to
>>> >> avoid
>>> >> >> > these issues moving forward.
>>> >> >> >
>>> >> >> > PFA latest version
>>> >> >> >
>>> >> >> > On Thu, Nov 20, 2025 at 1:09 AM Tatsuo Ishii <ishii(at)postgresql(dot)org
>>> >
>>> >> >> wrote:
>>> >> >> >
>>> >> >> >> Hi Nadav,
>>> >> >> >>
>>> >> >> >> Thank you for new patch.
>>> >> >> >> Unfortunately the patch did not apply to current master.
>>> >> >> >>
>>> >> >> >> $ git apply
>>> >> >> >> ~/0001-Fix-multiple-issues-in-external-replication-delay-fe.patch
>>> >> >> >> error: patch failed:
>>> >> src/streaming_replication/pool_worker_child.c:694
>>> >> >> >> error: src/streaming_replication/pool_worker_child.c: patch does
>>> not
>>> >> >> apply
>>> >> >> >>
>>> >> >> >> Maybe the patch is on top of your previous patch?
>>> >> >> >>
>>> >> >> >> Also I suggest to use "-v" option of "git format-patch" to add the
>>> >> >> >> patch version number so that we can easily know which patch is the
>>> >> >> >> latest.
>>> >> >> >>
>>> >> >> >> Best regards,
>>> >> >> >> --
>>> >> >> >> Tatsuo Ishii
>>> >> >> >> SRA OSS K.K.
>>> >> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >> >> >> Japanese:http://www.sraoss.co.jp
>>> >> >> >>
>>> >> >> >> > Hi Tatsuo,
>>> >> >> >> >
>>> >> >> >> > Please see attached an updated version.
>>> >> >> >> >
>>> >> >> >> > thank you
>>> >> >> >> >
>>> >> >> >> > On Fri, Nov 7, 2025 at 2:07 AM Tatsuo Ishii <
>>> ishii(at)postgresql(dot)org>
>>> >> >> >> wrote:
>>> >> >> >> >
>>> >> >> >> >> > Sorry for that - thanks for the patch.
>>> >> >> >> >> >
>>> >> >> >> >> > Please find attached a new version
>>> >> >> >> >>
>>> >> >> >> >> Thanks for the new version. Unfortunately this time regression
>>> >> test
>>> >> >> >> >> fails at:
>>> >> >> >> >>
>>> >> >> >> >> > Waiting for command timeout...
>>> >> >> >> >> > fail: command timeout not detected
>>> >> >> >> >>
>>> >> >> >> >> Attached is the pgpool.log.
>>> >> >> >> >>
>>> >> >> >> >> Best regards,
>>> >> >> >> >> --
>>> >> >> >> >> Tatsuo Ishii
>>> >> >> >> >> SRA OSS K.K.
>>> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >> >> >> >> Japanese:http://www.sraoss.co.jp
>>> >> >> >> >>
>>> >> >> >> >> > On Mon, Nov 3, 2025 at 9:05 AM Tatsuo Ishii <
>>> >> ishii(at)postgresql(dot)org>
>>> >> >> >> >> wrote:
>>> >> >> >> >> >
>>> >> >> >> >> >> > thanks and sorry for the issues, please find attached
>>> updated
>>> >> >> >> version.
>>> >> >> >> >> >>
>>> >> >> >> >> >> No problem.
>>> >> >> >> >> >>
>>> >> >> >> >> >> This time the patch applies fine, no compiler warnings.
>>> >> However,
>>> >> >> >> >> >> regression test did not passed here (on Ubuntu 24 LTS if
>>> this
>>> >> >> >> >> >> matters). So I looked into
>>> >> >> >> >> >>
>>> >> src/test/regression/tests/041.external_replication_delay/test.sh a
>>> >> >> >> >> >> little bit and apply attached patch (test.sh.patch). It
>>> moved
>>> >> >> forward
>>> >> >> >> >> >> partially but failed at:
>>> >> >> >> >> >>
>>> >> >> >> >> >> fail: command execution failure not detected
>>> >> >> >> >> >>
>>> >> >> >> >> >> Please find attached
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >>
>>> >> >>
>>> >>
>>> src/test/regression/tests/041.external_replication_delay/testdir/pgpool.log
>>> >> >> >> >> >> and src/test/regression/log/041.external_replication_delay.
>>> >> >> >> >> >>
>>> >> >> >> >> >> Best regards,
>>> >> >> >> >> >> --
>>> >> >> >> >> >> Tatsuo Ishii
>>> >> >> >> >> >> SRA OSS K.K.
>>> >> >> >> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >> >> >> >> >> Japanese:http://www.sraoss.co.jp
>>> >> >> >> >> >>
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > --
>>> >> >> >> >> > Nadav Shatz
>>> >> >> >> >> > Tailor Brands | CTO
>>> >> >> >> >>
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > --
>>> >> >> >> > Nadav Shatz
>>> >> >> >> > Tailor Brands | CTO
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > Nadav Shatz
>>> >> >> > Tailor Brands | CTO
>>> >> >>
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Nadav Shatz
>>> >> > Tailor Brands | CTO
>>> >>
>>> >
>>> >
>>> > --
>>> > Nadav Shatz
>>> > Tailor Brands | CTO
>>>
>>
>>
>> --
>> Nadav Shatz
>> Tailor Brands | CTO

In response to

Responses

Browse pgpool-hackers by date

  From Date Subject
Next Message Nadav Shatz 2025-12-28 12:21:15 Re: Proposal: recent access based routing for primary-replica setups
Previous Message Tatsuo Ishii 2025-12-26 07:54:46 Re: Proposal: recent access based routing for primary-replica setups