| From: | Nadav Shatz <nadav(at)tailorbrands(dot)com> | 
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> | 
| Cc: | pgpool-hackers(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: Proposal: recent access based routing for primary-replica setups | 
| Date: | 2025-08-24 11:11:32 | 
| Message-ID: | CACeKOO000N8gUodkXt_1V=XQYx_utxXdyOxkzV7GYSPpdHnTfg@mail.gmail.com | 
| Lists: | pgpool-hackers | 
Hi Tatsuo,
Here is an initial draft in 2 patches (one for the code changes and one for
the tests).
Please let me know what you think.
Thank you,
On Thu, Aug 21, 2025 at 1:23 PM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> Hi Nadav,
>
> Thank you for understanding. Please don't hesitate to ask questions
> regarding Pgpool-II source code.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>
> > Hi Tatsuo,
> >
> > I'm fine with all of your comments and suggestions.
> >
> > I'll work on a patch and we can iterate over it.
> >
> > Hope that's okay.
> >
> > Best,
> >
> > On Thu, Aug 21, 2025 at 8:04 AM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> >
> >> Hi Nadav,
> >>
> >> > Hi Tatsuo,
> >> >
> >> > Thank you for your reply, I agree with your approach. Better to get (1) out
> >> > of the way first.
> >> >
> >> > As the simplest approach we can implement that would support completely
> >> > offloading the responsibility for lag checking, we can set it to “file”
> >> > and add another config parameter for the file path. Or, if the value
> >> > simply starts with “file:”, pgpool would understand it as a file path.
> >>
> >> My concern about the "file:" approach is a race condition. What if
> >> pgpool reads the file while it is being updated by someone else? Also,
> >> I think the command approach is more flexible and generic. For
> >> example, the "file approach" can easily be simulated by setting the
> >> command to "/usr/bin/cat path_to_the_file".
> >>
> >> > Then the internal polling can just read the file on schedule. The entire
> >> > updating mechanism will be left to the external service.
> >>
> >> Internal polling is a little bit complicated and cannot easily be
> >> changed to just reading a file. The internal polling has two options:
> >> one is checking the WAL LSN difference, the other is the replication
> >> delay in time. The file approach would only replace the latter. I
> >> suggest leaving the internal polling code as it is.
> >>
> >> > Having this as a first step also opens up the door for other
> >> > implementations.
> >> >
> >> > Another classic option would be calling an API endpoint. But that might
> >> > come with a lot more bulk and security concerns.
> >>
> >> I agree that calling API could bring security concerns.
> >>
> >> BTW, in the command approach, the command should be executed as
> >> sr_check_user.
> >>
> >> > I suggest I work on a patch for file support.
> >> >
> >> > What do you think?
> >>
> >> For the reasons above, I prefer the command approach, not the file
> >> support.
> >>
> >> > Nadav Shatz
> >> > Tailor Brands | CTO
> >> >
> >> >
> >> > On Wed, Aug 20, 2025 at 3:45 PM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> >> >
> >> >> Hi Nadav,
> >> >>
> >> >> Thank you for the answer.
> >> >>
> >> >> I think your proposal actually includes two orthogonal proposals.
> >> >>
> >> >> (1) "inject" replication delay value from external source (in your
> >> >> case from Aurora).
> >> >>
> >> >> (2) per relation recent access based routing.
> >> >>
> >> >> I suggest implementing (1) first, then (2). This incremental approach
> >> >> would be easier than implementing (1)+(2) at once.
> >> >>
> >> >> For (1) we could add a new pgpool.conf parameter, say
> >> >> "replication_delay_source". If it is set to "builtin", then the
> >> >> replication delay source is PostgreSQL, as it already is today. If
> >> >> it is set to anything other than "builtin", then it is an external
> >> >> command name (+ arguments) to be executed to import the replication
> >> >> delay values. The command should return the replication delay values
> >> >> as a string like "0 20 10", which gives the replication delay of
> >> >> nodes 0, 1 and 2 in milliseconds (in this case, since node 0 is the
> >> >> primary, its replication delay is 0). The command will be invoked
> >> >> every sr_check_period.
> >> >>
> >> >> I am not sure if this actually works in Aurora. This is just a quick
> >> >> idea.
> >> >>
> >> >> (2) would probably be much harder than (1). So we need more discussion
> >> >> later on.
> >> >>
> >> >> Best regards,
> >> >> --
> >> >> Tatsuo Ishii
> >> >> SRA OSS K.K.
> >> >> English: http://www.sraoss.co.jp/index_en/
> >> >> Japanese:http://www.sraoss.co.jp
> >> >>
> >>
> >
> >
> > --
> > Nadav Shatz
> > Tailor Brands | CTO
>
-- 
Nadav Shatz
Tailor Brands | CTO
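
For reference, the command approach quoted above (an external command that prints one delay per node, such as "0 20 10", and is invoked every sr_check_period) could be consumed roughly as sketched below. This is a minimal Python illustration, not the code in the attached patches. The function names parse_replication_delays and run_delay_command, the 5-second timeout, and the validation rules are all assumptions made for the example.

```python
import shlex
import subprocess


def parse_replication_delays(output: str, num_nodes: int) -> list[int]:
    """Parse output such as "0 20 10" into per-node delays in milliseconds.

    Hypothetical helper: the thread only specifies the "0 20 10" format,
    so the strict validation here is an assumption.
    """
    fields = output.split()
    if len(fields) != num_nodes:
        raise ValueError(f"expected {num_nodes} values, got {len(fields)}")
    # int() raises ValueError for non-integer fields such as "abc" or "1.5".
    delays = [int(field) for field in fields]
    if any(delay < 0 for delay in delays):
        raise ValueError("replication delay must not be negative")
    return delays


def run_delay_command(command: str, num_nodes: int,
                      timeout_sec: float = 5.0) -> list[int]:
    """Run the external command (name + arguments) and parse what it prints.

    Hypothetical helper: the timeout value is an assumption, chosen so that
    a hung command cannot block the caller indefinitely.
    """
    result = subprocess.run(
        shlex.split(command),   # split "name arg1 arg2" without using a shell
        capture_output=True,
        text=True,
        timeout=timeout_sec,
        check=True,             # a non-zero exit status raises CalledProcessError
    )
    return parse_replication_delays(result.stdout, num_nodes)


if __name__ == "__main__":
    # Node 0 is the primary in the quoted example, so its delay is 0.
    print(parse_replication_delays("0 20 10", 3))
```

With this shape, the "file" variant discussed in the thread reduces to passing "/usr/bin/cat path_to_the_file" as the command. A caller would presumably keep the previous delay values when the command fails or prints malformed output, but that policy is not settled in the thread.
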
| Attachment | Content-Type | Size | 
|---|---|---|
| external-lag-feature-implementation.patch | application/octet-stream | 16.0 KB | 
| external-lag-feature-tests.patch | application/octet-stream | 25.7 KB | 