| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
|---|---|
| To: | nadav(at)tailorbrands(dot)com |
| Cc: | pgpool-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Proposal: recent access based routing for primary-replica setups |
| Date: | 2025-09-03 23:36:15 |
| Message-ID: | 20250904.083615.1569744760401897215.ishii@postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgpool-hackers |
Hi,
> Hi,
>
> All good.
>
> The usual way the Pgpool accesses the lag params is through the relevant
> tables in the DB. for aurora that isn't available.
> The numbers are available directly from AWS API calls tho. This solution
> will work with Aurora by circumventing this issue.
>
> What i mentioned as a concern is that since the command doesn't currently
> accept the actual DB instance list (primary/replicas) and their order it
> can't guarantee it'll return the lag values in the expected order.
>
> Except the primary being the first, how will the running command know the
> order in which pgpool has loaded the replicas into it's
> memory/configuration?
>
> Hope this makes more sense - if not let me know and i'll provide some
> examples.
Yes, examples would be helpful.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
> Thanks,
>
> On Tue, Sep 2, 2025 at 1:41 AM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>
>> Hi Nadav,
>>
>> Sorry for late reply.
>>
>> >> I haven’t tried it yet but the whole premise of having it run a command
>> is
>> >> that it’s not dependent on the specific DB. As you mentioned earlier.
>> >>
>> >> The issue blocking the regular lag extraction from aurora is that it
>> >> doesn’t update the tables in the DB. It does have a CloudWatch API to
>> get
>> >> the numbers tho.
>>
>> I am not familiar with CloudWatch API and am not sure I fully
>> understand you issue. What is your issue with CloudWatch API? Is it a
>> technical problem, or some cost issue (I guess CloudWatch is a paid
>> service)?
>>
>> >> Ordering here could get tricky since we couple the command with the
>> >> instance order.
>> >>
>> >> Maybe we can expand the command to receive some arguments as to instance
>> >> order.
>>
>> Can you elaborate what "ordering" is?
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>
>> > Hi Tatsuo,
>> >
>> > I don't want to rush at all - did you get a chance to look at what I
>> sent?
>> > Can I share more relevant information with you?
>> >
>> > What do you think?
>> >
>> > On Tue, Aug 26, 2025 at 9:54 AM Nadav Shatz <nadav(at)tailorbrands(dot)com>
>> wrote:
>> >
>> >> Hi Tatsuo,
>> >>
>> >> I haven’t tried it yet but the whole premise of having it run a command
>> is
>> >> that it’s not dependent on the specific DB. As you mentioned earlier.
>> >>
>> >> The issue blocking the regular lag extraction from aurora is that it
>> >> doesn’t update the tables in the DB. It does have a CloudWatch API to
>> get
>> >> the numbers tho.
>> >>
>> >> You can see the metric AuroraReplicaLag under
>> >>
>> >>
>> https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/metrics-reference.html
>> >>
>> >> So if we have a simple command to either get it or have something else
>> >> update a file with the numbers based on it we’ll be fine.
>> >>
>> >> Ordering here could get tricky since we couple the command with the
>> >> instance order.
>> >>
>> >> Maybe we can expand the command to receive some arguments as to instance
>> >> order.
>> >>
>> >> What do you think?
>> >>
>> >>
>> >> Nadav Shatz
>> >> Tailor Brands | CTO
>> >>
>> >>
>> >> On Tue, Aug 26, 2025 at 4:42 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
>> wrote:
>> >>
>> >>> Hi Nadav,
>> >>>
>> >>> Thank you for updating the patch. I will look into that.
>> >>>
>> >>> I have a question. Have you actually tried the patch with AWS Aurora?
>> >>> I am wondering how patched pgpool works with Aurora. I am asking
>> >>> because in the doc "8.5. Aurora Configuration Example":
>> >>>
>> >>> Set sr_check_period to 0 to disable streaming replication delay
>> >>> checking. This is because Aurora does not provide necessary functions
>> >>> to check the replication delay.
>> >>>
>> >>> sr_check_period = 0
>> >>>
>> >>> So streaming replication checking is disabled, and it means that your
>> >>> patch is also effectively disabled too.
>> >>>
>> >>> Best regards,
>> >>> --
>> >>> Tatsuo Ishii
>> >>> SRA OSS K.K.
>> >>> English: http://www.sraoss.co.jp/index_en/
>> >>> Japanese:http://www.sraoss.co.jp
>> >>>
>> >>> > Hi Tatsuo,
>> >>> >
>> >>> > Thank you for the notes - please find attached an updated version.
>> >>> >
>> >>> > What do you think?
>> >>> >
>> >>> > Thanks,
>> >>> >
>> >>> > On Mon, Aug 25, 2025 at 5:18 AM Tatsuo Ishii <ishii(at)postgresql(dot)org>
>> >>> wrote:
>> >>> >
>> >>> >> Hi Nadav,
>> >>> >>
>> >>> >> Thank you for the patch!
>> >>> >>
>> >>> >> I have one question. How do you provide a password
>> (sr_check_password)
>> >>> >> while executing replication_delay_source_cmd as sr_check_user? In my
>> >>> >> understanding replication_delay_source_cmd is executed through su
>> >>> >> command in your patch. In this case su command tries to read the
>> >>> >> password from terminal. I don't see such a code in the patch.
>> >>> >>
>> >>> >> BTW, I start to think that executing replication_delay_source_cmd as
>> >>> >> sr_check_user might not be a good idea. sr_check_user is a database
>> >>> >> user, not OS user. In PostgreSQL they are not necessarily the
>> >>> >> same. Also doing su in pgpool process needs to be very carefully to
>> >>> >> avoid vulnerability. Probably we just execute it as pgpool OS user?
>> >>> >>
>> >>> >> Lastly when I apply the patches using git apply, there are some
>> >>> >> trailing space errors.
>> >>> >>
>> >>> >> $ git apply ~/external-lag-feature-implementation.patch
>> >>> >> /home/t-ishii/external-lag-feature-implementation.patch:314:
>> trailing
>> >>> >> whitespace.
>> >>> >>
>> >>> >> /home/t-ishii/external-lag-feature-implementation.patch:317:
>> trailing
>> >>> >> whitespace.
>> >>> >>
>> >>> >> /home/t-ishii/external-lag-feature-implementation.patch:318:
>> trailing
>> >>> >> whitespace.
>> >>> >> cmd_len = strlen(escaped_cmd) +
>> >>> >> /home/t-ishii/external-lag-feature-implementation.patch:320:
>> trailing
>> >>> >> whitespace.
>> >>> >>
>> >>> >> /home/t-ishii/external-lag-feature-implementation.patch:322:
>> trailing
>> >>> >> whitespace.
>> >>> >> snprintf(full_command, cmd_len, "su - %s -c
>> >>> '%s'",
>> >>> >> warning: squelched 4 whitespace errors
>> >>> >> warning: 9 lines add whitespace errors.
>> >>> >>
>> >>> >> $ git apply ~/external-lag-feature-tests.patch
>> >>> >> /home/t-ishii/external-lag-feature-tests.patch:87: trailing
>> whitespace.
>> >>> >> - test_parsing.sh: Unit test for parsing logic
>> >>> >> /home/t-ishii/external-lag-feature-tests.patch:440: trailing
>> >>> whitespace.
>> >>> >> # Test 2: Float values
>> >>> >> warning: 2 lines add whitespace errors.
>> >>> >>
>> >>> >> Also I have some compilation errors after patching the source
>> >>> >> code. See attached compilation log.
>> >>> >>
>> >>> >> Best regards,
>> >>> >> --
>> >>> >> Tatsuo Ishii
>> >>> >> SRA OSS K.K.
>> >>> >> English: http://www.sraoss.co.jp/index_en/
>> >>> >> Japanese:http://www.sraoss.co.jp
>> >>> >>
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Nadav Shatz
>> >>> > Tailor Brands | CTO
>> >>>
>> >>
>> >
>> > --
>> > Nadav Shatz
>> > Tailor Brands | CTO
>>
>
>
> --
> Nadav Shatz
> Tailor Brands | CTO
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nadav Shatz | 2025-09-07 08:52:25 | Re: Proposal: recent access based routing for primary-replica setups |
| Previous Message | Nadav Shatz | 2025-09-02 08:32:18 | Re: Proposal: recent access based routing for primary-replica setups |