Re: Proposal: recent access based routing for primary-replica setups

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: nadav(at)tailorbrands(dot)com
Cc: pgpool-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Proposal: recent access based routing for primary-replica setups
Date: 2025-09-08 12:02:52
Message-ID: 20250908.210252.248806675014934998.ishii@postgresql.org
Lists: pgpool-hackers

Hi Nadav,

Wow, that's quick!
I will look into the patch tomorrow.

> Hi Tatsuo,
>
> Please find attached the 3 patch files (implementation, tests, docs) with
> the updates we discussed.
>
> What do you think?
>
> Best,
>
> On Mon, Sep 8, 2025 at 3:26 AM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>
>> Hi Nadav,
>>
>> > Hi Tatsuo,
>> >
>> > Thanks for getting back to me. Let me clarify the ordering concern and
>> > provide an example to make it clearer:
>> >
>> > Currently, replication_delay_source_cmd executes without awareness of the
>> > replica list or the order in which Pgpool loads them. For Aurora, since
>> > we’re bypassing the internal DB tables and fetching lag data directly via
>> > the AWS CloudWatch API, we need to ensure the returned lag values are
>> > mapped to the correct instances.
>> >
>> > For example, assume Pgpool has the following configuration:
>> >
>> > primary: db-primary
>> > replicas: db-replica-a, db-replica-b, db-replica-c
>> >
>> > If the command retrieves lag values [15, 120, 60] from CloudWatch, we
>> > need to guarantee these are consistently mapped as:
>> >
>> > - db-replica-a → 15ms
>> > - db-replica-b → 120ms
>> > - db-replica-c → 60ms
>> >
>> > Without explicitly passing the instance identifiers and their order to
>> > the command, there’s a risk that mismatched ordering will cause Pgpool
>> > to make incorrect routing decisions.
>> >
>> > To address this, I suggest extending replication_delay_source_cmd to
>> > accept an ordered list of instance identifiers as arguments. This way,
>> > the command can fetch the metrics in the same sequence Pgpool expects,
>> > ensuring alignment between configuration and returned data.
>>
>> Thanks for the clarification. Previously I misunderstood that Aurora
>> only provides a "reader endpoint", which made me think your proposal
>> was impossible. But after some research, I found that Aurora also
>> provides a "cluster endpoint" which refers to each replica instance. So
>> let me check if my understanding is correct.
>> replication_delay_source_cmd will be invoked as:
>>
>> replication_delay_source_cmd db-replica-a db-replica-b db-replica-c
>>
>> > Would you agree this approach makes sense?
>>
>> Yes.
>>
>> > If so, I can provide an updated
>> > patch to demonstrate how the command would handle ordered instance
>> > mapping.
>>
>> Thanks. That would be good.
>>
>> BTW, there are minor points regarding your previous patch. In the patch,
>>
>> 083.external_replication_delay/
>>
>> is the test directory. This does not fit our test infrastructure
>> convention. Tests for new features should be numbered between 001 and
>> 049; 050 and greater are reserved for tests for bug fixes. So at this
>> point, 041 is appropriate (if another test for a new feature is added
>> before your patch is committed, you will need to adjust the number, of
>> course).
>>
>> You need to include a patch for documentation. You don't need to write
>> Japanese doc (doc.ja). We will create it from the English document
>> later on.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>
>
>
> --
> Nadav Shatz
> Tailor Brands | CTO
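
For reference, the ordered-argument contract discussed in the quoted
thread could be sketched roughly as below. This is only an illustration,
not the code in the attached patches. Everything here is an assumption:
the function names, the stubbed lookup table standing in for a CloudWatch
query, and the output format (one lag value in milliseconds per argument,
space-separated, in argument order). The point it shows is that values
are looked up by instance identifier and emitted in the order the
identifiers were passed, so the result never depends on the order in
which the metrics source happens to return them.

```python
import sys
from typing import Callable, Dict, List

# Hypothetical sample data standing in for a metrics query. The values are
# the ones from the example in this thread, deliberately stored out of order.
SAMPLE_LAG_MS: Dict[str, int] = {
    "db-replica-c": 60,
    "db-replica-a": 15,
    "db-replica-b": 120,
}


def fetch_lag_ms(instance_id: str) -> int:
    """Placeholder lookup (assumption); a real command would query its
    metrics source for this one instance. Raises KeyError if unknown."""
    return SAMPLE_LAG_MS[instance_id]


def ordered_lags(
    instance_ids: List[str],
    fetch: Callable[[str], int] = fetch_lag_ms,
) -> List[int]:
    """Return one lag value per identifier, in the order given."""
    return [fetch(instance_id) for instance_id in instance_ids]


def main(argv: List[str]) -> int:
    """Entry point: argv[1:] are the instance identifiers, in order."""
    instance_ids = argv[1:]
    if not instance_ids:
        print("usage: cmd INSTANCE_ID [INSTANCE_ID ...]", file=sys.stderr)
        return 1
    try:
        lags = ordered_lags(instance_ids)
    except KeyError as exc:
        print(f"unknown instance: {exc}", file=sys.stderr)
        return 1
    # Assumed output format: space-separated values in argument order.
    print(" ".join(str(value) for value in lags))
    return 0
```

Wired up as a script (for example by calling sys.exit(main(sys.argv)) at
the bottom), invoking it with db-replica-a db-replica-b db-replica-c
would print the three values in that same order, whatever order the
lookup table holds them in.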
