| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_upgrade: optimize replication slot caught-up check |
| Date: | 2026-01-07 21:08:51 |
| Message-ID: | CAD21AoCLo7-=dXCDH7cU2NQ5W0o=c1dNB5p+i5yAeHBQow-PPQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Jan 7, 2026 at 12:18 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jan 6, 2026 at 12:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Jan 5, 2026 at 6:55 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> > >
> > >
> > >
> > > > On Jan 6, 2026, at 02:02, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Commit 29d0a77fa6 improved pg_upgrade to allow migrating logical
> > > > slots. Currently, to check if the slots are ready to be migrated, we
> > > > call binary_upgrade_logical_slot_has_caught_up() for every single
> > > > slot. This checks if there are any unconsumed WAL records. However, we
> > > > noticed a performance issue. If there are many slots (e.g., 100 or
> > > > more) or if there is a WAL backlog, checking all slots one by one
> > > > takes a long time.
> > > >
> > > > Here are some test results from my environment:
> > > > With an empty cluster: 1.55s
> > > > With 200 slots and 30MB backlog: 15.51s
> > > >
> > > > Commit 6d3d2e8e5 introduced parallel checks per database, but a single
> > > > job might still have to check too many slots, causing delays.
> > > >
> > > > Since binary_upgrade_logical_slot_has_caught_up() essentially checks
> > > > if any decodable record exists in the database, IIUC it is not
> > > > necessary to check every slot. We can optimize this by checking only
> > > > the slot with the minimum confirmed_flush_lsn. If that slot is caught
> > > > up, we can assume others are too. The attached patch implements this
> > > > optimization. With the patch, the test with 200 slots finished in
> > > > 2.512s. The execution time is now stable regardless of the number of
> > > > slots.
> > > >
> > > > One thing to note is that DecodeTXNNeedSkip() also considers
> > > > replication origin filters. Theoretically, a plugin could filter out
> > > > specific origins, which might lead to different results. However, this
> > > > is a very rare case. Even if it happens, it would just result in a
> > > > false positive (the upgrade fails safely), so the impact is minimal.
> > > > Therefore, the patch simplifies the check to be per-database instead
> > > > of per-slot.
> > > >
> > > > Feedback is very welcome.
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Masahiko Sawada
> > > > Amazon Web Services: https://aws.amazon.com
> > > > <v1-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch>
> > >
> > > Hi Masahiko-san,
> > >
> > > Basically I think the optimization idea makes sense to me. Reducing the check from O(slots × WAL) to O(databases × WAL) sounds a win. Just a few small comments:
> >
> > Thank you for reviewing the patch!
> >
> > >
> > > 1
> > > ```
> > > +static void process_old_cluter_logical_slot_catchup_infos(DbInfo *dbinfo, PGresult *res, void *arg);
> > > ```
> > >
> > > Typo in the function name: cluter->cluster
> > >
> > > 2
> > > ···
> > > - logical_slot_infos_query = get_old_cluster_logical_slot_infos_query();
> > > + const char *slot_info_query = "SELECT slot_name, plugin, two_phase, failover, “
> > > ···
> > >
> > > logical_slot_infos_query is no longer used, should be removed.
> >
> > Fixed.
> >
> > >
> > > 3
> > > ```
> > > + " ORDER BY confirmed_flush_lsn ASC "
> > > ```
> > >
> > > I am thinking, if confirmed_flush_lsn can be NULL? If that’s true, we may want to add “NULLS LAST”.
> >
> > Given that the query is not executed when live-check, I think
> > confirmed_flush cannot be NULL. temporary should also not be true but
> > it's for consistency with slot_info_query.
> >
> > >
> > > 4
> > > ```
> > > + Assert(num_slots == dbinfo->slot_arr.nslots);
> > > +
> > > + for (int i = 0; i < num_slots; i++)
> > > + {
> > > + char *slotname;
> > > + bool caught_up;
> > > +
> > > + slotname = PQgetvalue(res, i, PQfnumber(res, "slot_name"));
> > > + caught_up = (strcmp(PQgetvalue(res, i, PQfnumber(res, "caught_up")), "t") == 0);
> > > +
> > > + for (int slotnum = 0; slotnum < dbinfo->slot_arr.nslots; slotnum++)
> > > + {
> > > ```
> > >
> > > As num_slots == dbinfo->slot_arr.nslots, this is still a O(num_slots^2) check. Given this patch’s goal is to eliminate the burden from large number of slots, maybe we can build a hash table to make the check faster.
> > >
> >
> > Or if we sort both queries in the same order we can simply update the
> > slots sequentially.
> >
> > One problem I can see in this approach is that we could wrongly report
> > the invalid slots in invalid_logical_slots.txt. Even if the slot with
> > the oldest confirmed_flush_lsn has unconsumed decodable records, other
> > slots might have already been caught up. I'll think of how to deal
> > with this problem.
>
> I came up with a simple idea; instead of
> binary_upgrade_logical_slot_has_caught_up() returning true/false, we
> can have it return the last decodable WAL record's LSN, and consider
> logical slots as caught-up if its confirmed_flush_lsn already passed
> that LSN. This way, we can keep scanning at most one logical slot per
> database and reporting the same contents as today. The function nwo
> needs to scan all WAL backlogs but it's still much better than the
> current method.
>
> I've implemented this idea in the v2 patch with some updates:
>
> - incorporated the review comments.
> - added logic to support PG18 or older.
> - added regression tests.
>
> Feedback is very welcome.
>
I've fixed the test failures reported by CI.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch | application/octet-stream | 16.1 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | David Rowley | 2026-01-07 21:22:04 | Re: Support allocating memory for large strings |
| Previous Message | Kirill Reshke | 2026-01-07 20:57:56 | Re: GIN pageinspect support for entry tree and posting tree |