| From: | BharatDB <bharatdbpg(at)gmail(dot)com> |
|---|---|
| To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
| Subject: | Re: Checkpointer write combining |
| Date: | 2025-11-12 11:40:28 |
| Message-ID: | CAAh00EQy3KsgT33SRDOndUsMveDbLGXgdCZk9AoC8tYiqHXk4w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi all,
Considering the CI failures in earlier patch versions around “max batch
size”, upon my observation I found the failures arise either from boundary
conditions when io_combine_limit (GUC) is set larger than the compile-time
MAX_IO_COMBINE_LIMIT or when pin limits return small/zero values due to
which it produce out-of-range batch sizes or assertion failures in CI.
Comparing the approaches suggested in the thread, I think implementing (GUC
+ compile-time cap first, and then pin limits) could be more effective in
avoiding CI failures and also we should consider the following logic
conditions:
1.
Set io_combine_limit == 0 explicitly (fallback to 1 for forward
progress).
2.
Cap early to a conservative compile_cap (MAX_IO_COMBINE_LIMIT - 1) to
avoid array overflow. Otherwise if we confirm all slots are usable,
change to MAX_IO_COMBINE_LIMIT.
3.
Apply per-strategy pin and global pin limits only if they are positive.
4.
Use explicit typed comparisons to avoid signed/unsigned pitfalls and add
a final Assert() to capture assumptions in CI.
*Implementation logic:*
uint32
StrategyMaxWriteBatchSize(BufferAccessStrategy strategy)
{
uint32 max_write_batch_size;
uint32 compile_cap = MAX_IO_COMBINE_LIMIT - 1; /*
conservative cap */
int strategy_pin_limit = GetAccessStrategyPinLimit(strategy);
uint32 max_possible_buffer_limit = GetPinLimit();
max_write_batch_size = (io_combine_limit == 0) ? 1 : io_combine_limit;
if (max_write_batch_size > compile_cap)
max_write_batch_size = compile_cap;
if (strategy_pin_limit > 0 &&
(uint32) strategy_pin_limit < max_write_batch_size)
max_write_batch_size = (uint32) strategy_pin_limit;
if (max_possible_buffer_limit > 0 &&
max_possible_buffer_limit < max_write_batch_size)
max_write_batch_size = max_possible_buffer_limit;
if (max_write_batch_size == 0)
max_write_batch_size = 1;
Assert(max_write_batch_size <= compile_cap);
return max_write_batch_size;
}
I hope this will be helpful for proceeding further. Looking forward to
more feedback.
Thanking you.
Regards,
Soumya
On Tue, Nov 4, 2025 at 5:04 AM Melanie Plageman <melanieplageman(at)gmail(dot)com>
wrote:
> Thanks for continuing to review! I've revised the patches to
> incorporate all of your feedback except for where I mention below.
>
> There were failures in CI due to issues with max batch size, so
> attached v8 also seeks to fix those.
>
> - Melanie
>
> On Thu, Oct 16, 2025 at 12:25 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> >
> > 3 - 0003
> > ```
> > +/*
> > + * Return the next buffer in the ring or InvalidBuffer if the current
> sweep is
> > + * over.
> > + */
> > +Buffer
> > +StrategySweepNextBuffer(BufferAccessStrategy strategy, int
> *sweep_cursor)
> > +{
> > + if (++(*sweep_cursor) >= strategy->nbuffers)
> > + *sweep_cursor = 0;
> > +
> > + return strategy->buffers[*sweep_cursor];
> > +}
> > ```
> >
> > Feels the function comment is a bit confusing, because the function code
> doesn’t really perform sweep, the function is just a getter. InvalidBuffer
> just implies the current sweep is over.
> >
> > Maybe rephrase to something like: “Return the next buffer in the range.
> If InvalidBuffer is returned, that implies the current sweep is done."
>
> Yes, actually I think having these helpers mention the sweep is more
> confusing than anything else. I've revised them to be named more
> generically and updated the comments accordingly.
>
> > 5 - 0004
> > ```
> > +uint32
> > +StrategyMaxWriteBatchSize(BufferAccessStrategy strategy)
> > +{
> > + uint32 max_possible_buffer_limit;
> > + uint32 max_write_batch_size;
> > + int strategy_pin_limit;
> > +
> > + max_write_batch_size = io_combine_limit;
> > +
> > + strategy_pin_limit = GetAccessStrategyPinLimit(strategy);
> > + max_possible_buffer_limit = GetPinLimit();
> > +
> > + max_write_batch_size = Min(strategy_pin_limit,
> max_write_batch_size);
> > + max_write_batch_size = Min(max_possible_buffer_limit,
> max_write_batch_size);
> > + max_write_batch_size = Max(1, max_write_batch_size);
> > + max_write_batch_size = Min(max_write_batch_size,
> io_combine_limit);
> > + Assert(max_write_batch_size < MAX_IO_COMBINE_LIMIT);
> > + return max_write_batch_size;
> > +}
> > ```
> >
> > This implementation is hard to understand. I tried to simplify it:
> > ```
> > uint32
> > StrategyMaxWriteBatchSize(BufferAccessStrategy strategy)
> > {
> > int strategy_pin_limit = GetAccessStrategyPinLimit(strategy);
> > uint32 max_write_batch_size = Min(GetPinLimit(),
> (uint32)strategy_pin_limit);
> >
> > /* Clamp to io_combine_limit and enforce minimum of 1 */
> > if (max_write_batch_size > io_combine_limit)
> > max_write_batch_size = io_combine_limit;
> > if (max_write_batch_size == 0)
> > max_write_batch_size = 1;
> >
> > Assert(max_write_batch_size < MAX_IO_COMBINE_LIMIT);
> > return max_write_batch_size;
> > }
> > ```
>
> I agree that the implementation was hard to understand. I've not quite
> gone with your version but I have rewritten it like this:
>
> uint32
> StrategyMaxWriteBatchSize(BufferAccessStrategy strategy)
> {
> uint32 max_write_batch_size = Min(io_combine_limit,
> MAX_IO_COMBINE_LIMIT);
> int strategy_pin_limit =
> GetAccessStrategyPinLimit(strategy);
> uint32 max_possible_buffer_limit = GetPinLimit();
>
> /* Identify the minimum of the above */
> max_write_batch_size = Min(strategy_pin_limit, max_write_batch_size);
> max_write_batch_size = Min(max_possible_buffer_limit,
> max_write_batch_size);
>
> /* Must allow at least 1 IO for forward progress */
> max_write_batch_size = Max(1, max_write_batch_size);
>
> return max_write_batch_size;
> }
>
> Is this better?
>
> - Melanie
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | shveta malik | 2025-11-12 11:42:20 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Previous Message | Peter Eisentraut | 2025-11-12 11:39:19 | alignas (C11) |