Re: Logical Replica ReorderBuffer Size Accounting Issues

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alex Richman <alexrichman(at)onesignal(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical Replica ReorderBuffer Size Accounting Issues
Date: 2023-01-06 04:03:42
Message-ID: CAA4eK1JjxNFGkDHLSSecWWD3nP+1KE4M=4G-AX2D2S+K_=m09w@mail.gmail.com
Lists: pgsql-bugs

On Thu, Jan 5, 2023 at 5:27 PM Alex Richman <alexrichman(at)onesignal(dot)com> wrote:
>
> We've noticed an odd memory issue with walsenders for logical replication slots: they experience large spikes in memory usage, up to ~10x over the baseline (from ~500MiB to ~5GiB), exceeding the configured logical_decoding_work_mem. Since we have ~40 active subscriptions, this produces a spike of ~200GiB on the sender, which is quite worrying.
>
> The spikes in memory always ramp up slowly to ~5GiB over ~10 minutes, then quickly drop back down to the ~500MiB baseline.
>
> logical_decoding_work_mem is configured to 256MB, and streaming is configured on the subscription side, so I would expect the slots to either stream or spill bytes to disk when they reach the 256MB limit, not climb to 5GiB. However, pg_stat_replication_slots shows 0 spilled or streamed bytes for any slot.
>
>
> I used GDB to call MemoryContextStats on a walsender process with ~5GiB usage, which logged this large ReorderBuffer context:
> --- snip ---
> ReorderBuffer: 65536 total in 4 blocks; 64624 free (169 chunks); 912 used
> ReorderBufferByXid: 32768 total in 3 blocks; 12600 free (6 chunks); 20168 used
> Tuples: 4311744512 total in 514 blocks (12858943 chunks); 6771224 free (12855411 chunks); 4304973288 used
> TXN: 16944 total in 2 blocks; 13984 free (46 chunks); 2960 used
> Change: 574944 total in 70 blocks; 214944 free (2239 chunks); 360000 used
> --- snip ---
>
>
> It's my understanding that the ReorderBuffer context is the thing that logical_decoding_work_mem specifically constrains, so it's surprising to see it holding onto ~4GiB of tuples instead of spooling them. I found the code for that here: https://github.com/postgres/postgres/blob/eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8/src/backend/replication/logical/reorderbuffer.c#L3557 which suggests it's checking rb->size against the configured work_mem (a simplified sketch of that check follows below).
>
> I then used GDB to break into a high-memory walsender and grab rb->size, which was only 73944. So it looks like the tuple memory isn't being properly accounted for in the total ReorderBuffer size, which would explain why nothing is getting streamed or spooled?
>
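
The limit check linked above compares the buffer's accounted size, rb->size, against logical_decoding_work_mem. Below is a minimal standalone sketch of that comparison (the struct and names are illustrative, not the actual server code); it shows why an under-counted rb->size would never trigger streaming or spilling, no matter how large the Tuples context grows:

/*
 * Illustrative sketch, not PostgreSQL source: the decision to stream
 * or spill hinges entirely on the accounted size, so memory that is
 * allocated but never added to rb->size is invisible to the check.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct
{
	size_t		size;			/* logically accounted bytes */
} ReorderBufferSketch;

/* logical_decoding_work_mem = 256MB, as configured above */
static const size_t work_mem_bytes = 256UL * 1024 * 1024;

static bool
exceeds_work_mem(const ReorderBufferSketch *rb)
{
	return rb->size >= work_mem_bytes;
}

int
main(void)
{
	ReorderBufferSketch rb = {.size = 73944};	/* value seen via GDB */

	printf("stream/spill triggered: %s\n",
		   exceeds_work_mem(&rb) ? "yes" : "no");
	return 0;
}

With rb->size stuck at 73944, this prints "no" even while the process actually holds gigabytes, matching the behaviour reported above.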

One possible reason for this difference is that the memory allocated
to decode the tuple from WAL in ReorderBufferGetTupleBuf() differs
from the memory actually required for, and accounted to, the tuple in
ReorderBufferChangeSize(). Do you have any sample data to confirm
this? If you can't share sample data, can you let us know the average
tuple size?
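
To make that concrete: the bytes an allocator hands out for a tuple (rounded up for alignment, plus any per-chunk bookkeeping) need not equal the bytes the accounting adds to rb->size. The toy example below uses made-up overhead constants, not PostgreSQL's real allocator parameters, to show how the two figures can diverge:

/*
 * Toy illustration with hypothetical overhead values: accounted
 * bytes vs. bytes actually obtained from the allocator.
 */
#include <stddef.h>
#include <stdio.h>

#define CHUNK_HEADER	16	/* assumed per-chunk bookkeeping */
#define ALIGNMENT		8	/* assumed allocation alignment */

/* Round len up to the next multiple of ALIGNMENT. */
static size_t
align_up(size_t len)
{
	return (len + ALIGNMENT - 1) & ~(size_t) (ALIGNMENT - 1);
}

int
main(void)
{
	size_t		tuple_len = 25;		/* hypothetical average tuple */
	size_t		ntuples = 1000000;	/* hypothetical change count */

	size_t		accounted = ntuples * tuple_len;
	size_t		allocated = ntuples * (align_up(tuple_len) + CHUNK_HEADER);

	printf("accounted: %zu bytes\n", accounted);
	printf("allocated: %zu bytes\n", allocated);
	return 0;
}

Even a modest per-tuple gap compounds across millions of changes, which is why the average tuple size would help narrow this down.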

--
With Regards,
Amit Kapila.
