Re: Improve eviction algorithm in ReorderBuffer

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date: 2024-02-28 06:09:55
Message-ID: CAA4eK1+2zRU4+NAoUAiYYW1peGrE3JjSx9SrqeTGM=8r=0-UoA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 26, 2024 at 7:54 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>

A few comments on 0003:
===================
1.
+/*
+ * Threshold of the total number of top-level and sub transactions that controls
+ * whether we switch the memory track state. While the MAINTAIN_HEAP state is
+ * effective when there are many transactions being decoded, in many systems
+ * there is generally no need to use it as long as all transactions being decoded
+ * are top-level transactions. Therefore, we use MaxConnections as the threshold
+ * so we can prevent switch to the state unless we use subtransactions.
+ */
+#define REORDER_BUFFER_MEM_TRACK_THRESHOLD MaxConnections

The comment seems to imply that MAINTAIN_HEAP is useful for a large
number of transactions, but ReorderBufferLargestTXN() switches to this
state even when there is only one transaction. So, basically, we use
the binary_heap technique to find the largest transaction even when we
have only one, but we don't keep that heap maintained unless at least
REORDER_BUFFER_MEM_TRACK_THRESHOLD transactions are in progress. This
means there is some additional work (building and resetting the heap
each time we pick the largest xact) when we have fewer transactions in
the system, but that may not impact us much because of the other costs
involved, like serializing all the changes. I think we can try to
stress test this by setting debug_logical_replication_streaming to
'immediate' to see if the new mechanism has any overhead.
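To make the two regimes concrete, here is a minimal sketch of the
threshold-gated behavior described above. The names
(choose_track_state, MEM_TRACK_THRESHOLD, the enum values) are
illustrative stand-ins, not the patch's actual identifiers; the real
threshold is MaxConnections:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified model of the two memory-tracking states
 * discussed above; names are illustrative, not the patch's API. */
typedef enum
{
	RB_MEM_TRACK_NO_MAINTAIN_HEAP,	/* rebuild/scan on each pick */
	RB_MEM_TRACK_MAINTAIN_HEAP		/* keep heap continuously updated */
} RBMemTrackState;

#define MEM_TRACK_THRESHOLD 8	/* stands in for MaxConnections */

static RBMemTrackState
choose_track_state(size_t ntxns)
{
	/*
	 * Below the threshold, the transient build-and-reset work per pick is
	 * acceptable; above it, continuously maintaining the heap amortizes
	 * the cost across many transactions.
	 */
	return (ntxns < MEM_TRACK_THRESHOLD)
		? RB_MEM_TRACK_NO_MAINTAIN_HEAP
		: RB_MEM_TRACK_MAINTAIN_HEAP;
}

/* O(n) pass used when the heap is not being maintained */
static size_t
largest_txn_index(const size_t *txn_sizes, size_t ntxns)
{
	size_t		best = 0;

	for (size_t i = 1; i < ntxns; i++)
		if (txn_sizes[i] > txn_sizes[best])
			best = i;
	return best;
}
```

The point of the stress test is to check whether the low-count path
(repeated build/reset) shows up at all next to the serialization cost.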

2. Can we somehow measure the additional memory that each
backend/walsender will consume to maintain the transactions? I think
this memory also won't be accounted for in logical_decoding_work_mem,
so if it is large, there could be more complaints about us not
honoring logical_decoding_work_mem.
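As a rough starting point for that measurement, the heap's footprint
should scale as one slot per tracked transaction plus a fixed header.
The header size below is an assumption for illustration only, not the
actual binaryheap struct size:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Back-of-the-envelope estimate of the extra per-backend/walsender
 * memory for the heap: one pointer-sized slot per tracked transaction
 * plus a small fixed header. HEAP_HEADER_SIZE is an assumed value for
 * illustration, not the real struct size.
 */
#define HEAP_NODE_SIZE		sizeof(void *)
#define HEAP_HEADER_SIZE	16

static size_t
heap_extra_bytes(size_t ntxns)
{
	return HEAP_HEADER_SIZE + ntxns * HEAP_NODE_SIZE;
}
```

Even at thousands of in-progress (sub)transactions this stays in the
tens of kilobytes, but it would be good to confirm with a real
measurement since it sits outside logical_decoding_work_mem.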

3.
@@ -3707,11 +3817,14 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)

ReorderBufferSerializeChange(rb, txn, fd, change);
dlist_delete(&change->node);
- ReorderBufferReturnChange(rb, change, true);
+ ReorderBufferReturnChange(rb, change, false);

spilled++;
}

+ /* Update the memory counter */
+ ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size);

In ReorderBufferSerializeTXN(), we already have a local 'size'
variable holding txn->size; we can probably use that here for the sake
of consistency.
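A minimal model of the suggestion: accumulate the spilled bytes in the
existing local and pass that to the single counter update at the end,
rather than re-reading the transaction's size field. The function and
parameter names here are illustrative, not the actual code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch only: 'size' accumulates what is spilled in the
 * serialize loop, and the one counter update afterwards uses that
 * local, mirroring the consistency suggestion in point 3.
 */
static size_t
serialize_all(const size_t *change_sizes, size_t nchanges,
			  size_t *mem_counter)
{
	size_t		size = 0;

	for (size_t i = 0; i < nchanges; i++)
		size += change_sizes[i];	/* spill each change */

	*mem_counter -= size;			/* single update, using 'size' */
	return size;
}
```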

--
With Regards,
Amit Kapila.
