Re: Improve eviction algorithm in ReorderBuffer

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date: 2024-02-29 01:24:20
Message-ID: CAD21AoCH28+CJpek2dhSjgByn9=+h5YUWCMqOcwkaKMFKzh=oA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 28, 2024 at 3:10 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Feb 26, 2024 at 7:54 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
>

Thank you for the comments!

> A few comments on 0003:
> ===================
> 1.
> +/*
> + * Threshold of the total number of top-level and sub transactions
> that controls
> + * whether we switch the memory track state. While the MAINTAIN_HEAP state is
> + * effective when there are many transactions being decoded, in many systems
> + * there is generally no need to use it as long as all transactions
> being decoded
> + * are top-level transactions. Therefore, we use MaxConnections as
> the threshold
> + * so we can prevent switch to the state unless we use subtransactions.
> + */
> +#define REORDER_BUFFER_MEM_TRACK_THRESHOLD MaxConnections
>
> The comment seems to imply that MAINTAIN_HEAP is useful for large
> number of transactions but ReorderBufferLargestTXN() switches to this
> state even when there is one transaction. So, basically we use the
> binary_heap technique to get the largest even when we have one
> transaction but we don't maintain that heap unless we have
> REORDER_BUFFER_MEM_TRACK_THRESHOLD number of transactions are
> in-progress.

Right.

> This means there is some additional work when (build and
> reset heap each time when we pick largest xact) we have fewer
> transactions in the system but that may not be impacting us because of
> other costs involved like serializing all the changes. I think once we
> can try to stress test this by setting
> debug_logical_replication_streaming to 'immediate' to see if the new
> mechanism has any overhead.

Agreed.

I've done performance tests that decodes 10k small transactions
(pgbench transactions) with debug_logical_replication_streaming =
'immediate':

master: 6263.022 ms
patched: 6403.873 ms

I don't see noticeable regressions.

>
> 2. Can we somehow measure the additional memory that will be consumed
> by each backend/walsender to maintain transactions? Because I think
> this also won't be accounted for logical_decoding_work_mem, so if this
> is large, there could be a chance of more complaints on us for not
> honoring logical_decoding_work_mem.

Good point.

We initialize the binaryheap with MaxConnections * 2 entries and the
binaryheap entries are pointers. So we use additional (8 * 100 * 2)
bytes with the default max_connections setting even when there is one
transaction, and could use more memory when adding more transactions.

I think there is still room for considering how to determine the
threshold and the number of initial entries. Using MaxConnections
seems to work but it always uses the current MaxConnections value
instead of the value that was set at a time when WAL records were
written. As for the initial number of entries in binaryheap, I think
we can the threshold value as the initial number of entries instead of
(threshold * 2). Or we might want to use the same value, 1000, as the
one we use for buffer->by_txn hash table.

>
> 3.
> @@ -3707,11 +3817,14 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn)
>
> ReorderBufferSerializeChange(rb, txn, fd, change);
> dlist_delete(&change->node);
> - ReorderBufferReturnChange(rb, change, true);
> + ReorderBufferReturnChange(rb, change, false);
>
> spilled++;
> }
>
> + /* Update the memory counter */
> + ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size);
>
> In ReorderBufferSerializeTXN(), we already use a size variable for
> txn->size, we can probably use that for the sake of consistency.

Agreed, will fix it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-02-29 01:34:31 RE: Synchronizing slots from primary to standby
Previous Message Michael Paquier 2024-02-29 00:56:55 Re: Improve readability by using designated initializers when possible