|From:||Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info>|
|To:||Julien Rouhaud <rjuju123(at)gmail(dot)com>|
|Subject:||Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders|
On Thursday, December 26, 2019 8:18:46 PM CET Julien Rouhaud wrote:
> Hello Pierre,
> On Thu, Dec 26, 2019 at 5:43 PM Pierre Ducroquet <p(dot)psql(at)pinaraf(dot)info> wrote:
> > The second one was tested on PG 10 and PG 12 (with 48 lines offset). On
> > PG 12 it has the same effect as on PG 10 with the isAlive patch. Instead of
> > calling GetFlushRecPtr each time, we call it only if we notice we have
> > reached the value of the previous call. This way, when the senders are
> > busy decoding, we are no longer fighting for a spinlock to read the
> > FlushRecPtr.
> The patch is quite straightforward and looks good to me.
> - XLogRecPtr flushPtr;
> + static XLogRecPtr flushPtr = 0;
> You should use InvalidXLogRecPtr instead though, and maybe add some
> comments to explain why the static variable is a life changer here.
> > Here are some benchmark results.
> > On PG 10, to decode our replication stream, we went from 3m 43s to over 5
> > minutes after removing the first hot spot, and then down to 22 seconds.
> > On PG 12, we had to change the benchmark (due to GIN index creation being
> > more optimized), so we cannot compare directly with our previous bench. We
> > went from 15m 11s down to 59 seconds.
> > If needed, we can provide scripts to reproduce this situation. It is quite
> > simple: add ~20 walsenders doing logical replication in database A, and
> > then generate a lot of data in database B. The walsenders will be woken
> > up by the activity on database B but will not send it, so they keep
> > hitting the same locks.
> Quite impressive speedup!
Thank you for your comments.
Attached to this email is a patch with better comments regarding the
We spent quite some time yesterday benchmarking it again, this time with
changes that must be fully processed by the decoder. The speed-up is obviously
much smaller: we are only ~5% faster than without the patch.
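
To make the idea quoted above concrete, here is a minimal standalone sketch of
caching the flush pointer. It is not the attached patch and not PostgreSQL
code: get_flush_rec_ptr(), can_send_cached(), can_send_uncached() and
simulate() are hypothetical names, the spinlock is modelled by a plain counter,
and the cache lives at file scope (rather than in a function-local static as in
the quoted diff) only so the simulation can reset it between runs.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL type and constant (assumed for this sketch). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical model of shared state: the flush position and a counter of
 * how many times the "spinlock" protecting it was taken. */
static XLogRecPtr shared_flush_ptr = InvalidXLogRecPtr;
static unsigned long lock_acquisitions = 0;

/* Cached copy of the flush position, private to one sender. */
static XLogRecPtr cached_flush_ptr = InvalidXLogRecPtr;

/* Models GetFlushRecPtr(): every call costs one lock acquisition. */
static XLogRecPtr
get_flush_rec_ptr(void)
{
    lock_acquisitions++;
    return shared_flush_ptr;
}

/* Baseline: read the shared flush position on every check. */
static int
can_send_uncached(XLogRecPtr sent_ptr)
{
    return sent_ptr < get_flush_rec_ptr();
}

/* Cached variant: re-read the shared value only when the sender has
 * caught up with the value obtained by the previous call. */
static int
can_send_cached(XLogRecPtr sent_ptr)
{
    if (cached_flush_ptr == InvalidXLogRecPtr || sent_ptr >= cached_flush_ptr)
        cached_flush_ptr = get_flush_rec_ptr();
    return sent_ptr < cached_flush_ptr;
}

/* Walk a busy sender through n_records already-flushed positions and
 * return how many times the lock was taken along the way. */
unsigned long
simulate(XLogRecPtr n_records, int use_cache)
{
    XLogRecPtr sent = 0;

    shared_flush_ptr = n_records;
    lock_acquisitions = 0;
    cached_flush_ptr = InvalidXLogRecPtr;

    while (use_cache ? can_send_cached(sent) : can_send_uncached(sent))
        sent++;

    assert(sent == n_records);  /* both variants process the same records */
    return lock_acquisitions;
}
```

In this model, simulate(1000, 0) takes the lock on every check, while
simulate(1000, 1) takes it once to fill the cache and once more when the
sender catches up. The real gain depends on contention between walsenders,
which this single-threaded sketch does not measure.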