Re: row filtering for logical replication

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Peter Smith" <smithpb2250(at)gmail(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Rahila Syed" <rahilasyed90(at)gmail(dot)com>, "Peter Eisentraut" <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "David Steele" <david(at)pgmasters(dot)net>, "Craig Ringer" <craig(at)2ndquadrant(dot)com>, "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Amit Langote" <amitlangote09(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-08-25 00:21:57
Message-ID: d65b0801-34bb-4db2-b0d4-8e39852c9f9d@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 24, 2021, at 4:46 AM, Peter Smith wrote:
> I have used debug logging to confirm that what Amit wrote [1] is
> correct; the row-filter ExprState of *every* table's row_filter will
> be invalidated (and so subsequently gets rebuilt) when the user
> changes the PUBLICATION tables. This was a side-effect of the
> rel_sync_cache_publication_cb which is freeing the cached ExprState
> and setting the entry->replicate_valid = false; for *every* entry.
>
> So yes, the ExprCache is getting rebuilt for some situations where it
> is not strictly necessary to do so.
I'm afraid we are overenginnering this feature. We already have a cache
mechanism that was suggested (that shows a small improvement). As you said the
gain for this new improvement is zero or minimal (it depends on your logical
replication setup/maintenance).

> 1. Although the ExprState cache is effective, in practice the
> performance improvement was not very much. My previous results [2]
> showed only about 2sec saving for 100K calls to the
> pgoutput_row_filter function. So I think eliminating just one or two
> unnecessary calls in the get_rel_sync_entry is going to make zero
> observable difference.
>
> 2. IMO it is safe to expect that the ALTER PUBLICATION is a rare
> operation relative to the number of times that pgoutput_row_filter
> will be called (the pgoutput_row_filter is quite a "hot" function
> since it is called for every INSERT/UPDATE/DELETE). It will be orders
> of magnitude difference 1:1000, 1:100000 etc.
>
> ~~
>
> Anyway, I have implemented the suggested cache change because I agree
> it is probably theoretically superior, even if in practice there is
> almost no difference.
I didn't inspect your patch carefully but it seems you add another List to
control this new cache mechanism. I don't like it. IMO if we can use the data
structures that we have now, let's implement your idea; otherwise, -1 for this
new micro optimization.

[By the way, it took some time to extract what you changed. Since we're trading
patches, I personally appreciate if you can send a patch on the top of the
current one. I have some changes too and it is time consuming incorporating
changes in the main patch.]

--
Euler Taveira
EDB https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-08-25 01:05:25 Re: Some leftovers of recent message cleanup?
Previous Message Michael Paquier 2021-08-25 00:21:53 Re: Tab completion for "create unlogged" a bit too lax?