RE: row filtering for logical replication

From: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: row filtering for logical replication
Date: 2021-12-21 06:02:42
Message-ID: OS0PR01MB61134C83F754A53CD7B536D4FB7C9@OS0PR01MB6113.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, December 20, 2021 4:47 PM tanghy(dot)fnst(at)fujitsu(dot)com <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
> On Monday, December 20, 2021 11:24 AM tanghy(dot)fnst(at)fujitsu(dot)com
> <tanghy(dot)fnst(at)fujitsu(dot)com>
> >
> > On Wednesday, December 8, 2021 2:29 PM Amit Kapila
> > <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> > > >
> > > > On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote:
> > > >
> > > > On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> > > > >
> > > > > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> > > > >
> > > > > PS> I will update the commit message in the next version. I barely changed
> > the
> > > > > documentation to reflect the current behavior. I probably missed some
> > > changes
> > > > > but I will fix in the next version.
> > > > >
> > > > > I realized that I forgot to mention a few things about the UPDATE behavior.
> > > > > Regardless of 0003, we need to define which tuple will be used to evaluate
> > the
> > > > > row filter for UPDATEs. We already discussed it circa [1]. This current
> version
> > > > > chooses *new* tuple. Is it the best choice?
> > > >
> > > > But with 0003, we are using both the tuple for evaluating the row
> > > > filter, so instead of fixing 0001, why we don't just merge 0003 with
> > > > 0001? I mean eventually, 0003 is doing what is the agreed behavior,
> > > > i.e. if just OLD is matching the filter then convert the UPDATE to
> > > > DELETE OTOH if only new is matching the filter then convert the UPDATE
> > > > to INSERT. Do you think that even we merge 0001 and 0003 then also
> > > > there is an open issue regarding which row to select for the filter?
> > > >
> > > > Maybe I was not clear. IIUC we are still discussing 0003 and I would like
> to
> > > > propose a different default based on the conclusion I came up. If we merged
> > > > 0003, that's fine; this change will be useless. If we don't or it is optional,
> > > > it still has its merit.
> > > >
> > > > Do we want to pay the overhead to evaluating both tuple for UPDATEs? I'm
> still
> > > > processing if it is worth it. If you think that in general the row filter
> > > > contains the primary key and it is rare to change it, it will waste cycles
> > > > evaluating the same expression twice. It seems this behavior could be
> > > > controlled by a parameter.
> > > >
> > >
> > > I think the first thing we should do in this regard is to evaluate the
> > > performance for both cases (when we apply a filter to both tuples vs.
> > > to one of the tuples). In case the performance difference is
> > > unacceptable, I think it would be better to still compare both tuples
> > > as default to avoid data inconsistency issues and have an option to
> > > allow comparing one of the tuples.
> > >
> >
> > I did some performance tests to see if 0003 patch has much overhead.
> > With which I compared applying first two patches and applying first three patches
> > in four cases:
> > 1) only old rows match the filter.
> > 2) only new rows match the filter.
> > 3) both old rows and new rows match the filter.
> > 4) neither old rows nor new rows match the filter.
> >
> > 0003 patch checks both old rows and new rows, and without 0003 patch, it only
> > checks either old or new rows. We want to know whether it would take more time
> > if we check the old rows.
> >
> > I ran the tests in asynchronous mode and compared the SQL execution time. I
> also
> > tried some complex filters, to see if the difference could be more obvious.
> >
> > The result and the script are attached.
> > I didn’t see big difference between the result of applying 0003 patch and the
> > one not in all cases. So I think 0003 patch doesn’t have much overhead.
> >
>
> In previous test, I ran 3 times and took the average value, which may be affected
> by
> performance fluctuations.
>
> So, to make the results more accurate, I tested them more times (10 times) and
> took the average value. The result is attached.
>
> In general, I can see the time difference is within 3.5%, which is in an reasonable
> performance range, I think.
>

Hi,

I ran tests for various percentages of rows being filtered (based on v49 patch). The result and the script are attached.

In synchronous mode, with row filter patch, the fewer rows match the row filter, the less time it took.
In the case that all rows match the filter, row filter patch took about the same time as the one on HEAD code.

In asynchronous mode, I could see time is reduced when the percentage of rows sent is small (<25%), other cases took about the same time as the one on HEAD.

I think the above result is good. It shows that row filter patch doesn’t have much overhead.

Regards,
Tang

Attachment Content-Type Size
Performance_reports.xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 62.3 KB
perf_various_percentages.sh application/octet-stream 2.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tanghy.fnst@fujitsu.com 2021-12-21 06:12:50 RE: row filtering for logical replication
Previous Message Andrew Bille 2021-12-21 05:59:49 Re: [Proposal] Global temporary tables