Re: Reduce useless changes before reassembly during logical replication

From: li jie <ggysxcq(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Reduce useless changes before reassembly during logical replication
Date: 2024-03-06 10:00:59
Message-ID: CAGfChW7XpZGkxOpHZnv69+H_AyKzbffcrumyNu4Fz0u+1ADPxA@mail.gmail.com
Lists: pgsql-hackers

Hi,

Sorry for the late reply.

> This is all true but note that in successful cases (where the table is
> published) all the work done by FilterByTable(accessing caches,
> transaction-related stuff) can add noticeable overhead as anyway we do
> that later in pgoutput_change().

You are correct. Frequently starting transactions and accessing the caches
adds a lot of overhead, which Andy has tested and confirmed.

The root cause is that every DML WAL record needs to do this work, which is
really wasteful. I use a hash table, LocatorFilterCache, to solve this
problem: after getting a RelFileLocator, I look it up in the hash table to
fetch its PublicationActions and filter the change based on them, i.e.
determine whether the relation has been published.
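For illustration, here is a minimal sketch of such a cache built on
PostgreSQL's dynahash. The struct and function names below are only
illustrative and may not match the attached patch exactly:

#include "postgres.h"

#include "catalog/pg_publication.h"
#include "storage/relfilelocator.h"
#include "utils/hsearch.h"
#include "utils/snapshot.h"

/* Hypothetical cache entry: physical relation -> publication info. */
typedef struct LocatorFilterEntry
{
    RelFileLocator locator;        /* hash key */
    PublicationActions pubactions; /* which DML actions are published */
    bool        is_published;      /* published by any publication? */
} LocatorFilterEntry;

static HTAB *LocatorFilterCache = NULL;

/* Slow path, sketched further below in the snapshot discussion. */
static void fill_locator_filter_entry(LocatorFilterEntry *entry,
                                      Snapshot snapshot_now);

static void
init_locator_filter_cache(void)
{
    HASHCTL     ctl;

    ctl.keysize = sizeof(RelFileLocator);
    ctl.entrysize = sizeof(LocatorFilterEntry);

    /* HASH_BLOBS: the RelFileLocator key is compared as raw bytes */
    LocatorFilterCache = hash_create("LocatorFilterCache", 1024, &ctl,
                                     HASH_ELEM | HASH_BLOBS);
}

/* Fast path: one hash lookup per DML WAL record. */
static bool
locator_is_published(const RelFileLocator *locator, Snapshot snapshot_now)
{
    LocatorFilterEntry *entry;
    bool        found;

    entry = (LocatorFilterEntry *) hash_search(LocatorFilterCache, locator,
                                               HASH_ENTER, &found);
    if (!found)
        fill_locator_filter_entry(entry, snapshot_now); /* expensive, once */

    return entry->is_published;
}

This lookup is roughly what ReorderBufferFilterByLocator() in v2 boils
down to, which is why it barely shows up in the profile below.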

The effect in my test is quite clear (perf record):
v1:
  Children      Self  Command   Shared Object  Symbol
+   22.04%     1.53%  postgres  postgres       [.] FilterByTable

v2:
  Children      Self  Command   Shared Object  Symbol
+    0.58%     0.00%  postgres  postgres       [.] ReorderBufferFilterByLocator

The v1 patch introduces about 22% overhead, while v2 shows only 0.58%.

> Note, users can
> configure to stream in-progress transactions in which case they
> shouldn't see such a big problem.

Yes, streaming mode can prevent these irrelevant changes from being written
to disk or sent downstream. However, CPU and memory are still consumed
while processing these useless changes. Here is my simple test [1]:

Based on master:

CPU stat: perf stat -p <pid> -e cycles -I 1000
# time counts unit events
76.007070936 9,691,035 cycles
77.007163484 5,977,694 cycles
78.007252533 5,924,703 cycles
79.007346862 5,861,934 cycles
80.007438070 5,858,264 cycles
81.007527122 6,408,759 cycles
82.007615711 6,397,988 cycles
83.007705685 5,520,407 cycles
84.007794387 5,359,162 cycles
85.007884879 5,194,079 cycles
86.007979797 5,391,270 cycles
87.008069606 5,474,536 cycles
88.008162827 5,594,190 cycles
89.008256327 5,610,023 cycles
90.008349583 5,627,350 cycles
91.008437785 6,273,510 cycles
92.008527938 580,934,205 cycles
93.008620136 4,404,672 cycles
94.008711818 4,599,074 cycles
95.008805591 4,374,958 cycles
96.008894543 4,300,180 cycles
97.008987582 4,157,892 cycles
98.009077445 4,072,178 cycles
99.009163475 4,043,875 cycles
100.009254888 5,382,667 cycles

Memory stat: pidstat -p <pid> -r 1 10
07:57:18 AM UID PID minflt/s majflt/s VSZ RSS %MEM Command
07:57:19 AM 1000 11848 233.00 0.00 386872 81276 0.01 postgres
07:57:20 AM 1000 11848 235.00 0.00 387008 82068 0.01 postgres
07:57:21 AM 1000 11848 236.00 0.00 387144 83124 0.01 postgres
07:57:22 AM 1000 11848 236.00 0.00 387144 83916 0.01 postgres
07:57:23 AM 1000 11848 236.00 0.00 387280 84972 0.01 postgres
07:57:24 AM 1000 11848 334.00 0.00 337000 36928 0.00 postgres
07:57:25 AM 1000 11848 3.00 0.00 337000 36928 0.00 postgres
07:57:26 AM 1000 11848 0.00 0.00 337000 36928 0.00 postgres
07:57:27 AM 1000 11848 0.00 0.00 337000 36928 0.00 postgres
07:57:28 AM 1000 11848 0.00 0.00 337000 36928 0.00 postgres
Average: 1000 11848 151.30 0.00 362045 60000 0.01 postgres

After applying the patch:
# time counts unit events
76.007623310 4,237,505 cycles
77.007717436 3,989,618 cycles
78.007813848 3,965,857 cycles
79.007906412 3,601,715 cycles
80.007998111 3,670,835 cycles
81.008092670 3,495,844 cycles
82.008187456 3,822,695 cycles
83.008281335 5,034,146 cycles
84.008374998 3,867,683 cycles
85.008470245 3,996,927 cycles
86.008563783 3,823,893 cycles
87.008658628 3,825,472 cycles
88.008755246 3,823,079 cycles
89.008849719 3,966,083 cycles
90.008945774 4,012,704 cycles
91.009044492 4,026,860 cycles
92.009139621 3,860,912 cycles
93.009242485 3,961,533 cycles
94.009346304 3,799,897 cycles
95.009440164 3,959,602 cycles
96.009534251 3,960,405 cycles
97.009625904 3,762,581 cycles
98.009716518 4,859,490 cycles
99.009807720 3,940,845 cycles
100.009901399 3,888,095 cycles

08:01:47 AM UID PID minflt/s majflt/s VSZ RSS %MEM Command
08:01:48 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:49 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:50 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:51 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:52 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:53 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:54 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:55 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:56 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:57 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
Average: 1000 19466 0.00 0.00 324424 15140 0.00 postgres

The comparison shows that the patch is also beneficial for streaming mode.
Of course, LocatorFilterCache also needs to handle invalidations, e.g. when
the corresponding relation is invalidated or pg_publication changes, just
like RelationSyncCache and RelfilenumberMapHash do; see the sketch below.
But DDL is a small fraction of the workload after all, insignificant
compared to the large volume of DML.
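Continuing the sketch above (the callback names here are made up), the
invalidation hookup could mirror what pgoutput.c does for RelationSyncCache:

#include "utils/inval.h"
#include "utils/syscache.h"

/* Conservatively drop every cached entry. */
static void
locator_filter_invalidate_all(void)
{
    HASH_SEQ_STATUS status;
    LocatorFilterEntry *entry;

    if (LocatorFilterCache == NULL)
        return;

    hash_seq_init(&status, LocatorFilterCache);
    while ((entry = (LocatorFilterEntry *) hash_seq_search(&status)) != NULL)
    {
        if (hash_search(LocatorFilterCache, &entry->locator,
                        HASH_REMOVE, NULL) == NULL)
            elog(ERROR, "hash table corrupted");
    }
}

/*
 * Relcache invalidation: mapping the relid back to a RelFileLocator is
 * not cheap here, so simply drop everything.
 */
static void
locator_filter_rel_cb(Datum arg, Oid relid)
{
    locator_filter_invalidate_all();
}

/* Any pg_publication change invalidates all cached PublicationActions. */
static void
locator_filter_pub_cb(Datum arg, int cacheid, uint32 hashvalue)
{
    locator_filter_invalidate_all();
}

static void
register_locator_filter_callbacks(void)
{
    CacheRegisterRelcacheCallback(locator_filter_rel_cb, (Datum) 0);
    CacheRegisterSyscacheCallback(PUBLICATIONOID,
                                  locator_filter_pub_cb, (Datum) 0);
}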

Another problem is that LocatorFilterCache looks redundant compared to
RelationSyncCache and RelfilenumberMapHash:
1. RelfilenumberMapHash: relfilenode -> relation OID
2. RelationSyncCache: relation OID -> PublicationActions
3. LocatorFilterCache: RelFileLocator -> PublicationActions

The reason is that you cannot simply chain the two existing caches to get
from relfilenode to PublicationActions: a historic snapshot (and hence a
transaction) is needed in the middle to access the relcache. So there is no
good solution for this for the time being; it is ugly but effective.
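To make the historic-snapshot point concrete, the cache-miss slow path
(fill_locator_filter_entry() from the sketch above) has to do roughly the
following; compute_pubactions_for() is a hypothetical stand-in for the
per-publication work that pgoutput's get_rel_sync_entry() performs:

#include "access/xact.h"
#include "utils/relcache.h"
#include "utils/relfilenumbermap.h"
#include "utils/snapmgr.h"

/* Hypothetical helper: what RelationSyncCache effectively caches. */
static PublicationActions compute_pubactions_for(Relation rel);

static void
fill_locator_filter_entry(LocatorFilterEntry *entry, Snapshot snapshot_now)
{
    Oid         relid;

    /*
     * Catalog lookups must see the catalogs as of this point in the WAL,
     * which requires a transaction plus a historic snapshot -- this is
     * why the two existing caches cannot simply be chained.
     */
    StartTransactionCommand();
    SetupHistoricSnapshot(snapshot_now, NULL);

    /* step 1: relfilenode -> relation OID (what RelfilenumberMapHash does) */
    relid = RelidByRelfilenumber(entry->locator.spcOid,
                                 entry->locator.relNumber);

    if (OidIsValid(relid))
    {
        Relation    rel = RelationIdGetRelation(relid);

        /* step 2: relation OID -> PublicationActions */
        entry->pubactions = compute_pubactions_for(rel);
        entry->is_published = entry->pubactions.pubinsert ||
            entry->pubactions.pubupdate ||
            entry->pubactions.pubdelete ||
            entry->pubactions.pubtruncate;
        RelationClose(rel);
    }
    else
        entry->is_published = false;

    TeardownHistoricSnapshot(false);
    AbortCurrentTransaction();
}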

> Therefore, IMO, the concrete way of testing this feature is by looking
> at the server logs for the following message using
> PostgreSQL::Test::Cluster log_contains().
Thanks, done.

> Instead of removing is_publishable_relation from pgoutput_change, I
> think it can just be turned into an assertion
> Assert(is_publishable_relation(relation));, no?
Yes, done.

> Perhaps, it must use /* FALLTHROUGH */ just like elsewhere in the
> code, otherwise a warning is thrown.
A /* intentionally fall through */ comment also avoids the warning.
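As a standalone example of mine (not from the patch): gcc -Wextra enables
-Wimplicit-fallthrough, and the comment below matches one of the patterns
the compiler recognizes, so no warning is emitted:

#include <stdio.h>

static int
count_steps(int action)
{
    int         n = 0;

    switch (action)
    {
        case 2:
            n++;
            /* intentionally fall through */
        case 1:
            n++;
            break;
        default:
            break;
    }
    return n;
}

int
main(void)
{
    printf("%d\n", count_steps(2));     /* prints 2 */
    return 0;
}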

> Can't just the LogicalDecodingContext and
> relation name, the change action be enough to decide if the table is
> publishable or not? If done this way, it can avoid some more
> processing, no?
Yes, v2 filters directly on the RelFileLocator, so the change is no
longer required.

> Please run pgindent and pgperltidy on the new source code and new
> TAP test file respectively.
OK.

[1]: https://www.postgresql.org/message-id/CAGfChW62f5NTNbLsqO-6_CrmKPqBEQtWPcPDafu8pCwZznk%3Dxw%40mail.gmail.com

Attachment Content-Type Size
v2-Reduce-useless-changes-before-reassembly-during-logi.patch application/octet-stream 28.6 KB
