Re: Making wait events a bit more efficient

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Making wait events a bit more efficient
Date: 2021-04-02 20:06:35
Message-ID: CALNJ-vTnyVxJKXCMtFUxtYZHvm3FTmNVyX65DBsWtfmfvZF55w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

+extern PGDLLIMPORT uint32 *my_wait_event_info;

It seems volatile should be added to the above declaration. Since later:

+ *(volatile uint32 *) my_wait_event_info = wait_event_info;

Cheers

On Fri, Apr 2, 2021 at 12:45 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> This grew out of my patch to split the waits event code out of
> pgstat.[ch], which in turn grew out of the shared memory stats patch
> series.
>
>
> pgstat_report_wait_start() and pgstat_report_wait_end() currently check
> pgstat_track_activities before assigning to MyProc->wait_event_info.
> Given the small cost of the assignment, and that pgstat_track_activities
> is almost always used, I'm doubtful that that's the right tradeoff.
>
> Normally I would say that branch prediction will take care of this cost
> - but because pgstat_report_wait_{start,end} are inlined, that has to
> happen in each of the calling locations.
>
> The code works out to be something like the following (this is from
> basebackup_read_file, the simplest caller I could quickly find, I
> removed interspersed code from it):
>
> 267 if (!pgstat_track_activities || !proc)
> 0x0000000000430e4d <+13>: cmpb $0x1,0x4882e1(%rip) #
> 0x8b9135 <pgstat_track_activities>
>
> 265 volatile PGPROC *proc = MyProc;
> 0x0000000000430e54 <+20>: mov 0x48c52d(%rip),%rax #
> 0x8bd388 <MyProc>
>
> 266
> 267 if (!pgstat_track_activities || !proc)
> 0x0000000000430e5b <+27>: jne 0x430e6c <basebackup_read_file+44>
> 0x0000000000430e5d <+29>: test %rax,%rax
> 0x0000000000430e60 <+32>: je 0x430e6c <basebackup_read_file+44>
>
> 268 return;
> 269
> 270 /*
> 271 * Since this is a four-byte field which is always read
> and written as
> 272 * four-bytes, updates are atomic.
> 273 */
> 274 proc->wait_event_info = wait_event_info;
> 0x0000000000430e62 <+34>: movl $0xa000000,0x2c8(%rax)
>
> /home/andres/src/postgresql/src/backend/replication/basebackup.c:
> 2014 rc = pg_pread(fd, buf, nbytes, offset);
> 0x0000000000430e6c <+44>: call 0xc4790 <pread(at)plt>
>
> stripping the source:
> 0x0000000000430e4d <+13>: cmpb $0x1,0x4882e1(%rip) #
> 0x8b9135 <pgstat_track_activities>
> 0x0000000000430e54 <+20>: mov 0x48c52d(%rip),%rax #
> 0x8bd388 <MyProc>
> 0x0000000000430e5b <+27>: jne 0x430e6c <basebackup_read_file+44>
> 0x0000000000430e5d <+29>: test %rax,%rax
> 0x0000000000430e60 <+32>: je 0x430e6c <basebackup_read_file+44>
> 0x0000000000430e62 <+34>: movl $0xa000000,0x2c8(%rax)
> 0x0000000000430e6c <+44>: call 0xc4790 <pread(at)plt>
>
>
> just removing the pgstat_track_activities check turns that into
>
> 0x0000000000430d8d <+13>: mov 0x48c5f4(%rip),%rax #
> 0x8bd388 <MyProc>
> 0x0000000000430d94 <+20>: test %rax,%rax
> 0x0000000000430d97 <+23>: je 0x430da3 <basebackup_read_file+35>
> 0x0000000000430d99 <+25>: movl $0xa000000,0x2c8(%rax)
> 0x0000000000430da3 <+35>: call 0xc4790 <pread(at)plt>
>
> which does seem (a bit) nicer.
>
> However, we can improve this further, entirely eliminating branches, by
> introducing something like "my_wait_event_info" that initially just
> points to a local variable and is switched to shared once MyProc is
> assigned.
>
> Obviously incorrect, for comparison: Just removing the MyProc != NULL
> check yields:
> 0x0000000000430bcd <+13>: mov 0x48c7b4(%rip),%rax #
> 0x8bd388 <MyProc>
> 0x0000000000430bd4 <+20>: movl $0xa000000,0x2c8(%rax)
> 0x0000000000430bde <+30>: call 0xc47d0 <pread(at)plt>
>
> using a uint32 *my_wait_event_info yields:
> 0x0000000000430b4d <+13>: mov 0x47615c(%rip),%rax #
> 0x8a6cb0 <my_wait_event_info>
> 0x0000000000430b54 <+20>: movl $0xa000000,(%rax)
> 0x0000000000430b5a <+26>: call 0xc47d0 <pread(at)plt>
>
> Note how the lack of offset addressing in the my_wait_event_info version
> makes the instruction smaller (call is at 26 instead of 30).
>
>
> Now, perhaps all of this isn't worth optimizing, most of the things done
> within pgstat_report_wait_start()/end() are expensive-ish. And forward
> branches are statically predicted to be not taken on several
> platforms. I have seen this these instructions show up in profiles in
> workloads with contended lwlocks at least...
>
> There's also a small win in code size:
> text data bss dec hex filename
> 8932095 192160 204656 9328911 8e590f src/backend/postgres
> 8928544 192160 204656 9325360 8e4b30
> src/backend/postgres_my_wait_event_info
>
>
> If we went for the my_wait_event_info approach there is one further
> advantage, after my change to move the wait event code into a separate
> file: wait_event.h does not need to include proc.h anymore, which seems
> architecturally nice for things like fd.c.
>
>
> Attached is patch series doing this.
>
>
> I'm inclined to separately change the comment format for
> wait_event.[ch], there's no no reason to stick with the current style:
>
> /* ----------
> * pgstat_report_wait_start() -
> *
> * Called from places where server process needs to wait. This is
> called
> * to report wait event information. The wait information is stored
> * as 4-bytes where first byte represents the wait event class (type
> of
> * wait, for different types of wait, refer WaitClass) and the next
> * 3-bytes represent the actual wait event. Currently 2-bytes are
> used
> * for wait event which is sufficient for current usage, 1-byte is
> * reserved for future usage.
> *
> * NB: this *must* be able to survive being called before MyProc has been
> * initialized.
> * ----------
> */
>
> I.e. I'd like to remove the ----- framing, the repetition of the
> function name, and the varying indentation in the comment.
>
> Greetings,
>
> Andres Freund
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-04-02 20:10:11 Re: Making wait events a bit more efficient
Previous Message Bryn Llewellyn 2021-04-02 20:05:42 Re: Have I found an interval arithmetic bug?