From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Gather performance analysis
Date: 2021-09-08 06:15:16
Message-ID: CAFiTN-t8NMa-UVVTbm57jyZRfGjyWumDSDtXxuGfUKP2yuKcpQ@mail.gmail.com
Lists: pgsql-hackers
On Wed, Sep 8, 2021 at 3:08 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Looking at this profile made me wonder if this was a build without
> optimizations. The pg_atomic_read_u64()/pg_atomic_read_u64_impl() calls
> should be inlined. And while perf can reconstruct inlined functions when
> using --call-graph=dwarf, they show up like "pg_atomic_read_u64 (inlined)"
> for me.
>
Yeah, for profiling I generally build without optimizations so that I can
see all the functions in the stack. So the profile results are from a
build without optimizations, but the performance results are from an
optimized build.
>
> FWIW, I see times like this
>
> postgres[4144648][1]=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t;
>                                           QUERY PLAN
> ──────────────────────────────────────────────────────────────────────────────────────────────
>  Gather  (cost=1000.00..6716686.33 rows=200000000 width=208) (actual rows=200000000 loops=1)
>    Workers Planned: 2
>    Workers Launched: 2
>    ->  Parallel Seq Scan on t  (cost=0.00..6715686.33 rows=83333333 width=208) (actual rows=66666667 loops=3)
>  Planning Time: 0.043 ms
>  Execution Time: 24954.012 ms
> (6 rows)
>
>
Is this with or without the patch? I mean, can we see a comparison showing
whether the patch improved anything in your environment?
> Looking at a profile I see the biggest bottleneck in the leader (which is
> the bottleneck as soon as the worker count is increased) to be reading the
> length word of the message. I do see shm_mq_receive_bytes() in the profile,
> but the costly part there is the "read % (uint64) ringsize" - divisions are
> slow. We could just compute a mask instead of the size.
>
Yeah, that could be done. I can test with this change as well to see how
much we gain from it.
>
> We also should probably split the read-mostly data in shm_mq (ring_size,
> detached, ring_offset, receiver, sender) into a separate cacheline from the
> read/write data. Or perhaps copy more info into the handle, particularly
> the
> ringsize (or mask).
>
Good suggestion, I will do some experiments around this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com