From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: digoal(at)126(dot)com
Cc: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14159: PostgreSQL 9.6 parallel scan consume very high mutex lock
Date: 2016-05-25 13:51:09
Message-ID: CAA4eK1KbQsym9GsQedsZ8cqD7h=-rU7y=QQ0S8GTrw43HZgQfA@mail.gmail.com
Lists: pgsql-bugs
On Wed, May 25, 2016 at 5:21 PM, <digoal(at)126(dot)com> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 14159
> Logged by: Zhou Digoal
> Email address: digoal(at)126(dot)com
> PostgreSQL version: 9.6beta1
> Operating system: CentOS 6.x x64
> Description:
>
> My test environment:
> CentOS 6
> 64-core CPU
>
> This run does not use a parallel scan:
> postgres=# alter table t_bit2 set (parallel_degree=0);
> ALTER TABLE
> Time: 0.335 ms
>
> postgres=# select count(*) from t_bit2 ;
> count
> ------------
> 1600000000
> (1 row)
> Time: 141377.100 ms
>
> and the perftop is
> ```
>  PerfTop:  1075 irqs/sec  kernel:21.5%  exact: 0.0% [1000Hz cycles], (all, 64 CPUs)
> -------------------------------------------------------------------------------------
>
>  samples  pcnt   function      DSO
>  _______  _____  ____________  ___________________________________
>
>  2493.00  15.2%  heap_getnext  /home/digoal/pgsql9.6/bin/postgres
> ```
> And when using 64 parallel workers:
>
> postgres=# alter table t_bit2 set (parallel_degree=64);
> ALTER TABLE
> Time: 0.180 ms
> postgres=# explain select count(*) from t_bit2 ;
>                                            QUERY PLAN
> -----------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=12108923.92..12108923.93 rows=1 width=8)
>    ->  Gather  (cost=12108917.35..12108923.76 rows=64 width=8)
>          Workers Planned: 64
>          ->  Partial Aggregate  (cost=12107917.35..12107917.36 rows=1 width=8)
>                ->  Parallel Seq Scan on t_bit2  (cost=0.00..12046131.88 rows=24714188 width=0)
> (5 rows)
>
> Time: 0.338 ms
> postgres=# select count(*) from t_bit2 ;
> count
> ------------
> 1600000000
> (1 row)
>
> Time: 37181.297 ms
>
> the perftop is
> ```
>  PerfTop: 47649 irqs/sec  kernel:83.5%  exact: 0.0% [1000Hz cycles], (all, 64 CPUs)
> -------------------------------------------------------------------------------------
>
>    samples  pcnt   function               DSO
>    _______  _____  _____________________  ___________________________________
>
>  436271.00  76.5%  __mutex_lock_slowpath  [kernel.kallsyms]
>   13897.00   2.4%  _spin_lock             [kernel.kallsyms]
>   11159.00   2.0%  heap_getnext           /home/digoal/pgsql9.6/bin/postgres
> ```
>
> When I set parallel_degree to 16:
>
> postgres=# alter table t_bit2 set (parallel_degree=16);
> ALTER TABLE
> Time: 0.433 ms
> postgres=# select count(*) from t_bit2 ;
> count
> ------------
> 1600000000
> (1 row)
>
> Time: 9534.304 ms
>
> the perf top is
> ```
>  PerfTop: 16436 irqs/sec  kernel:21.9%  exact: 0.0% [1000Hz cycles], (all, 64 CPUs)
> -------------------------------------------------------------------------------------
>
>   samples  pcnt   function                  DSO
>   _______  _____  ________________________  _____________________________________
>
>  10020.00  11.0%  heap_getnext              /home/dege.zzz/pgsql9.6/bin/postgres
>   9804.00  10.8%  copy_user_generic_string  [kernel.kallsyms]
> ```
>
> why?
>
>
Can we get a detailed call stack so that we can see where the spinlock call is
initiated? One guess is that it is contention on the parallel_scan->phs_mutex
spinlock, which is acquired to advance the scan to the next heap page, but it
is difficult to say from the information provided. In general, I think using
more workers than required can slow the system down, and this seems to be one
example of that.
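
For example, something along these lines (the sampling frequency and duration
below are just placeholder values) should capture kernel and user call stacks
system-wide while the parallel query is running, so we can see which code path
ends up in __mutex_lock_slowpath:

```
# sample call graphs on all CPUs for ~30 seconds while the
# parallel select count(*) runs in another session
perf record -a -g -F 99 -- sleep 30

# inspect the hottest stacks; expand the __mutex_lock_slowpath entry
perf report --stdio | head -60
```
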
By the way, what value does the system select for the number of parallel
workers if you don't set the parallel_degree parameter for the relation? You
can check that by leaving parallel_degree at its default value (-1) for the
relation, and setting max_parallel_degree = 64 and max_worker_processes = 64.
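
A minimal sketch of that check, assuming ALTER TABLE ... RESET puts the
reloption back to its -1 default and that max_worker_processes = 64 is already
set in postgresql.conf (that one needs a server restart):

```
postgres=# alter table t_bit2 reset (parallel_degree);
ALTER TABLE
postgres=# set max_parallel_degree = 64;
SET
postgres=# explain select count(*) from t_bit2 ;
-- the "Workers Planned: N" line of the Gather node shows the
-- worker count the system picked on its own
```
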
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com