Re: CentOS upgrade to 6.8 resulting in high load for PostgreSQL 9.4.6 server

From: "Hazuka (CTR), Brad" <brad(dot)hazuka(dot)ctr(at)usgs(dot)gov>
To: Scott Mead <scottm(at)openscg(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: CentOS upgrade to 6.8 resulting in high load for PostgreSQL 9.4.6 server
Date: 2016-06-09 20:04:47
Message-ID: CA+eUnhVodaRgA36O+e9A1OzPAS+LSpHXJVOxGANaqj_QKE+Xng@mail.gmail.com
Lists: pgsql-admin

I checked the HugePages and I/O scheduler settings with the SA, and they are
the same as they were before the upgrade.
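
(As a minimal sketch, assuming a stock CentOS 6 kernel, checks along these
lines would cover the settings in question; the device name sdb is a
placeholder for whatever block device backs the PostgreSQL data directory:)

    # Transparent huge pages -- RHEL/CentOS 6 kernels expose one of these two paths
    cat /sys/kernel/mm/redhat_transparent_hugepage/enabled 2>/dev/null
    cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null

    # Explicit huge page pool (relevant if huge_pages is set in postgresql.conf on 9.4)
    grep -i hugepages /proc/meminfo

    # I/O scheduler for the device backing the data directory (sdb is a placeholder)
    cat /sys/block/sdb/queue/scheduler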

Bradley J. Hazuka
Database Administrator
Innovate!, Inc.
Contractor to the U.S. Geological Survey (USGS)
Earth Resources Observation and Science (EROS) Center
47914 252nd Street
Sioux Falls, SD 57198
Tel: (605) 594-2667
Fax: (605) 594-2828
E-mail: brad(dot)hazuka(dot)ctr(at)usgs(dot)gov  <<< NEW E-MAIL ADDRESS

NOTICE: This email may contain confidential, proprietary, or competition
sensitive bid or proposal procurement information. Unauthorized disclosure
of this information may carry criminal penalties as set forth in the
Procurement Integrity Act, 41 U.S.C. 423, as amended. Further, the
unauthorized disclosure of certain commercial information by civil servants
may result in fines or imprisonment under the Trade Secrets Act (18 U.S.C.
1905). If you have received this information in error, please delete it,
including all copies, and notify the sender of the error immediately.

On Thu, Jun 9, 2016 at 12:38 PM, Scott Mead <scottm(at)openscg(dot)com> wrote:

>
>
> On Thu, Jun 9, 2016 at 8:27 AM, Hazuka (CTR), Brad <
> brad(dot)hazuka(dot)ctr(at)usgs(dot)gov> wrote:
>
>> I am hoping someone can help me out. I have a production server running
>> CentOS 6.8 and PostgreSQL community edition 9.4.6. The application hitting
>> the database is highly concurrent. When the database was running on the
>> latest CentOS 6.7, I had no problems: we were seeing great throughput, and
>> my bottleneck was I/O, as I expected it to be. From the sar data, my CPU
>> overhead was ~10% and my disk usage was 99%. Memory was not using swap.
>> After the SA upgraded the machine to 6.8, the profile changed. Now I am
>> seeing a high, CPU-bound load on the system caused by postgres. Disk usage
>> is at ~50%, and our application throughput has dropped dramatically. I was
>> able to pull some vmstat and perf metrics, which can be seen below. Any
>> insight someone can offer to restore the system's performance would be
>> great. Unfortunately, rolling back to 6.7 is not an option because of
>> security policy.
>>
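
(For context, a minimal sketch of how metrics like the ones quoted below are
typically gathered; the vmstat interval/count and the sar history file name
are arbitrary examples:)

    # run queue, memory, swap, and CPU breakdown: six 5-second samples
    vmstat 5 6

    # per-device I/O statistics from the sysstat history (file name is an example)
    sar -d -f /var/log/sa/sa09

    # live per-symbol CPU profile (requires the perf package; run as root)
    perf top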
>
> Any chance that you had tuned the IOScheduler or HugePages and now you've
> lost those settings with an OS upgrade?
>
> --Scott
>
>>
>> OS Info:
>>
>> CentOS 6.8 Linux 2.6.32-642.1.1.el6.x86_64 #1 SMP Tue May 31 21:57:07 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> DB Info:
>>
>> PostgreSQL 9.4.6
>>
>> HW Info:
>>
>> Architecture: x86_64
>> CPU op-mode(s): 32-bit, 64-bit
>> Byte Order: Little Endian
>> CPU(s): 16
>> On-line CPU(s) list: 0-15
>> Thread(s) per core: 2
>> Core(s) per socket: 4
>> Socket(s): 2
>> NUMA node(s): 2
>> Vendor ID: GenuineIntel
>> CPU family: 6
>> Model: 44
>> Model name: Intel(R) Xeon(R) CPU X5687 @ 3.60GHz
>> Stepping: 2
>> CPU MHz: 3600.000
>> BogoMIPS: 7199.83
>> Virtualization: VT-x
>> L1d cache: 32K
>> L1i cache: 32K
>> L2 cache: 256K
>> L3 cache: 12288K
>> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14
>> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15
>>
>> total used free shared buffers cached
>> Mem: 99052244 98472232 580012 8620496 47452 95264812
>> -/+ buffers/cache: 3159968 95892276
>> Swap: 4194300 11060 4183240
>>
>> Top information
>> =================
>> top - 10:22:07 up 1 day, 1:13, 3 users, load average: 15.64, 17.16, 17.39
>> Tasks: 546 total, 18 running, 528 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 91.7%us, 3.8%sy, 0.0%ni, 3.8%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
>> Mem: 99052244k total, 98744272k used, 307972k free, 47208k buffers
>> Swap: 4194300k total, 11064k used, 4183236k free, 95472720k cached
>>
>>   PID PPID     TIME+  %CPU %MEM PR NI S  VIRT SWAP  RES UID COMMAND
>> 21968 8814 995:06.03  86.6  1.1 20  0 R 8640m  272 1.1g 600 postgres: SELECT
>> 21972 8814 994:41.35  85.9  1.1 20  0 R 8640m  272 1.1g 600 postgres: SELECT
>> 20923 8814 994:53.56  85.0  1.8 20  0 R 8641m  296 1.7g 600 postgres: SELECT
>>  7607 8814  16:39.25  80.1  8.0 20  0 R 8631m  268 7.6g 600 postgres: SELECT
>> 21058 8814 956:59.78  79.1  3.1 20  0 R 8641m  296 2.9g 600 postgres: SELECT
>>  7667 8814  17:22.53  77.1  0.9 20  0 R 8621m  288 855m 600 postgres: SELECT
>> 20899 8814 995:15.39  76.5  4.4 20  0 R 8641m  292 4.2g 600 postgres: SELECT
>>  7381 8814  28:59.06  75.8  0.9 20  0 R 8621m  288 865m 600 postgres: SELECT
>> 21966 8814 995:13.87  74.5  4.4 20  0 R 8641m  272 4.2g 600 postgres: SELECT
>>  7848 8814  10:22.82  73.2  0.9 20  0 R 8621m  288 840m 600 postgres: SELECT
>>  7806 8814  13:50.43  71.9  0.9 20  0 S 8621m  288 849m 600 postgres: idle
>>  7645 8814  18:08.60  70.9  0.9 20  0 R 8621m  288 852m 600 postgres: SELECT
>>  7619 8814  19:34.71  70.3  0.9 20  0 R 8621m  288 856m 600 postgres: SELECT
>>  7665 8814  17:40.34  69.6  0.9 20  0 S 8621m  288 855m 600 postgres: COMMIT
>>  7843 8814  13:35.96  68.6  0.9 20  0 R 8621m  288 849m 600 postgres: SELECT
>>  7264 8814  31:09.93  67.3  1.5 20  0 R 8633m  268 1.5g 600 postgres: SELECT
>>  7675 8814  15:00.69  66.7  0.9 20  0 R 8621m  288 849m 600 postgres: SELECT
>>  7574 8814  23:30.92  64.4  0.9 20  0 D 8621m  288 857m 600 postgres: SELECT
>>  6977 8814  34:06.98  64.1  2.8 20  0 R 9106m  268 2.7g 600 postgres: SELECT
>> 21973 8814   5:39.97   1.3  1.1 20  0 S 8640m  272 1.1g 600 postgres: idle
>>  9212    1  13:13.12   1.0  0.0 20  0 S 44120  540 1140 600 /opt/pgbouncer/bin/pgbouncer -d /etc/pgbouncer/pgbouncer.ini
>>
>>
>>
>>
>> VMStat Information
>> =================
>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>>  r  b   swpd    free   buff     cache  si  so  bi    bo    in    cs us sy id wa st
>> 20  0  11072  749604  47000  95163200   0   0  75   230    18    26 82  0 15  2  0
>> 15  0  11072  723244  47016  95176920   0   0  38   887 86610 33495 84  6 10  0  0
>> 21  0  11072  708256  47024  95190704   0   0  28  3856 86047 23593 85  5  9  1  0
>> 20  0  11072  702536  47040  95196160   0   0   1  3890 82383 10971 89  4  6  1  0
>> 19  1  11072  696464  47048  95201504   0   0  17   522 84030 11061 90  4  5  0  0
>> 21  0  11072  688740  47064  95206920   0   0  31  2238 84497 27806 87  5  8  0  0
>>
>> perf top info
>> =================
>> Overhead Shared Object Symbol
>> 12.83% postgres [.] heap_hot_search_buffer
>> 6.82% postgres [.] ExecScanHashBucket
>> 6.67% postgres [.] slot_deform_tuple
>> 4.74% postgres [.] LWLockAcquire
>> 3.81% postgres [.] pg_next_dst_boundary
>> 3.43% postgres [.] texteq
>> 2.97% postgres [.] _bt_checkkeys
>> 2.79% postgres [.] j2date
>> 2.76% postgres [.] LWLockRelease
>> 2.69% postgres [.] slot_getattr
>> 2.66% postgres [.] ExecMakeFunctionResultNoSets
>> 2.48% postgres [.] DetermineTimeZoneOffsetInternal
>> 2.25% postgres [.] hash_search_with_hash_value
>> 2.04% postgres [.] tm2timestamp
>> 2.04% postgres [.] ExecEvalParamExtern
>> 1.99% postgres [.] dt2time
>> 1.67% postgres [.] date2j
>> 1.54% postgres [.] pgstat_end_function_usage
>> 1.54% postgres [.] FunctionCall2Coll
>> 1.52% postgres [.] ExecQual
>> 1.48% postgres [.] timestamp2tm
>> 1.43% postgres [.] index_fetch_heap
>> 1.33% postgres [.] index_getnext_tid
>> 1.22% postgres [.] _bt_next
>> 1.18% postgres [.] ExecEvalNullTest
>> 1.15% postgres [.] ExecScan
>> 1.14% postgres [.] btgettuple
>> 0.99% postgres [.] IndexNext
>> 0.95% postgres [.] timestamp2timestamptz
>> 0.94% postgres [.] ReleaseAndReadBuffer
>> 0.93% postgres [.] PinBuffer
>> 0.92% postgres [.] ExecEvalScalarVarFast
>> 0.87% postgres [.] ExecStoreTuple
>> 0.85% postgres [.] LockBuffer
>> 0.82% postgres [.] heap_page_prune_opt
>> 0.78% libc-2.12.so [.] __memcmp_sse4_1
>> 0.74% postgres [.] toast_raw_datum_size
>> 0.74% postgres [.] pg_detoast_datum_packed
>> 0.72% postgres [.] ExecEvalRelabelType
>> 0.68% postgres [.] CheckForSerializableConflictOut
>> 0.68% postgres [.] HeapTupleSatisfiesMVCC
>> 0.57% postgres [.] index_getnext
>> 0.53% postgres [.] ReadBuffer_common
>> 0.51% postgres [.] pgstat_init_function_usage
>>
>> IO information
>> =====================
>>
>> Time      DEV        tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await  svctm  %util
>> 08:00:01  dev253-5  59.13      1.89   1564.72     26.50      2.57   43.43   5.90  34.90
>> 08:10:01  dev253-5  58.47      2.19   1555.84     26.65      2.20   37.59   6.05  35.38
>> 08:20:01  dev253-5  60.72     36.18   1598.49     26.92      2.34   38.53   5.92  35.94
>> 08:30:01  dev253-5  60.21     12.68   1562.41     26.16      2.18   36.21   5.97  35.97
>> 08:40:01  dev253-5  60.77    367.49   2624.93     49.25      2.94   48.33   6.09  36.98
>> 08:50:01  dev253-5  63.26    217.68   2644.81     45.25      5.41   85.45   5.82  36.80
>> 09:00:01  dev253-5  60.33      1.41   1612.09     26.75      2.16   35.80   5.93  35.77
>> 09:10:01  dev253-5  58.45      2.96   1586.15     27.19      3.29   56.23   6.12  35.77
>> 09:20:01  dev253-5  89.26   1139.63   3323.51     50.00      3.78   42.35   4.51  40.28
>> 09:30:01  dev253-5  92.84   1320.94   4093.75     58.32      4.21   45.30   4.15  38.57
>> 09:40:01  dev253-5  82.81    998.29   4179.65     62.53      5.09   61.35   4.69  38.85
>> 09:50:01  dev253-5  58.08     22.32   5549.46     95.93      5.22   89.88   6.14  35.67
>> 10:00:01  dev253-5  60.98     50.19  10181.89    167.78      8.34  136.79   6.04  36.81
>> 10:10:01  dev253-5  61.11    256.26   2992.49     53.16      3.12   51.10   5.61  34.29
>> Average:  dev253-5  83.02   4394.40   2225.11     79.73      3.18   38.35   4.69  38.93
>>
>>
>> Bradley J. Hazuka
>> Database Administrator
>> Innovate!, Inc.
>>
>
>
>
> --
> Scott Mead
> Sr. Architect
> OpenSCG <http://openscg.com>
> http://openscg.com
>
