Re: Performance issues with SSD volume?

From: Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk>
To: Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk>, Thomas SIMON <tsimon(at)neteven(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Performance issues with SSD volume?
Date: 2015-05-20 17:03:26
Message-ID: 436879888.4856190.1432141407005.JavaMail.yahoo@mail.yahoo.com
Lists: pgsql-admin

----- Original Message -----
> From: Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk>
> To: Thomas SIMON <tsimon(at)neteven(dot)com>
> Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
> Sent: Wednesday, 20 May 2015, 17:50
> Subject: Re: [ADMIN] Performance issues with SSD volume?
>
>
>
>> From: Thomas SIMON <tsimon(at)neteven(dot)com>
>> To: glynastill(at)yahoo(dot)co(dot)uk
>> Cc: "pgsql-admin(at)postgresql(dot)org"
> <pgsql-admin(at)postgresql(dot)org>
>> Sent: Wednesday, 20 May 2015, 16:41
>> Subject: Re: [ADMIN] Performance issues with SSD volume?
>>
>> Hi Glyn,
>>
>> I'll try to answer these points.
>>
>> I've run some benchmarks, and indeed 3.2 is not helping. Not helping at all.
>> I changed to 3.14 and the gap is quite big!
>> With the pgbench RW test, 3.2 --> 4200 TPS; 3.14 --> 6900 TPS in the same
>> conditions.
>> With the pgbench RO test, 3.2 --> 37000 TPS; 3.14 --> 95000 TPS, same
>> conditions too.
>
>>
>
>
> That's a start then.
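>
> For reference, something like the following should reproduce that kind of
> comparison (the scale factor and client counts here are assumptions, not
> necessarily what you used):
>
> # initialise a pgbench database (scale 100 is an assumption)
> pgbench -i -s 100 bench
> # read/write test: 32 clients, 4 threads, 60 seconds
> pgbench -c 32 -j 4 -T 60 bench
> # read-only test (-S means SELECT-only)
> pgbench -c 32 -j 4 -T 60 -S bench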
>
>> So it should be better, but when the server was in production, even with
>> the bad kernel, performance was already quite good before it quickly
>> decreased.
>> So I think I have another configuration problem too.
>>
>> You say you're IO bound, so some output from sar / iostat / dstat and
>> pg_stat_activity etc before and during the issue would be of use.
>>
>> -> My server is not in production right now, so it is difficult to
>> replay the production load and get some useful metrics.
>> The best way I've found is to replay traffic from the logs with pgreplay.
>> I hoped that the server would fall over again while replaying this
>> traffic, but it never happens ... Another thing I can't understand ...
>>
>> Below is my dstat output when I replay this traffic (and so when the
>> server runs normally).
>> Unfortunately I have no outputs from when the server's performance
>> decreased.
>
>>
>
> It's a shame we can't get any insight into activity on the server during
> the issues.
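>
> If it does happen again, even a rough capture would help, e.g. (a sketch;
> adjust paths and intervals to taste):
>
> iostat -x 5 > /tmp/iostat.log &
> sar -u 5 > /tmp/sar.log &
> psql -c "SELECT pid, state, waiting, query FROM pg_stat_activity;"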
>>
>>
>> Other things you asked:
>>
>> System memory size: 256 GB
>> SSD model numbers and how many: 4 SSD disks; RAID 10; model
>> INTEL SSDSC2BB480G4
>> RAID controller: MegaRAID SAS 2208
>> Partition alignments and stripe sizes: see fdisk output below
>> Kernel options: the config file is here:
>> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>> Filesystem used and mount options: ext4, see mtab below
>> IO scheduler: noop [deadline] cfq for my SSD RAID volume
>> PostgreSQL version and configuration: 9.3.5
>>
>> max_connections=1800
>> shared_buffers=8GB
>> temp_buffers=32MB
>> work_mem=100MB
>> maintenance_work_mem=12GB
>> bgwriter_lru_maxpages=200
>> effective_io_concurrency=4
>> wal_level=hot_standby
>> wal_sync_method=fdatasync
>> wal_writer_delay=2000ms
>> commit_delay=1000
>> checkpoint_segments=80
>> checkpoint_timeout=15min
>> checkpoint_completion_target=0.7
>> archive_command='rsync ....'
>> max_wal_senders=10
>> wal_keep_segments=38600
>> vacuum_defer_cleanup_age=100
>> hot_standby = on
>> max_standby_archive_delay = 5min
>> max_standby_streaming_delay = 5min
>> hot_standby_feedback = on
>> random_page_cost = 1.0
>> effective_cache_size = 240GB
>> log_min_error_statement = warning
>> log_min_duration_statement = 0
>> log_checkpoints = on
>> log_connections = on
>> log_disconnections = on
>> log_line_prefix = '%m|%u|%d|%c|'
>> log_lock_waits = on
>> log_statement = 'all'
>> log_timezone = 'localtime'
>> track_activities = on
>> track_functions = pl
>> track_activity_query_size = 8192
>> autovacuum_max_workers = 5
>> autovacuum_naptime = 30s
>> autovacuum_vacuum_threshold = 40
>> autovacuum_analyze_threshold = 20
>> autovacuum_vacuum_scale_factor = 0.10
>> autovacuum_analyze_scale_factor = 0.10
>> autovacuum_vacuum_cost_delay = 5ms
>> default_transaction_isolation = 'read committed'
>> max_locks_per_transaction = 128
>>
>>
>>
>> Connection pool sizing (pgpool2)
>> num_init_children = 1790
>> max_pool = 1
>
>>
>
>
> 1800 is quite a lot of connections, and with max_pool=1 in pgpool you're
> effectively just using pgpool as a proxy (as I recall; my memory is a little
> fuzzy on pgpool now). Unless your app is stateful in some way, or has unique
> users for each of those 1800 connections, you should lower the number of
> active connections. A general starting point is usually cpu cores * 2, so you
> could raise max_pool and divide num_init_children by the same amount.
>
> Hard to say what you need to do without knowing what exactly you're doing
> though. What's the nature of the app(s)?
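>
> As a rough sketch of the maths (assuming 32 cores here; substitute your
> actual count), cores * 2 gives 64 backend connections, e.g. in pgpool.conf:
>
> # total backend connections = num_init_children * max_pool
> num_init_children = 64
> max_pool = 1
>
> You could then lower max_connections in postgresql.conf to match, plus a
> few spare for superuser/monitoring sessions.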
>
>> I'm also adding the megacli parameters:
>>
>> Virtual Drive: 2 (Target Id: 2)
>> Name :datassd
>> RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
>> Size : 893.25 GB
>> Sector Size : 512
>> Is VD emulated : Yes
>> Mirror Data : 893.25 GB
>> State : Optimal
>> Strip Size : 256 KB
>> Number Of Drives per span:2
>> Span Depth : 2
>> Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>> Cache if Bad BBU
>> Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>> Cache if Bad BBU
>> Default Access Policy: Read/Write
>> Current Access Policy: Read/Write
>> Disk Cache Policy : Enabled
>> Encryption Type : None
>> Bad Blocks Exist: No
>> PI type: No PI
>>
>> Is VD Cached: No
>
>>
>
>
> Not using your raid controller's write cache then? I'm not sure just how
> important that is with SSDs these days, but if you've got a BBU, set the
> policy to "WriteBack". Also change "Cache if Bad BBU" to "No Write Cache
> if Bad BBU" if you do that.
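>
> Something like this should do it with megacli (a sketch, assuming adapter 0
> and virtual drive 2 as in your output above):
>
> # enable write-back on the data volume
> MegaCli -LDSetProp WB -L2 -a0
> # fall back to write-through if the BBU fails
> MegaCli -LDSetProp NoCachedBadBBU -L2 -a0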
>
>
>>
>> Other outputs :
>> fdisk -l
>>
>> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x00000000
>>
>> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x00000000
>>
>>
>> cat /etc/mtab
>> /dev/mapper/vg_datassd-lv_datassd /datassd ext4
>> rw,relatime,discard,nobarrier,data=ordered 0 0
>> (I added the nobarrier option)
>>
>>
>> cat /sys/block/sdc/queue/scheduler
>> noop [deadline] cfq
>>
>
>>
>
>
> You could swap relatime for noatime,nodiratime.
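>
> E.g. to try it without a reboot (a sketch, using your mount point from the
> mtab above; make it permanent in /etc/fstab afterwards):
>
> mount -o remount,noatime,nodiratime /datassd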
>

You could also see if the noop scheduler makes any improvement.
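It's quick to test at runtime, e.g.:

echo noop > /sys/block/sdc/queue/scheduler
cat /sys/block/sdc/queue/scheduler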
