Re: General performance/load issue

From: Gaëtan Allart <gaetan(at)nexylan(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: General performance/load issue
Date: 2011-11-26 16:47:03
Message-ID: CAF6D536.DE59%gaetan@nexylan.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Rahh :/

It's getting worse and worse :/ Database has to be restarted every 2 hours
causing much traffic loss :/

As far as the server is concerned, it was running great 7 days ago and had
been running like this for months. I really don't get why it suddenly went
"I/oing"Š

Here's the current postgresql.conf :

shared_buffers = 6GB # min 128kB
# (change requires restart)
temp_buffers = 40MB # min 800kB
# (change requires restart)
work_mem = 96MB # min 64k
maintenance_work_mem = 256MB # min 1MB
max_stack_depth = 2MB # min 100kB

effective_cache_size = 32GB

max_files_per_process = 8192 # min 25
# (change requires restart)

fsync = on # turns forced synchronization on or offf
synchronous_commit = off # immediate fsync at commit
# supported by the operating system:
wal_buffers = 16MB # min 32kB

checkpoint_segments = 40 # in logfile segments, min 1, 16MB
eachcheckpoint_timeout = 5min # range 30s-1h
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0

seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 2.0 # same scale as above

Isn't there anything I can do to keep my database Up&running even with bad
performance?

Filesystem is ext3. Running over a hardware RAID-1 config.

Gaëtan

Le 26/11/11 15:12, « Tomas Vondra » <tv(at)fuzzy(dot)cz> a écrit :

>On 26 Listopad 2011, 10:45, Gaëtan Allart wrote:
>> A better view of iotop :
>>
>> TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
>> 31875 be/4 postgres 0.00 B/s 15.23 M/s 0.00 % 0.00 % postgres:
>> database database 46.105.104.205(50228) SELECT
>> 30985 be/4 postgres 0.00 B/s 10.55 M/s 0.00 % 0.00 % postgres:
>> database database 46.105.104.205(47672) SELECT
>>
>> As you can see, SELECTS are writing very much on the disk.
>>
>> At this moment, pg_stat_activity show many many queries running at the
>> same time.
>> It looks like that all queries are suddenly writing on disk and not a
>> particular one, making me think of a buffer issue or something.
>
>No, I don't think this is an issue with the size of shared buffers. That'd
>influence the pg_stat_bgwriter - the buffers_backend would grow much
>faster, and it's not the case.
>
>So I'm guessing it's either hint bits or sorting. If I had to, I'd
>probably guess about the hint bits - see for example this thread
>
>http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php
>
>it more or less behaves like what you described so far.
>
>> This is a example of 10 MB/s writing query :
>>
>> SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS "id",
>> "table"."flux_id" AS "flux_id", "table"."locale_id" AS "locale_id",
>> "table"."url_article" AS "url_article", "table"."original_url" AS
>> "original_url", "table"."name" AS "name", "table"."description" AS
>> "description", "table"."content" AS "content", "table"."permis" AS
>> "permis", "table"."reviewed" AS "reviewed", "table"."author_id" AS
>> "author_id", "table"."poster_id" AS "poster_id", "table"."post_date" AS
>> "post_date", "table"."edit_date" AS "edit_date", "table"."add_date" AS
>> "add_date", "table"."comments_open" AS "comments_open",
>>"table"."site_id"
>> AS "site_id", "table"."is_local" AS "is_local", "table"."status" AS
>> "status", "table"."visits" AS "visits", "table"."votes" AS "votes",
>> "table"."score" AS "score", "arti
>
>Post EXPLAIN ANALYZE output of this query (use explain.depesz.com to post
>it).
>
>> Checkpoints logs still show very long write times :
>>
>> LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
>> file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
>> total=63.625 s
>
>No, that's fine - that's what spread checkpoints do. Once the
>checkpoint_timeout expires, the system decides to perform a checkpoint,
>i.e. it has to write all dirty (modified) shared buffers to the disk. But
>it knows the next checkpoint will happen in checkpoint_timeout, so it has
>about 5 minutes to write all the data.
>
>So it says something like 'I need to write 540MB want to write that in 270
>seconds (5 minutes * completion_target), so I'll write at 2MB/s'. So the
>write phase is expected to take long. But it has to sync the data at the
>end, and that's where the problems usually happen - so the important thing
>is 'sync' and that improved significantly. 6 seconds is not great but it's
>not a big issue I guess.
>
>> Couldn't this be a hardware issue ?
>
>I don't think so. The problem is that the queries write a lot of data, and
>the SSD can't help in this case. The SSDs provide much better random
>performance (compared to spinners), but the sequential performance is not
>much better. So pushing 70MB/s to the SSD may be fully utilized.
>
>Have you tested the basic performance (like bonnie++, dd etc) before going
>into production? There are some common SSD-related mistakes (e.g. not
>aligning the partitions properly) that may easily cause 30% performance
>loss. But this won't fix the problem, although you could get a bit better
>performance.
>
>Tomas

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gaëtan Allart 2011-11-26 16:50:55 Re: General performance/load issue
Previous Message Robert Treat 2011-11-26 14:25:00 Re: General performance/load issue