Re: General performance/load issue

From: Gaëtan Allart <gaetan(at)nexylan(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: General performance/load issue
Date: 2011-11-26 09:45:38
Message-ID: CAF671D6.DD43%gaetan@nexylan.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

A better view of iotop :

TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
16974 be/4 postgres 46.33 K/s 0.00 B/s 0.00 % 7.21 % postgres:
database database 46.105.111.92(54930) idle
383 be/4 postgres 7.72 K/s 0.00 B/s 0.00 % 1.56 % postgres:
database database 46.105.104.205(51706) idle in transaction
15934 be/4 postgres 0.00 B/s 38.61 K/s 0.00 % 0.04 % postgres: wal
writer process
31487 be/4 postgres 0.00 B/s 11.32 M/s 0.00 % 0.00 % postgres:
database database 46.105.104.205(48616) SELECT
29541 be/4 postgres 0.00 B/s 11.52 M/s 0.00 % 0.00 % postgres:
database database 46.105.104.205(40675) SELECT
31419 be/4 postgres 0.00 B/s 5.99 M/s 0.00 % 0.00 % postgres:
database database 46.105.104.205(48476) SELECT
31875 be/4 postgres 0.00 B/s 15.23 M/s 0.00 % 0.00 % postgres:
database database 46.105.104.205(50228) SELECT
30985 be/4 postgres 0.00 B/s 10.55 M/s 0.00 % 0.00 % postgres:
database database 46.105.104.205(47672) SELECT
30902 be/4 postgres 0.00 B/s 6.04 M/s 0.00 % 0.00 % postgres:
database database 176.31.228.6(43295) SELECT
30903 be/4 postgres 0.00 B/s 4.79 M/s 0.00 % 0.00 % postgres:
database database 46.105.104.205(47565) SELECT
15933 be/4 postgres 0.00 B/s 77.22 K/s 0.00 % 0.00 % postgres:
writer process

As you can see, SELECTS are writing very much on the disk.

At this moment, pg_stat_activity show many many queries running at the
same time.
It looks like that all queries are suddenly writing on disk and not a
particular one, making me think of a buffer issue or something.

This is a example of 10 MB/s writing query :

SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS "id",
"table"."flux_id" AS "flux_id", "table"."locale_id" AS "locale_id",
"table"."url_article" AS "url_article", "table"."original_url" AS
"original_url", "table"."name" AS "name", "table"."description" AS
"description", "table"."content" AS "content", "table"."permis" AS
"permis", "table"."reviewed" AS "reviewed", "table"."author_id" AS
"author_id", "table"."poster_id" AS "poster_id", "table"."post_date" AS
"post_date", "table"."edit_date" AS "edit_date", "table"."add_date" AS
"add_date", "table"."comments_open" AS "comments_open", "table"."site_id"
AS "site_id", "table"."is_local" AS "is_local", "table"."status" AS
"status", "table"."visits" AS "visits", "table"."votes" AS "votes",
"table"."score" AS "score", "arti

Checkpoints logs still show very long write times :

LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
total=63.625 s

Couldn't this be a hardware issue ?

Best regards,

Gaëtan

Le 26/11/11 04:42, « Tomas Vondra » <tv(at)fuzzy(dot)cz> a écrit :

>Dne 26.11.2011 00:17, Cédric Villemain napsal(a):
>> Le 25 novembre 2011 23:47, Gaëtan Allart <gaetan(at)nexylan(dot)com> a écrit :
>>> Hello Tomas and Cédric,
>>>
>>> Right now, the server is not all right. Load is above 30 and queries
>>>are
>>> slow like hell.
>>>
>>>
>>> Here's the complete iotop. Note the 71 MB/s writes (apparently on
>>>SELECT
>>> queries).
>>>
>>> Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
>>> TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
>>>
>>>
>>> 27352 be/4 postgres 15.64 K/s 86.01 K/s 0.00 % 99.99 % postgres:
>>> database database 176.31.228.6(38816) SELECT
>>> 20226 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 99.99 % postgres:
>>> database database 176.31.228.6(34166) SELECT
>>> 26950 be/4 postgres 23.46 K/s 0.00 B/s 0.00 % 82.14 % postgres:
>>> database database 46.105.104.205(40820) SELECT
>>> 23160 be/4 postgres 3.91 K/s 0.00 B/s 0.00 % 81.14 % postgres:
>>> database database 46.105.104.205(58091) SELECT
>>> 29184 be/4 postgres 7.82 K/s 0.00 B/s 0.00 % 79.17 % postgres:
>>> database database 46.105.104.205(51047) SELECT
>
>Gaetan, you keep deleting the interesting rows for some reason. None of
>the rows you've posted writes more than a few KB/s - what about the rows
>that actually write those 71MB/s?
>
>>> Here is dirty_expire_centisecs :
>>>
>>> cat /proc/sys/vm/dirty_expire_centisecs
>>> 3000
>
>OK. That's a default value and it's usually too high (just like the
>ratios), but don't change it until we find out what's wrong.
>
>>> Bgwriter configuration is default :
>>>
>>> #bgwriter_delay = 200ms # 10-10000ms between rounds
>>> #bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
>>> #bgwriter_lru_multiplier = 2.0 # 0-10.0 multipler on buffers
>>>scanned/round
>
>OK.
>
>>> Is there anything I can provide to help you ?
>>
>> the checkpoints logs and the output of pg_stat_bgwriter (as asked by
>>Tomas).
>>
>> It is probable that shared_buffers are too small for your workload
>> (expected), do you still have issue with checkpoint sync duration ?
>
>I don't think that's the case. Too small shared buffers usually cause a
>lot of reads (especially when all the queries are SELECTs as here), but
>he has problems with writes.
>
>And according to the pg_stat_bgwriter Gaetan posted, the checkpoints
>wrote about 54MB, bgwriter about 370MB and backends about 80MB (during
>the 5 minutes between snapshots).
>
>So I'm really wondering where those 70MB/s of writes come from.
>
>Two things just crossed my mind. The first one are hint bits - this may
>cause a SELECT to write a lot of data. But I guess this is included in
>the pg_stat_bgwriter stats.
>
>The second one is on-disk sorting - this happens when a query needs to
>sort so much data it can't be done in work_mem, so the data are pushed
>to the disk, and AFAIK it's not included into the pg_stat_bgwriter.
>
>But he has work_mem set to 128MB so it's rather unlikely. Gaetan, can
>you verify that those queries that write the most data to the disk are
>not performing any huge sorts etc? (See iotop which postgres process is
>writing a lot of data and use the PID and pg_stat_activity to find out
>which query it's executing.)
>
>Tomas
>
>--
>Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-general

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tomas Vondra 2011-11-26 14:12:05 Re: General performance/load issue
Previous Message Condor 2011-11-26 07:28:06 Re: How to add conversion between LATIN1 and WIN1251 ?