Re: Sudden connection and load average spikes with postgresql 9.3

From: eudald_v <reaven(dot)galaeindael(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sudden connection and load average spikes with postgresql 9.3
Date: 2015-07-02 15:41:20
Message-ID: 1435851680008-5856298.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dear Josh,
I'm sorry I didn't write before, but we have been very busy with this issue
and, you know, when something goes wrong, the apocalypse comes with it.

I've been working on everything you suggested.

I used your tables and script and I can give you a sample of it on
locked_query_start
2015-07-02 14:49:45.972129+02 | 15314 | | 4001 |
| "TABLE_Z" | tuple | ExclusiveLock | 24018:24 | relation
| ShareUpdateExclusiveLock | | | YYY.YYY.YYY.YYY/32
| 2015-07-02 14:49:26.635599+02 | 2015-07-02 14:49:26.635599+02 | 2015-07-02
14:49:26.635601+02 | INSERT INTO "TABLE_X" ("field1", "field2", "field3",
"field4", "field5", "field6", "field7") VALUES (22359509, 92, 5, 88713,
'XXX.XXX.XXX.XXX', 199, 10) | | | 2015-07-02
14:11:45.368709+02 | 2015-07-02 14:11:45.368709+02 | active |
2015-07-02 14:11:45.36871+02 | autovacuum: VACUUM ANALYZE public.TABLE_Z

2015-07-02 14:49:45.972129+02 | 15857 | | 4001 |
| "TABLE_Z" | tuple | ExclusiveLock | 24018:24 | relation
| ShareUpdateExclusiveLock | | | YYY.YYY.YYY.YYY/32
| 2015-07-02 14:49:22.79166+02 | 2015-07-02 14:49:22.79166+02 | 2015-07-02
14:49:22.791665+02 | INSERT INTO "TABLE_X" ("field1", "field2", "field3",
"field4", "field5", "field6", "field7") VALUES (14515978, 92, 5, 88713,
'XXX.XXX.XXX.XXX', 199, 10) | | | 2015-07-02
14:11:45.368709+02 | 2015-07-02 14:11:45.368709+02 | active |
2015-07-02 14:11:45.36871+02 | autovacuum: VACUUM ANALYZE public.TABLE_Z

2015-07-02 14:49:45.972129+02 | 15314 | | 14712 |
| "TABLE_Z" | tuple | ExclusiveLock | 24018:24 | relation
| AccessShareLock | | |
1YYY.YYY.YYY.YYY/32 | 2015-07-02 14:49:26.635599+02 | 2015-07-02
14:49:26.635599+02 | 2015-07-02 14:49:26.635601+02 | INSERT INTO "TABLE_X"
("field1", "field2", "field3", "field4", "field5", "field6", "field") VALUES
(22359509, 92, 5, 88713, 'XXX.XXX.XXX.XXX', 199, 10) | |
185.10.253.72/32 | 2015-07-02 14:48:48.841375+02 | 2015-07-02
14:48:48.841375+02 | active | 2015-07-02 14:48:48.841384+02 | INSERT
INTO "TABLE_Y" ("email_id", "sendout_id", "feed_id", "isp_id") VALUES
(46015879, 75471, 419, 0)

All that was recorded during a spike. From this log I have to point
something:
Tables TABLE_X and TABLE_Y have both a TRIGGER that does an INSERT to
TABLE_Z
As you can see, TABLE_Z was being VACUUM ANALYZED. I wonder if TRIGGERS and
VACUUM work well together, just to check another perspective.

We also have carefully looked at our scripts and we have performed some code
optimitzations (like close db connections earlier), but the spikes continue
to happen.

FS is ext4 and I don't know how can I check the transaction log
configuration

This is how IO lookslike before and after any problem happens:
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 357,00 7468,00 8840,00 7468 8840
avg-cpu: %user %nice %system %iowait %steal %idle
5,02 0,00 2,44 0,06 0,00 92,47
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 471,00 7032,00 13760,00 7032 13760
avg-cpu: %user %nice %system %iowait %steal %idle
5,14 0,00 2,92 0,03 0,00 91,92
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 376,00 7192,00 8048,00 7192 8048
avg-cpu: %user %nice %system %iowait %steal %idle
4,77 0,00 2,57 0,03 0,00 92,63
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 304,00 7280,00 8252,00 7280 8252

And this is how it looks like when the spike happens:
http://pastebin.com/2hAYuDZ5

Hope it can help into determining what's happening.

Thanks for all your efforts and collaboration!
Eudald

--
View this message in context: http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895p5856298.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Graeme B. Bell 2015-07-02 16:15:52 Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Previous Message Merlin Moncure 2015-07-02 14:25:12 Re: New server: SSD/RAID recommendations?