Strange serialization problem

From: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Strange serialization problem
Date: 2005-04-13 22:16:54
Message-ID: 1113430614.425d9a560e5b5@webmail.telus.net
Lists: pgsql-novice pgsql-performance

I have a performance problem; I'd like any suggestions on where to continue
investigation.

A set of insert-only processes seems to serialize itself. :-(

The processes appear to be blocked on disk I/O, probably against the drive
holding the base table rather than the pg_xlog drive.

Each process is inserting a block of 10K rows into a table.
I'm guessing they are "serialized" because one process by itself takes 15-20
secs; running ten processes in parallel averages 100-150 secs (each), with
elapsed (wall) time of 150-200 secs.

Polling pg_locks shows each process has (been granted) only the locks you would
expect. I RARELY see an Exclusive lock on an index, and then only on one index
at a time.

A sample from pg_locks:

TABLE/INDEX   GRANTED   PID    MODE
m_reason      t         7340   AccessShare
message       t         7340   AccessShare
message       t         7340   RowExclusive
pk_message    t         7340   AccessShare
tmp_message   t         7340   AccessShare
("m_reason" is a one-row lookup table; see the INSERT command below.)

--------------------------
The query plan is quite reasonable (see below).

On a side note, this is the first app I've had to deal with that is sweet to
pg_xlog, but hammers the drive bearing the base table (3x the traffic).

"log_executor_stats" for a sample insert look reasonable (except the "elapsed"!)

! system usage stats:
! 308.591728 elapsed 3.480000 user 1.270000 system sec
! [4.000000 user 1.390000 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 18212/15 [19002/418] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 9675 read, 8781 written, buffer hit rate = 97.66%
! Local blocks: 504 read, 64 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
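
(Those numbers come from log_executor_stats; for a single test session it can be
switched on like so, with the "system usage stats" block then appearing in the
server log:)

SET log_executor_stats = on;
-- now run the INSERT; the stats above are written to the log when it finishes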

Summarized "ps" output for the above backend process, sampled every 5 secs,
shows it is 94% in the 'D' state, 3% in the 'S' state.
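
To tie the OS pid back to what that backend is doing (stats_command_string is
on), a check like this works; pid 7340 is the backend from the pg_locks sample
above:

SELECT procpid, current_query
FROM pg_stat_activity
WHERE procpid = 7340;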

================
== BACKGROUND ==
================

**SOFTWARE
- PG 7.4.6, RedHat 8.

----------------------------------
**HARDWARE
Xeon 2x2 2.4GHz 2GB RAM
4 x 73GB SCSI; pg_xlog and base on separate drives.

----------------------------------
**APPLICATION

Six machines post batches of 10K messages to the PG db server.
Machine #nn generates its ID keys as "nn00000000001"::bigint etc.
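
(Purely to illustrate that key scheme -- hypothetical machine 03, row 42 of its
batch:)

SELECT ('03' || lpad('42', 11, '0'))::bigint AS id;    -- => 300000000042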

Each process runs:
- "COPY tmp_message FROM STDIN" loads its own one-use TEMP table.
- " INSERT INTO message
SELECT tmp.* FROM tmp_message AS tmp
JOIN m_reason ON m_reason.name = tmp.reason
LEFT JOIN message USING (ID) WHERE message.ID is null
(check required because crash recovery logic requires idempotent insert)
"DROP TABLE tmp_message" --- call me paranoid, this is 7.4

The COPY step time is almost constant when #processes varies from 1 to 10.

----------------------------------
**POSTGRES
pg_autovacuum is running with default parameters.

Non-default GUC values:
checkpoint_segments = 512
default_statistics_target = 200
effective_cache_size = 500000
log_min_duration_statement = 1000
max_fsm_pages = 1000000
max_fsm_relations = 1000
random_page_cost = 1
shared_buffers = 10000
sort_mem = 16384
stats_block_level = true
stats_command_string = true
stats_row_level = true
vacuum_mem = 65536
wal_buffers = 2000

wal_buffers and checkpoint_segments look outrageous, but they were tuned for
another process that posts batches of 10,000 6KB rows in a single insert.
----------------------------------
TABLE/INDEX STATISTICS

----------------------------------
MACHINE STATISTICS

ps gives the backend process as >98% in (D) state, with <1% CPU.

A "top" snapshot:
CPU states: cpu user nice system irq softirq iowait idle
total 2.0% 0.0% 0.8% 0.0% 0.0% 96.9% 0.0%
cpu00 2.5% 0.0% 1.9% 0.0% 0.0% 95.4% 0.0%
cpu01 1.7% 0.0% 0.1% 0.0% 0.3% 97.6% 0.0%
cpu02 0.5% 0.0% 0.7% 0.0% 0.0% 98.6% 0.0%
cpu03 3.1% 0.0% 0.5% 0.0% 0.0% 96.2% 0.0%
Mem: 2061552k av, 2041752k used, 19800k free, 0k shrd, 21020k buff

iostat reports that the $PGDATA/base drive is being worked but not overworked.
The pg_xlog drive is underworked:

         KBPS  TPS    KBPS  TPS    KBPS  TPS    KBPS  TPS
12:30       1    2     763   16      31    8    3336  269
12:40       5    3    1151   22       5    5    2705  320
        ^pg_xlog^                              ^base^

The base drive has run as much as 10MBPS, 5K TPS.
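
Since stats_block_level is on, the I/O can also be cross-checked from inside the
database; a query of this sort (my addition, not from the original post) shows
how block reads split between the message heap and its index:

SELECT relname, heap_blks_read, heap_blks_hit,
       idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'message';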
----------------------------------
EXPLAIN ANALYZE output:
The plan is eminently reasonable, but there is no visible relationship
between the top-level "actual time" and the "Total runtime":

Nested Loop Left Join
(cost=0.00..31109.64 rows=9980 width=351)
(actual time=0.289..2357.346 rows=9980 loops=1)
Filter: ("inner".id IS NULL)
-> Nested Loop
(cost=0.00..735.56 rows=9980 width=351)
(actual time=0.092..1917.677 rows=9980 loops=1)
Join Filter: (("outer".name)::text = ("inner".reason)::text)
-> Seq Scan on m_reason r
(cost=0.00..1.01 rows=1 width=12)
(actual time=0.008..0.050 rows=1 loops=1)
-> Seq Scan on tmp_message t
(cost=0.00..609.80 rows=9980 width=355)
(actual time=0.067..1756.617 rows=9980 loops=1)
-> Index Scan using pk_message on message
(cost=0.00..3.02 rows=1 width=8)
(actual time=0.014..0.014 rows=0 loops=9980)
Index Cond: ("outer".id = message.id)
Total runtime: 737401.687 ms
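
(For what it's worth, the gap is expected: on 7.4 an EXPLAIN ANALYZE of the
INSERT executes it, but the heap and index insertion work isn't charged to any
plan node, so it shows up only in "Total runtime". The statement behind the plan
was presumably along these lines:)

EXPLAIN ANALYZE
INSERT INTO message
SELECT t.*
FROM tmp_message AS t
JOIN m_reason AS r ON r.name = t.reason
LEFT JOIN message ON message.id = t.id
WHERE message.id IS NULL;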

--
"Dreams come true, not free." -- S.Sondheim, ITW
