B-Tree contention?

From: Alan Li <alanwli(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: B-Tree contention?
Date: 2009-06-08 01:30:41
Message-ID: bc34668d0906071830s2961662am68d59cb521fec209@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I have multiple files with that have very similar distributions and I'm
seeing contention when concurrent COPY's are happening against a table with
a b-tree index on the timestamp column. Each file look something like the
following:

~4M rows with timestamp1
~4M rows with timestamp2
...

To simulate the problem, I have a single file that has:

a int, t timestamp
[1-4000000],2007-01-01 00:00:01
[1-4000000], 2007-01-01 00:00:02

The machine is an eight-core system w/ 16gb of memory. postgresql.conf is
configured with:

shared_buffers = 256MB #256MB w/ one COPY, 512MB w/ two COPY's
wal_buffers = 16MB
checkpoint_segments = 100

When I COPY a single file into the table without an index on the time
column, it takes:

real 0m19.628s
user 0m0.001s
sys 0m0.002s

When I have two COPY's with the same file concurrently into the same table
without an index on the time column, it takes:

real 0m39.933s
user 0m0.001s
sys 0m0.000s

real 0m41.294s
user 0m0.000s
sys 0m0.003s

So it takes roughly twice the time and we're not getting any increase in
throughput for the parallel COPY's. The top LWLock contentions shows that
it's the WALInsertLock.

postgres=# select * from lwlocks order by blk desc limit 20;
pid | lockid | share | write | blk
-------+--------+-------+---------+--------
14065 | 7 | 0 | 8000001 | 311884
14062 | 7 | 0 | 8000001 | 289561
14062 | 45 | 0 | 89783 | 1287
14065 | 45 | 0 | 90104 | 1207
14062 | 8 | 0 | 652 | 10
14062 | 17 | 0 | 86469 | 7
14065 | 370 | 0 | 7 | 6
14062 | 370 | 0 | 251466 | 6
14065 | 8 | 0 | 660 | 4
14065 | 29 | 46010 | 5337 | 3
14065 | 17 | 0 | 86467 | 3
14062 | 35 | 8830 | 5279 | 1
14062 | 40 | 6735 | 5139 | 1
14062 | 37 | 12860 | 5304 | 1
14062 | 29 | 46037 | 5387 | 1
14065 | 521 | 0 | 1 | 1
14065 | 28 | 6775 | 5236 | 1
14065 | 37 | 12831 | 5057 | 1
14062 | 506 | 0 | 2043 | 1
14065 | 40 | 6763 | 5251 | 1
(20 rows)

When I COPY a single file into the table with an index on the time column,
it takes:

real 0m51.486s
user 0m0.000s
sys 0m0.002s

When I have two COPY's with the same file concurrently into the same table
with an index on the time column, it takes:

real 2m4.414s
user 0m0.001s
sys 0m0.001s

real 2m4.428s
user 0m0.000s
sys 0m0.002s

So it's taking more than twice the time when the timestamp index was added.
The top LWLock contentions shows that there's a bit more contention on the
WALInsertLock, but there's additional contention on two buffer pages.

postgres=# select * from lwlocks order by blk desc limit 20;
pid | lockid | share | write | blk
-------+--------+----------+----------+--------
13888 | 7 | 0 | 16027716 | 342522
13889 | 7 | 0 | 16027881 | 339595
13889 | 400 | 4000000 | 4000002 | 72875
13889 | 388 | 4000001 | 4000000 | 72545
13888 | 388 | 4000001 | 4000001 | 70554
13888 | 400 | 4000000 | 4000002 | 68748
13889 | 45 | 0 | 89402 | 1340
13888 | 45 | 0 | 89259 | 1290
13889 | 33 | 12117131 | 6772 | 1102
13888 | 33 | 12118267 | 7178 | 1054
13888 | 29 | 4232749 | 7159 | 415
13889 | 29 | 4233831 | 7057 | 353
13888 | 34 | 4190329 | 6907 | 318
13889 | 34 | 4191618 | 6870 | 292
13888 | 38 | 4185303 | 7004 | 283
13889 | 38 | 4186310 | 7084 | 259
13889 | 922 | 0 | 278 | 86
13889 | 940 | 0 | 207 | 80
13889 | 102316 | 0 | 346 | 72
13889 | 854 | 0 | 194 | 70
(20 rows)

The two hotspots seem to represent the first BTree pages that contain the
'2007-01-01 00:00:01' and '2007-01-01 00:00:02' keys. I suspect it's due to
concurrent _bt_doinsert(), which tries to acquire both a read lock
(_bt_search) and a write lock (_bt_findinsertloc) on those first BTree
pages. Does this sound like a reasonable explanation for the contention?

Btw, this is against a 8.4beta2 build as of yesterday, and I'm pretty sure
that this happens in REL8_3_STABLE as well.

Thanks, Alan

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2009-06-08 03:21:17 Re: pg_migrator issue with contrib
Previous Message Robert Haas 2009-06-08 00:31:54 Re: Partial vacuum versus pg_class.reltuples