Re: Spinlock performance improvement proposal

From: Neil Padgett <npadgett(at)redhat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spinlock performance improvement proposal
Date: 2001-09-27 18:42:42
Message-ID: 3BB37322.A9B97FB4@redhat.com

Tom Lane wrote:
>
> Neil Padgett <npadgett(at)redhat(dot)com> writes:
> > Well. Currently the runs are the typical pg_bench runs.
>
> With what parameters? If you don't initialize the pg_bench database
> with "scale" proportional to the number of clients you intend to use,
> then you'll naturally get huge lock contention. For example, if you
> use scale=1, there's only one "branch" in the database. Since every
> transaction wants to update the branch's balance, every transaction
> has to write-lock that single row, and so everybody serializes on that
> one lock. Under these conditions it's not surprising to see lots of
> lock waits and lots of useless runs of the deadlock detector ...
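
(For anyone following along, this is roughly the per-client transaction
pgbench issues -- paraphrased from contrib/pgbench as a self-contained
libpq sketch, with the randomly chosen aid/tid/bid and delta hardwired
to placeholder values. At scale=1 the branch UPDATE always hits the
single row in "branches", which is the serialization Tom describes.)

#include <stdio.h>
#include <libpq-fe.h>

static void
run(PGconn *conn, const char *sql)
{
	PGresult   *res = PQexec(conn, sql);

	if (PQresultStatus(res) != PGRES_COMMAND_OK &&
		PQresultStatus(res) != PGRES_TUPLES_OK)
		fprintf(stderr, "%s: %s", sql, PQerrorMessage(conn));
	PQclear(res);
}

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=pgbench");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
		return 1;
	}

	run(conn, "BEGIN");
	run(conn, "UPDATE accounts SET abalance = abalance + 1 WHERE aid = 1");
	run(conn, "SELECT abalance FROM accounts WHERE aid = 1");
	run(conn, "UPDATE tellers SET tbalance = tbalance + 1 WHERE tid = 1");
	/* the hot row: at scale=1 every client must write-lock bid = 1 */
	run(conn, "UPDATE branches SET bbalance = bbalance + 1 WHERE bid = 1");
	run(conn, "INSERT INTO history (tid, bid, aid, delta, mtime) "
		 "VALUES (1, 1, 1, 1, CURRENT_TIMESTAMP)");
	run(conn, "END");

	PQfinish(conn);
	return 0;
}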

The results you saw with the large number of useless runs of the
deadlock detector were at scale factor 2, where the performance
fall-off began at about 100 clients. So, I reran the 512-client
profiling run with a scale factor of 12. (Scale 2 sufficed for 100
clients, so by the same ratio 500 clients would want roughly scale 10;
12 adds some cushion.) This does, of course, reduce the contention.
However, throughput only roughly doubled -- which sounds good, but is
still a small fraction of the throughput realized on the same machine
with a small number of clients. (This is the uniprocessor machine.)

The new profile looks like this (uniprocessor machine):
Flat profile:

Each sample counts as 1 samples.
  %    cumulative     self               self      total
 time    samples     samples    calls   T1/call   T1/call  name
 9.44    10753.00   10753.00                               pg_fsync
 6.63    18303.01    7550.00                               s_lock_sleep
 6.56    25773.01    7470.00                               s_lock
 5.88    32473.01    6700.00                               heapgettup
 5.28    38487.02    6014.00                               HeapTupleSatisfiesSnapshot
 4.83    43995.02    5508.00                               hash_destroy
 2.77    47156.02    3161.00                               load_file
 1.90    49322.02    2166.00                               XLogInsert
 1.86    51436.02    2114.00                               _bt_compare
 1.82    53514.02    2078.00                               AllocSetAlloc
 1.72    55473.02    1959.00                               LockBuffer
 1.50    57180.02    1707.00                               init_ps_display
 1.40    58775.03    1595.00                               DirectFunctionCall9
 1.26    60211.03    1436.00                               hash_search
 1.14    61511.03    1300.00                               GetSnapshotData
 1.11    62780.03    1269.00                               SpinAcquire
 1.10    64028.03    1248.00                               LockAcquire
 1.04    70148.03    1190.00                               heap_fetch
 0.91    71182.03    1034.00                               _bt_orderkeys
 0.89    72201.03    1019.00                               LockRelease
 0.75    73058.03     857.00                               InitBufferPoolAccess
 .
 .
 .

(I'd attribute the pg_fsync time at the top to the slow disk in the
machine -- scale 12 yields a lot of tuples.)
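
Since s_lock_sleep and s_lock are still second and third even at scale
12, here, for reference, is roughly the spin-then-sleep pattern behind
those samples. (This is a self-contained paraphrase of the loop in
src/backend/storage/lmgr/s_lock.c, not the verbatim source: TAS() is
stubbed with a GCC builtin where the real backend uses per-platform
assembly, and the delay is a fixed 10 msec where the real code cycles
through a table of delays.)

#include <sys/time.h>
#include <sys/select.h>
#include <stdio.h>
#include <stdlib.h>

typedef volatile int slock_t;

/* Stubbed test-and-set so the sketch compiles standalone. */
#define TAS(lock)       __sync_lock_test_and_set((lock), 1)
#define S_UNLOCK(lock)  __sync_lock_release(lock)

#define S_MAX_SPIN_COUNT 1000	/* declare the lock "stuck" after this */

static void
s_lock_sleep(unsigned spins)
{
	struct timeval delay;

	/* Back off with a timed sleep between retries instead of busy-
	 * waiting; the real code indexes a delay table (s_spincycle)
	 * with 'spins' rather than sleeping a fixed 10 msec. */
	(void) spins;
	delay.tv_sec = 0;
	delay.tv_usec = 10000;
	(void) select(0, NULL, NULL, NULL, &delay);
}

static void
s_lock(slock_t *lock, const char *file, int line)
{
	unsigned	spins = 0;

	/* Every failed TAS costs a sleep, so with hundreds of clients
	 * contending, wall-clock time piles up right here -- which is
	 * what the profile shows. */
	while (TAS(lock))
	{
		s_lock_sleep(spins++);
		if (spins > S_MAX_SPIN_COUNT)
		{
			fprintf(stderr, "stuck spinlock at %s:%d\n", file, line);
			abort();
		}
	}
}

int
main(void)
{
	slock_t		lock = 0;

	s_lock(&lock, __FILE__, __LINE__);
	/* ... critical section ... */
	S_UNLOCK(&lock);
	return 0;
}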

I reran the benchmarks on the SMP machine with a scale of 12 instead of
2. The numbers still show a clear performance drop-off at approximately
100 clients, albeit not as sharp a one (though still quite pronounced).
In terms of raw performance, the numbers are comparable. The scale
factor certainly helped -- but it still seems that we might have a
problem here.

Thoughts?

Neil

--
Neil Padgett
Red Hat Canada Ltd. E-Mail: npadgett(at)redhat(dot)com
2323 Yonge Street, Suite #300,
Toronto, ON M4P 2C9
