Re: New FSM patch

From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New FSM patch
Date: 2008-09-17 12:18:21
Message-ID: 48D0F58D.8050605@sun.com
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):
> Heikki Linnakangas wrote:

<snip>

>
> Let me describe this test case first:
> - The test program calls RecordAndGetPageWithFreeSpace in a tight loop,
> with random values. There's no activity to the heap. In normal usage,
> the time spent in RecordAndGetPageWithFreeSpace is minuscule compared to
> the heap and index updates that cause RecordAndGetPageWithFreeSpace to be
> called.
> - WAL was placed on a RAM drive. This is of course not how people set up
> their database servers, but the point of this test was to measure CPU
> speed and scalability. The impact of writing extra WAL is significant
> and needs to be taken into account, but that's a separate test and
> discussion, and needs to be considered in comparison to the WAL written
> by heap and index updates.
>

<snip>

>
> Another surprise was how badly both implementations scale. On CVS HEAD,
> I expected the performance to be roughly the same with 1 and 2 clients,
> because all access to the FSM is serialized on the FreeSpaceLock. But
> adding the 2nd client not only didn't help, but it actually made the
> performance much worse than with a single client. Context switching or
> cache line contention, perhaps? The new FSM implementation shows the
> same effect, which was an even bigger surprise. At table sizes > 32 MB,
> the FSM no longer fits on a single FSM page, so I expected almost a
> linear speed up with bigger table sizes from using multiple clients.
> That's not happening, and I don't know why. Although, going from 33MB to
> 333 MB, the performance with 2 clients almost doubles, but it still
> doesn't exceed that with 1 client.
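
(If I understand the description correctly, the test boils down to a loop
roughly like the sketch below; the exact RecordAndGetPageWithFreeSpace()
signature and the wrap-around handling are my assumptions, not the actual
test program.)

/*
 * Rough sketch of the microbenchmark described above.  The heap is never
 * touched; the loop just records random amounts of free space and asks the
 * FSM for the next page, so essentially all time is spent inside the FSM
 * and its locks.
 */
#include "postgres.h"

#include "storage/freespace.h"
#include "utils/rel.h"

static void
fsm_stress(Relation rel, BlockNumber nblocks, long iterations)
{
	BlockNumber cur = 0;
	long		i;

	for (i = 0; i < iterations; i++)
	{
		Size		oldAvail = random() % BLCKSZ;	/* space "left" on cur */
		Size		needed = random() % BLCKSZ;		/* space "needed" next */

		/* record free space on cur and get another page to insert into */
		cur = RecordAndGetPageWithFreeSpace(rel, cur, oldAvail, needed);

		if (cur == InvalidBlockNumber || cur >= nblocks)
			cur = 0;			/* wrap around to the start of the table */
	}
}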

I tested it with DTrace on Solaris 10 on an 8-CPU SPARC machine and got
similar results to yours. The main problem in your new implementation is
locking. On small tables, where the FSM fits on one page, clients spend
about 3/4 of their time waiting on the page lock. On medium-sized tables
(two-level FSM), the WALInsert lock becomes significant: it accounts for
1/4 of the waiting time, while page waits take "only" 1/3.

I think the main reason for the scalability problem is that the locking
serializes access.

Suggestions:

1) Remove WAL logging. I think the FSM should be reconstructed during
replay of the other WAL records (insert, update, and so on). We probably
only need a full-page write on the first modification after a checkpoint.

2) Break up the locking: take only a shared lock on the page, and divide
the page into smaller parts for exclusive locking (at least for the root
page); see the sketch below.
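
A rough sketch of what I mean by 2). The partition count and the per-slot
setter are hypothetical, and the real FSM page is a binary tree, so updates
would still have to propagate to parent nodes somehow; this only shows the
per-slice locking idea.

/*
 * Hypothetical per-slice locking on one FSM page.  Writers take a spinlock
 * covering only a slice of the slot array while the buffer itself is held
 * in shared mode, so updates to different slices do not serialize on a
 * single exclusive page lock.  Propagation within the page's internal tree
 * (and to upper FSM levels) is ignored here.
 */
#include "postgres.h"

#include "storage/bufpage.h"
#include "storage/spin.h"

#define FSM_PAGE_PARTITIONS 16

typedef struct FsmPageLocks
{
	slock_t		locks[FSM_PAGE_PARTITIONS]; /* one spinlock per slice */
} FsmPageLocks;

/* hypothetical per-slot setter; stands in for whatever the patch uses */
extern void fsm_set_avail(Page page, int slot, uint8 value);

static void
fsm_update_slot(FsmPageLocks *lk, Page page, int slot, uint8 value)
{
	int			part = slot % FSM_PAGE_PARTITIONS;

	/* caller holds the buffer lock in shared mode only */
	SpinLockAcquire(&lk->locks[part]);
	fsm_set_avail(page, slot, value);
	SpinLockRelease(&lk->locks[part]);
}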

However, your test case is rather artificial. I'm going to test it with a
"real" OLTP workload.

Zdenek
