Re: [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance

From: Ivan Voras <ivoras(at)freebsd(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
Date: 2010-10-07 12:47:06
Message-ID: i8kfft$e5j$1@dough.gmane.org
Lists: pgsql-hackers pgsql-performance

On 10/07/10 02:39, Robert Haas wrote:
> On Wed, Oct 6, 2010 at 6:31 PM, Ivan Voras<ivoras(at)freebsd(dot)org> wrote:
>> On 10/04/10 20:49, Josh Berkus wrote:
>>
>>>> The other major bottleneck they ran into was a kernel one: reading from
>>>> the heap file requires a couple lseek operations, and Linux acquires a
>>>> mutex on the inode to do that. The proper place to fix this is
>>>> certainly in the kernel but it may be possible to work around in
>>>> Postgres.
>>>
>>> Or we could complain to Kernel.org. They've been fairly responsive in
>>> the past. Too bad this didn't get posted earlier; I just got back from
>>> LinuxCon.
>>>
>>> So you know someone who can speak technically to this issue? I can put
>>> them in touch with the Linux geeks in charge of that part of the kernel
>>> code.
>>
>> Hmmm... lseek? As in the "lseek() then read() or write()" idiom? AFAIK it
>> cannot be fixed, since you're modifying the global "stream position"
>> variable and something has got to lock that.
>
> Well, there are lock free algorithms using CAS, no?

Nothing is really "lock free" - in this case the algorithms simply push
the locking down to atomic operations on the CPU (and the memory bus).
Semantically, *something* has to lock the memory region for however
brief a period of time and then propagate that update to other CPUs'
caches (i.e. invalidate their copies).
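
For illustration only, here is a minimal sketch of what a CAS-based update
of a shared position looks like with C11 atomics. It is not kernel or
PostgreSQL code; stream_pos and reserve_range() are hypothetical names made
up for this example. The retry loop takes no mutex, but every
compare-and-exchange still needs exclusive ownership of the cache line for
an instant, which is the point made above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared "stream position", standing in for a file offset. */
static _Atomic int64_t stream_pos = 0;

/*
 * Advance the shared position by len bytes without a mutex and return the
 * position the caller's range starts at. (Illustrative helper, not a real
 * API.)
 */
static int64_t reserve_range(int64_t len)
{
    int64_t old = atomic_load(&stream_pos);

    /*
     * On failure, atomic_compare_exchange_weak() stores the current value
     * of stream_pos into old, so the loop retries with fresh data.
     */
    while (!atomic_compare_exchange_weak(&stream_pos, &old, old + len)) {
        /* another thread won the race (or the weak CAS failed spuriously) */
    }
    return old;
}
```

Two callers racing on reserve_range(8192) get disjoint ranges, but under
heavy contention they still serialize on that one cache line.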

>> OTOH, pread() / pwrite() don't have to do that.
>
> Hey, I didn't know about those. That sounds like it might be worth
> investigating, though I confess I lack a 48-core machine on which to
> measure the alleged benefit.

As Jon said, it will in any case reduce the number of these syscalls by
half, and they can be wrapped by a C macro for the platforms which don't
implement them.
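
A rough sketch of what I mean, in C. None of this is taken from the
PostgreSQL source: read_at_seek(), read_at_pread(), the READ_AT macro and
the HAVE_PREAD symbol are hypothetical names, and make_demo_file() exists
only so the example can be tried out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: a configure script would normally define this. */
#define HAVE_PREAD 1

/* Two syscalls: lseek() moves the descriptor's shared offset, then read(). */
static ssize_t read_at_seek(int fd, void *buf, size_t len, off_t off)
{
    if (lseek(fd, off, SEEK_SET) == (off_t) -1)
        return -1;
    return read(fd, buf, len);
}

#ifdef HAVE_PREAD
/* One syscall: the offset is an argument, the descriptor's offset is untouched. */
static ssize_t read_at_pread(int fd, void *buf, size_t len, off_t off)
{
    return pread(fd, buf, len, off);
}
#define READ_AT(fd, buf, len, off) read_at_pread((fd), (buf), (len), (off))
#else
/* Fallback: same result, but it moves the file offset and is not atomic. */
#define READ_AT(fd, buf, len, off) read_at_seek((fd), (buf), (len), (off))
#endif

/* Demo helper: anonymous temp file holding "0123456789"; returns fd or -1. */
static int make_demo_file(void)
{
    char path[] = "/tmp/pread_demo_XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */
    if (write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Callers would use READ_AT(fd, buf, len, off) everywhere and get the
single-syscall path wherever pread() exists.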

http://man.freebsd.org/pread

(and just in case it's needed: pread() is a special case of preadv()).
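
To spell that out: a preadv() call with a single iovec does the same job as
pread(). A small sketch, assuming a platform that provides preadv() in
<sys/uio.h> (FreeBSD and Linux do; it is not in POSIX). pread_via_preadv()
and make_scratch_file() are hypothetical helper names for this example.

```c
#define _GNU_SOURCE             /* exposes preadv() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* pread() expressed as preadv() with exactly one buffer. */
static ssize_t pread_via_preadv(int fd, void *buf, size_t len, off_t off)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    return preadv(fd, &iov, 1, off);
}

/* Demo helper: anonymous temp file holding "0123456789"; returns fd or -1. */
static int make_scratch_file(void)
{
    char path[] = "/tmp/preadv_demo_XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */
    if (write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }
    return fd;
}
```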
