On Mon, Jan 30, 2012 at 12:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jan 27, 2012 at 8:21 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On Fri, Jan 27, 2012 at 3:16 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>> On Fri, Jan 27, 2012 at 4:05 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>> Also, I think the general approach is wrong. The only reason to have
>>>> these pages in shared memory is that we can control access to them to
>>>> prevent write/write and read/write corruption. Since these pages are
>>>> never written, they don't need to be in shared memory. Just read
>>>> each page into backend-local memory as it is needed, either
>>>> palloc/pfree each time or using a single reserved block for the
>>>> lifetime of the session. Let the kernel worry about caching them so
>>>> that the above mentioned reads are cheap.
>>> right -- exactly. but why stop at one page?
>> If you have more than one, you need code to decide which one to evict
>> (just free) every time you need a new one. And every process needs to
>> be running this code, while the kernel is still going to need to make its
>> own decisions for the entire system. It seems simpler to just let the
>> kernel do the job for everyone. Are you worried that a read syscall
>> is going to be slow even when the data is presumably cached in the OS?
> I think that would be a very legitimate worry. You're talking about
> copying 8kB of data because you need two bits. Even if the
> user/kernel mode context switch is lightning-fast, that's a lot of
> extra data copying.
I guess the most radical step in the direction I am advocating would
be to simply read the one single byte with the data you want. Very
little copying, but then the odds of the next thing you want being on
the one page you already have in memory are much smaller.
> In a previous commit, 33aaa139e6302e81b4fbf2570be20188bb974c4f, we
> increased the number of CLOG buffers from 8 to 32 (except in very
> low-memory configurations). The main reason that shows a win on Nate
> Boley's 32-core test machine appears to be because it avoids the
> scenario where there are, say, 12 people simultaneously wanting to
> read 12 different CLOG buffers, and so 4 of them have to wait for a
> buffer to become available before they can even think about starting a
> read. The really bad latency spikes were happening not because the
> I/O took a long time, but because it couldn't be started immediately.
Ah, I hadn't followed that closely. I had thought the main problem
solved by that patch was that sometimes all of the CLOG buffers would
be dirty, and so no one could read anything in until something else
was written out, which could involve either blocking writes on a
system with checkpoint-sync related constipation, or (if
synchronous_commit=off) fsyncs. By reading the old-enough ones into
local memory, you avoid both any locking and any writes. Simon's
patch solves the writes, but there is still locking.
I don't have enough hardware to test any of these theories, so all I
can do is wave hands around. Maybe if I drop the number of buffers
from 32 back to 8 or even 4, that would create a model system that
could usefully test out the theories on hardware I have, but I'd doubt
how transferable the results would be. With Simon's patch, if I drop
it to 8 it would really be 16, as there are now two sets of buffers, so I
suppose it should be compared to head with 16 buffers to put them on
an equal footing.