Re: [HACKERS] mmap and MAP_ANON

From: dg(at)illustra(dot)com (David Gould)
To: mimo(at)interdata(dot)com(dot)pl (Michal Mosiewicz)
Cc: pgsql-hackers(at)postgresql(dot)org, maillist(at)candle(dot)pha(dot)pa(dot)us
Subject: Re: [HACKERS] mmap and MAP_ANON
Date: 1998-05-14 18:39:56
Message-ID: 9805141839.AA19284@hawk.illustra.com
Lists: pgsql-hackers

Michal Mosiewicz asks:
> Why a lot of people investigate how to replace shared memory with
> mmapping anonymously but there is no discussion on replacing
> reads/writes with memory mapping of heap files.
>
> This way we would save not only on having better system cache
> utilisation but also we would have less memory copying. For me it seems
> like a more robust solution. I suggested it few months ago.
>
> If it's a bad idea, I wonder why?

Unfortunately, it is probably a bad idea.

The postgres buffer cache is a shared pool of pages containing an assortment
of blocks from all the different tables in use by all the different backends.

That is, if backend 'a' is reading table 'ta', and backend 'b' is reading
table 'tb' then the buffer cache will have blocks from both table 'ta'
and table 'tb' in it.

The benefit occurs when backend 'x' starts reading either table 'ta' or 'tb'.
Rather than have to go to disk, it finds the pages already loaded in the
shared buffer cache. Likewise, if backend 'a' should modify a page in table
'ta', the change is then visible to all the other backends (ignoring locks
for this discussion) without any explicit communication between the backends.
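
As a rough illustration of that sharing, here is a minimal sketch of one pool keyed by (table, block number) and used by every backend. The class name, the LRU eviction, and the fake "disk read" are assumptions made for the example, not a description of the actual postgres buffer manager.

```python
from collections import OrderedDict


class SharedBufferPool:
    """Toy stand-in for a shared buffer cache (hypothetical, not postgres code)."""

    def __init__(self, capacity):
        self.capacity = capacity      # number of page slots in the pool
        self.pages = OrderedDict()    # (table, block_no) -> page contents
        self.disk_reads = 0           # counts simulated trips to disk

    def read_block(self, table, block_no):
        key = (table, block_no)
        if key in self.pages:
            # Hit: some backend already loaded this block.
            self.pages.move_to_end(key)
            return self.pages[key]
        # Miss: simulate a disk read, evicting the least recently used page
        # if the pool is full (LRU is an assumption made for this sketch).
        self.disk_reads += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)
        page = bytearray(f"{table}:{block_no}".encode())
        self.pages[key] = page
        return page


pool = SharedBufferPool(capacity=4)
page_a = pool.read_block("ta", 0)   # backend 'a' loads a block of 'ta'
page_b = pool.read_block("tb", 0)   # backend 'b' loads a block of 'tb'
page_x = pool.read_block("ta", 0)   # backend 'x' finds it already cached
page_a[0:1] = b"X"                  # a change by 'a' is seen by 'x' too
```

Because every backend goes through the same pool, the third read costs no disk access and the modification needs no message passing.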

If we started creating a separate mmapped region for each table, several
problems would occur:

- each time a backend wants to use a table it will have to somehow find out
if it is already mapped, and then either map it (for the first time), or
attach to an existing mapping created by another backend. This implies
that the backends need to communicate with all the other backends to let
them know what mappings they are using.

- if two backends are using the same table, and the table is too big to
map the whole thing, then each backend needs a "window" into the table.
This becomes difficult if the two backends are using different parts of
the table (i.e., the first page and the last page).

- there is a finite amount of memory available on the system for postgres
to use. This will have to be split among all the open tables used by
all the backends. If you have 50 backends each using 10 tables, each with 3
indexes, you now need 2,000 mappings in the system. Assuming that there
are 2001 pages available for mapping, how do you decide which table gets
to map 2 pages? How do you get all the backends to agree about this?
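
The arithmetic behind the last point can be checked with a short sketch. It assumes, as the example does, that each backend needs its own mapping for every table and every index it touches; the variable names are made up for illustration.

```python
backends = 50
tables_per_backend = 10
indexes_per_table = 3

# Each table brings one mapping for its heap plus one per index.
mappings = backends * tables_per_backend * (1 + indexes_per_table)

available_pages = 2001
pages_each, leftover = divmod(available_pages, mappings)

print(mappings, pages_each, leftover)
```

With these numbers every mapping gets a single page and exactly one page is left over, which is the page the backends would have to agree on how to hand out.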

Essentially, mapping tables separately creates a requirement for a huge
amount of communication and synchronization among the backends. And, even
if this were not prohibitive, it ends up fragmenting the available memory
for buffers so badly that the caching becomes ineffective.

So, unless you are going to map whole tables and those tables are needed by
_all_ the active backends, the idea of mmapping separate tables is unworkable.

That said, there are tables that meet these criteria, for instance the
transaction logs and anchors. Here mmapping might indeed be useful but even
so it would take some thought and a fair amount of work to gain any benefit.

-dg

David Gould dg(at)illustra(dot)com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
and someone who knows less will correct me if I'm right."
--David Palmer (palmer(at)tybalt(dot)caltech(dot)edu)
