Re: Copy data to DSA area

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Copy data to DSA area
Date: 2018-11-07 04:35:18
Message-ID: CAEepm=1x94bnGM4MUZANYhU5rnZEP9ZLp3yXwXuniWRp=VRepw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 7, 2018 at 3:34 PM Ideriha, Takeshi
<ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com> wrote:
> Related to my development (putting relcache and catcache onto shared memory)[1],
> I have some questions about how to copy variables into shared memory, especially a DSA area, and how to implement that.
>
> Under the current architecture, when initializing some data we sometimes copy it using dedicated functions
> such as CreateTupleDescCopyConstr(), datumCopy(), pstrdup() and so on. These copy functions call palloc() and allocate the
> copied data in the current memory context.

Yeah, I faced this problem in typcache.c, and you can see the function
share_tupledesc() which copies TupleDesc objects into a DSA area.
This doesn't really address the fundamental problem here though... see
below.
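
To make the share_tupledesc() pattern concrete: because a TupleDesc is (almost) flat, sharing it takes one allocation plus one in-place copy, and the caller keeps the returned handle rather than a pointer. The sketch below is self-contained and uses hypothetical stand-ins (ToyArea, toy_allocate(), toy_get_address(), ToyDesc) in place of dsa_allocate(), dsa_get_address() and TupleDesc; it is not the real typcache.c code. My understanding is that the real TupleDescCopy() likewise leaves constraints behind, which the sketch mimics by clearing the constr pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for dsa_pointer / dsa_allocate() / dsa_get_address(). */
typedef uint64_t toy_pointer;           /* 0 means "invalid" */

typedef struct ToyArea
{
    unsigned char base[4096];           /* backing storage for the "area" */
    size_t      used;                   /* bump-allocator high-water mark */
} ToyArea;

static toy_pointer
toy_allocate(ToyArea *area, size_t size)
{
    /* Offset 0 is reserved so that 0 can mean "invalid"; keep 8-byte alignment. */
    size_t      start = area->used == 0 ? 8 : area->used;

    size = (size + 7) & ~(size_t) 7;
    assert(start + size <= sizeof(area->base));
    area->used = start + size;
    return (toy_pointer) start;
}

static void *
toy_get_address(ToyArea *area, toy_pointer p)
{
    return p == 0 ? NULL : area->base + p;
}

/* A flat descriptor: attributes stored inline, constraints behind a pointer. */
typedef struct ToyAttr
{
    char        name[16];
    int         typid;
} ToyAttr;

typedef struct ToyDesc
{
    int         natts;
    int         typmod;
    void       *constr;                 /* not shareable: points elsewhere */
    ToyAttr     attrs[];                /* shareable: stored inline */
} ToyDesc;

#define TOY_DESC_SIZE(n) (offsetof(ToyDesc, attrs) + (size_t) (n) * sizeof(ToyAttr))

/* Allocate once, copy once, hand back the handle rather than a pointer. */
static toy_pointer
share_toy_desc(ToyArea *area, const ToyDesc *src, int typmod)
{
    size_t      size = TOY_DESC_SIZE(src->natts);
    toy_pointer dp = toy_allocate(area, size);
    ToyDesc    *dst = toy_get_address(area, dp);

    memcpy(dst, src, size);             /* the object is flat, so one copy suffices */
    dst->constr = NULL;                 /* constraints are left behind */
    dst->typmod = typmod;
    return dp;
}
```

Another process would store only the returned toy_pointer and translate it with its own toy_get_address() call.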

> But palloc() cannot be used for shared memory. Now I'm trying to use DSA (and dshash) to handle data in shared memory,
> so, for example, dsa_allocate() is needed instead of palloc(). I came up with three ways to implement this.
>
> A. Copy the existing functions and write equivalent DSA versions, like CreateDSATupleDescCopyConstr(),
> datumCopyDSA(), dsa_strdup().
> In these functions the logic is the same as in the current ones, but palloc() would be replaced with dsa_allocate().
> But this approach looks too brute-force, and the code becomes less readable and maintainable.
>
> B. Use the current functions to copy the data into a local memory context temporarily, and then copy it again into the DSA area.
> This method looks better than plan A because we don't need to write cloned functions by copy-and-paste.
> But copying twice would degrade performance.

It's nice when you can construct objects directly at an address
supplied by the caller. In other words, if allocation and object
initialisation are two separate steps, you can put the object anywhere
you like without copying. That could be on the stack, in an array,
inside another object, in regular heap memory, in traditional shared
memory, in a DSM segment or in DSA memory. I asked for an alloc/init
separation in the Bloom filter code for the same reason. But this
still isn't the real problem here...
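
Here is a minimal sketch of that alloc/init separation, using a hypothetical Bloom-style filter (ToyBloom, toy_bloom_size(), toy_bloom_init() and friends are invented for illustration and are not the API of PostgreSQL's Bloom filter code). One function reports how many bytes are needed, another initialises an object in caller-supplied memory, so the same code works whether that memory is on the stack, inside another struct, or in a shared area. To stay short it sets a single bit per key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The bitset is stored inline, so the whole filter is one flat block. */
typedef struct ToyBloom
{
    uint32_t    nbits;
    uint8_t     bitset[];
} ToyBloom;

/* Step 1: the caller asks how much space is needed... */
static size_t
toy_bloom_size(uint32_t nbits)
{
    return offsetof(ToyBloom, bitset) + (nbits + 7) / 8;
}

/* Step 2: ...and then initialises the object wherever it likes. */
static ToyBloom *
toy_bloom_init(void *place, uint32_t nbits)
{
    ToyBloom   *filter = place;

    assert(nbits > 0);
    filter->nbits = nbits;
    memset(filter->bitset, 0, (nbits + 7) / 8);
    return filter;
}

static void
toy_bloom_add(ToyBloom *filter, uint32_t hash)
{
    uint32_t    bit = hash % filter->nbits;

    filter->bitset[bit / 8] |= (uint8_t) (1u << (bit % 8));
}

/* Returns 1 for "maybe present" (false positives possible), 0 for "absent". */
static int
toy_bloom_maybe_contains(const ToyBloom *filter, uint32_t hash)
{
    uint32_t    bit = hash % filter->nbits;

    return (filter->bitset[bit / 8] >> (bit % 8)) & 1;
}
```

Nothing in toy_bloom_init() calls an allocator, which is what lets the caller choose stack, heap or shared memory.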

> C. Enhance palloc() and MemoryContext.
>
> This is a rough idea, but, for instance, add a new global flag that tells palloc() to use a DSA area instead of a normal MemoryContext.
> MemoryContextSwitchToDSA(dsa_area *area) makes subsequent palloc() calls allocate memory in DSA,
> and MemoryContextSwitchBack(dsa_area) makes palloc() behave normally again.
>
> MemoryContextSwitchToDSA(dsa_area);
> palloc(size);
> MemoryContextSwitchBack(dsa_area);
>
> Plan C seems handy for DSA users because people can take advantage of existing functions.

The problem with plan C is that palloc() has to return a native
pointer, but in order to do anything useful with this memory (ie to
share it) you also need to get the dsa_pointer, but the palloc()
interface doesn't allow for that. Any object that holds a pointer
allocated with DSA-hiding-behind-palloc() will be useless for another
process.
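
A small single-process simulation of why a native pointer is not enough: two Segment objects stand in for one shared segment that two backends happen to map at different addresses. The layout (Segment, SharedHeader) and the helper names are hypothetical. After the bytes are "shared", the stored offset resolves correctly in the second mapping, while the stored raw pointer still refers to the first backend's mapping, which in genuinely separate processes would be meaningless.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulation only: each Segment is one backend's view of the same segment. */
enum { SEG_SIZE = 256 };

typedef struct SharedHeader
{
    const char *raw;                    /* what a palloc()-style API hands back */
    size_t      offset;                 /* what a dsa_pointer-style handle encodes */
} SharedHeader;

typedef union Segment
{
    SharedHeader hdr;                   /* header lives at the start of the segment */
    char        bytes[SEG_SIZE];
} Segment;

/* "Backend A" builds a string in the segment and records both kinds of reference. */
static void
build_in_mapping(Segment *seg, const char *text)
{
    char       *payload = seg->bytes + sizeof(SharedHeader);

    assert(sizeof(SharedHeader) + strlen(text) + 1 <= SEG_SIZE);
    strcpy(payload, text);
    seg->hdr.raw = payload;                             /* meaningful only to A */
    seg->hdr.offset = (size_t) (payload - seg->bytes);  /* meaningful to everyone */
}

/* "Backend B" can only use the offset, resolved against its own mapping. */
static const char *
resolve_offset(const Segment *seg)
{
    return seg->bytes + seg->hdr.offset;
}
```

Copying a's bytes into b models B's view of the same shared memory: the offset survives the move, the raw pointer does not.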

> What do you think about these ideas?

The real problem is object graphs with pointers. I solved that
problem for TupleDesc by making them *almost* entirely flat, in commit
c6293249. I say 'almost' because it won't work for constraints or
defaults, but that didn't matter for the typcache.c case because it
doesn't use those. In other words I danced carefully around the edge
of the problem.

In theory, object graphs, trees, lists etc could be implemented in a
way that allows for "flat" storage, if they can be allocated in
contiguous memory and refer to sub-objects within that space using
offsets from the start of the space, and then they could be used
without having to know whether they are in DSM/DSA memory or regular
memory. That seems like a huge project though. Otherwise they have
to hold dsa_pointer, and deal with that in many places. You can see
this in the Parallel Hash code. I had to make the hash table data
structure able to deal with raw pointers OR dsa_pointer. That would
be theoretically doable, but really quite painful, for the whole
universe of PostgreSQL node types and data structures.
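
A minimal sketch of the "flat" approach for a singly linked list, assuming a fixed-capacity space (FlatSpace, FlatNode and the flat_* functions are hypothetical). Links are byte offsets from the start of the space, so the space can be copied or mapped at any address and traversed without knowing where it lives. The price is that every dereference goes through a translation helper, and every node type would need the same treatment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct FlatNode
{
    int32_t     value;
    uint32_t    next;                   /* byte offset of next node from the start
                                         * of the space; 0 = end of list */
} FlatNode;

typedef struct FlatSpace
{
    uint32_t    nused;                  /* nodes handed out so far */
    uint32_t    head;                   /* byte offset of first node; 0 = empty */
    FlatNode    nodes[32];              /* all nodes live inside the space */
} FlatSpace;

/* Translate an offset into an address within *this* copy of the space. */
static FlatNode *
flat_node(FlatSpace *space, uint32_t off)
{
    return off == 0 ? NULL : (FlatNode *) ((char *) space + off);
}

/* Push a value on the front of the list, linking by offset, not by pointer. */
static void
flat_push(FlatSpace *space, int32_t value)
{
    FlatNode   *node;

    assert(space->nused < sizeof(space->nodes) / sizeof(space->nodes[0]));
    node = &space->nodes[space->nused++];
    node->value = value;
    node->next = space->head;
    space->head = (uint32_t) ((char *) node - (char *) space);
}

static int32_t
flat_sum(FlatSpace *space)
{
    int32_t     sum = 0;

    for (FlatNode *n = flat_node(space, space->head); n != NULL;
         n = flat_node(space, n->next))
        sum += n->value;
    return sum;
}
```

A byte-for-byte copy of a FlatSpace is a fully working, independent list, which is exactly the property pointer-based node trees lack.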

I know of 3 ideas that would make your idea C work, so that you could
share something as complicated as a query plan directly without having
to deserialise it to use it:

1. Allow the creation of DSA areas inside the traditional fixed
memory segment (instead of DSM), in a fixed-sized space reserved by
the postmaster. That is, use dsa.c's ability to allocate and free
memory, and possibly free a whole area at once to avoid leaking memory
in some cases (like MemoryContexts), but in this mode dsa_pointer
would be directly castable to a raw pointer. Then you could provide a
regular MemoryContext interface for it, and use it via palloc(), as
you said, and all the code that knows how to construct lists and trees
and plan nodes etc would All Just Work. It would be your plan C, and
all the pointers would be usable in every process, but limited in
total size at start-up time.
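
A rough sketch of why idea 1 makes plan C work: if the region sits at the same address in every process, a bump allocator over it can return plain pointers that double as shareable handles, and a palloc()-style switch lets unmodified list-building code allocate there. Everything here (ToyContext, toy_palloc(), toy_switch_to(), toy_lcons()) is hypothetical and far simpler than the real MemoryContext machinery; the region is just a caller-provided buffer standing in for space the postmaster would reserve before forking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ToyContext
{
    void       *(*alloc) (struct ToyContext *cxt, size_t size);
    unsigned char *base;                /* start of the fixed region (NULL for heap) */
    size_t      size;                   /* region size, fixed at "start-up" */
    size_t      used;                   /* bump-allocator high-water mark */
} ToyContext;

static void *
heap_alloc(ToyContext *cxt, size_t size)
{
    (void) cxt;
    return malloc(size);
}

/* If the region is at the same address everywhere, the pointer IS the handle. */
static void *
fixed_alloc(ToyContext *cxt, size_t size)
{
    void       *result;

    size = (size + 15) & ~(size_t) 15;
    if (cxt->used + size > cxt->size)
        return NULL;                    /* total size is limited at start-up */
    result = cxt->base + cxt->used;
    cxt->used += size;
    return result;
}

/* Free the whole region at once, much as a context reset would. */
static void
fixed_reset(ToyContext *cxt)
{
    cxt->used = 0;
}

static ToyContext HeapContext = {heap_alloc, NULL, 0, 0};
static ToyContext *CurrentToyContext = &HeapContext;

static void *
toy_palloc(size_t size)
{
    void       *p = CurrentToyContext->alloc(CurrentToyContext, size);

    assert(p != NULL);
    return p;
}

static ToyContext *
toy_switch_to(ToyContext *cxt)
{
    ToyContext *old = CurrentToyContext;

    CurrentToyContext = cxt;
    return old;
}

/* "Existing" code that knows nothing about shared memory. */
typedef struct ToyCell
{
    int         value;
    struct ToyCell *next;
} ToyCell;

static ToyCell *
toy_lcons(int value, ToyCell *list)
{
    ToyCell    *cell = toy_palloc(sizeof(ToyCell));

    cell->value = value;
    cell->next = list;
    return cell;
}
```

The design choice is that toy_lcons() never changes: only the current context does, which is the convenience plan C was after.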

2. Allow regular DSA in DSM to use raw pointers into DSM segments, by
mapping segments at the same address in every backend. This involves
reserving a large virtual address range up front in the postmaster,
and then managing the space, trapping SEGV to map/unmap segments into
parts of that address space as necessary (instead of doing that in
dsa_get_address()). AFAIK that would work, but it seems to be a bit
weird to go to such lengths. It would be a kind of home-made
simulation of threads. On the other hand, that is what we're already
doing in dsa.c, except more slowly due to extra software address
translation from funky pseudo-addresses.
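
To show the kind of software address translation that idea 2 would remove, here is a toy pseudo-address scheme: a segment number in the high bits, an offset in the low bits, and a per-backend table that maps segments on first use. The bit width, the names and the array-backed "segments" are illustrative assumptions, loosely inspired by the idea behind dsa.c rather than copied from it. Under idea 2 this lookup would disappear: the pseudo-address would simply be the address, with a SEGV handler doing the on-demand mapping instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_dsa_pointer;

#define TOY_OFFSET_WIDTH 40
#define TOY_OFFSET_MASK  ((((toy_dsa_pointer) 1) << TOY_OFFSET_WIDTH) - 1)
#define TOY_MAX_SEGMENTS 4

#define TOY_MAKE_POINTER(seg, off) \
    ((((toy_dsa_pointer) (seg)) << TOY_OFFSET_WIDTH) | (toy_dsa_pointer) (off))

/* Stand-ins for segments that exist independently of any backend's mapping. */
static unsigned char toy_segment_storage[TOY_MAX_SEGMENTS][256];

/* Per-backend state: where (if anywhere) each segment is mapped here. */
typedef struct ToyBackend
{
    unsigned char *mapped[TOY_MAX_SEGMENTS];
    int         nattach;                /* how many times we had to "map" */
} ToyBackend;

/* Every dereference pays for: shift, mask, table lookup, maybe an attach. */
static void *
toy_dsa_get_address(ToyBackend *backend, toy_dsa_pointer dp)
{
    size_t      segno = (size_t) (dp >> TOY_OFFSET_WIDTH);
    size_t      offset = (size_t) (dp & TOY_OFFSET_MASK);

    assert(segno < TOY_MAX_SEGMENTS);
    assert(offset < sizeof(toy_segment_storage[0]));
    if (backend->mapped[segno] == NULL)
    {
        /* A real backend would attach the segment here, at whatever
         * address the OS picks; this simulation just reuses the array. */
        backend->mapped[segno] = toy_segment_storage[segno];
        backend->nattach++;
    }
    return backend->mapped[segno] + offset;
}
```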

3. Something something threads.

--
Thomas Munro
http://www.enterprisedb.com
