Re: Regarding Postgres Dynamic Shared Memory (DSA)

From: Mahendranath Gurram <mahendranath(at)zohocorp(dot)com>
To: "Mahi Gurram" <teckymahi(at)gmail(dot)com>
Cc: "Thomas Munro" <thomas(dot)munro(at)enterprisedb(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Regarding Postgres Dynamic Shared Memory (DSA)
Date: 2017-06-20 09:46:19
Message-ID: 15cc4e54751.c34fdaa87199.2745134041722427880@zohocorp.com
Lists: pgsql-hackers

Hi Thomas,

Any update on this?

Please let me know how I can proceed further.

Thanks & Best Regards,

-Mahi

---- On Fri, 16 Jun 2017 18:47:37 +0530 Mahi Gurram <teckymahi(at)gmail(dot)com> wrote ----

Hi Thomas,

Thanks for your response and suggestions to change the code.

I have now modified my code as per your suggestions: the dsa_area pointer is no longer in shared memory, it is a global variable. I also implemented all your code suggestions, but unfortunately no luck; I'm still facing the same behaviour. Refer to the attachment for the modified code.

I have some doubts about your response. Please clarify:

> I didn't try your code but I see a few different problems here. Every
> backend is creating a new dsa area, and then storing the pointer to it
> in shared memory instead of attaching from other backends using the
> handle, and there are synchronisation problems. That isn't going to
> work. Here's what I think you might want to try:

Actually, I'm not creating a dsa_area for every backend. I'm creating it only once (in BufferShmemHook).

* I put prints in my _PG_init and BufferShmemHook functions to confirm this.

As far as I know, _PG_init of a shared library/extension is called only once (during startup) by the postmaster process, and all the Postgres backends are forked child processes of the postmaster.

Since the backends are the postmaster's child processes and are created after the shared memory (dsa_area) has been created and attached, each backend/child process receives the shared memory segment in its address space, and as a result no shared memory operations like dsa_attach should be required to access/use the DSA data.

Please correct me if I'm wrong.

> 3. Whether you are the backend that created it or a backend that
> attached to it, I think you'll need to store the dsa_area in a global
> variable for your UDFs to access. Note that the dsa_area object will
> be different in each backend: there is no point in storing that
> address itself in shared memory, as you have it, as you certainly
> can't use it in any other backend. In other words, each backend that
> attached has its own dsa_area object that it can use to access the
> common dynamic shared memory area.

In the case of forked processes, the OS actually does share the pages initially, because fork implements copy-on-write semantics. This means that, provided none of the processes modifies the pages, they both point to the same addresses and the same data.

Based on the above theory, assuming I have created the dsa_area object in the postmaster process (in _PG_init) as a global variable, all the backends/forked processes should be able to access/share the same dsa_area object and its members.

Hence, theoretically, the code should work without any issues. But I'm not sure why it is not working as expected :(

I tried debugging by putting prints, and observed the following:

1. The dsa_area_control address is different between the postmaster process and the backends.

2. After restarting, they seem to be the same, and hence it works after that.

2017-06-16 18:08:50.798 IST [9195] LOG: ++++ Inside Postmaster Process, after dsa_create() +++++
2017-06-16 18:08:50.798 IST [9195] LOG: the address of dsa_area_control is 0x7f50ddaa6000
2017-06-16 18:08:50.798 IST [9195] LOG: the dsa_area_handle is 1007561696
2017-06-16 18:11:01.904 IST [9224] LOG: ++++ Inside UDF function in forked process +++++
2017-06-16 18:11:01.904 IST [9224] LOG: the address of dsa_area_control is 0x1dac910
2017-06-16 18:11:01.904 IST [9224] LOG: the dsa_area_handle is 0
2017-06-16 18:11:01.907 IST [9195] LOG: server process (PID 9224) was terminated by signal 11: Segmentation fault
2017-06-16 18:11:01.907 IST [9195] DETAIL: Failed process was running: select test_dsa_data_access(1);
2017-06-16 18:11:01.907 IST [9195] LOG: terminating any other active server processes
2017-06-16 18:11:01.907 IST [9227] FATAL: the database system is in recovery mode
2017-06-16 18:11:01.907 IST [9220] WARNING: terminating connection because of crash of another server process
2017-06-16 18:11:01.907 IST [9220] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-06-16 18:11:01.907 IST [9220] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-06-16 18:11:01.907 IST [9195] LOG: all server processes terminated; reinitialising
2017-06-16 18:08:50.798 IST [9195] LOG: ++++ Inside Postmaster Process, after dsa_create() +++++
2017-06-16 18:11:01.937 IST [9195] LOG: the address of dsa_area_control is 0x7f50ddaa6000
2017-06-16 18:11:01.937 IST [9195] LOG: the dsa_area_handle is 1833840303
2017-06-16 18:11:01.904 IST [9224] LOG: ++++ Inside UDF function in forked process +++++
2017-06-16 18:12:24.247 IST [9239] LOG: the address of dsa_area_control is 0x7f50ddaa6000
2017-06-16 18:12:24.247 IST [9239] LOG: the dsa_area_handle is 1833840303

I may be wrong in my understanding, and I might be missing something :(

Please help me sort this out. I really appreciate all your help :)

PS: On Mac, it works fine as expected; I'm facing this issue only on Linux systems. FYI, I'm working on Postgres 10.1 beta.

Thanks & Best Regards,

- Mahi

On Thu, Jun 15, 2017 at 5:00 PM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:

On Thu, Jun 15, 2017 at 6:32 PM, Mahi Gurram <teckymahi(at)gmail(dot)com> wrote:

> Followed the same as per your suggestion. Refer the code snippet below:
>
>> void
>> _PG_init(void)
>> {
>>     RequestAddinShmemSpace(100000000);
>>     PreviousShmemHook = shmem_startup_hook;
>>     shmem_startup_hook = BufferShmemHook;
>> }
>>
>> void
>> BufferShmemHook()
>> {
>>     dsa_area   *area;
>>     dsa_pointer data_ptr;
>>     char       *mem;
>>
>>     area = dsa_create(my_tranche_id());
>>     data_ptr = dsa_allocate(area, 42);
>>     mem = (char *) dsa_get_address(area, data_ptr);
>>     if (mem != NULL)
>>         snprintf(mem, 42, "Hello world");
>>
>>     bool found;
>>     shmemData = ShmemInitStruct("Mahi_Shared_Data",
>>                                 sizeof(shared_data),
>>                                 &found);
>>     shmemData->shared_area = area;
>>     shmemData->shared_area_handle = dsa_get_handle(area);
>>     shmemData->shared_data_ptr = data_ptr;
>>     shmemData->head = NULL;
>> }
>
> Wrote one UDF function, which is called by one of the client connections
> and that tries to use the same dsa. But unfortunately it is behaving
> strangely.
>
> The first call to my UDF function throws a segmentation fault, and
> Postgres quits and auto-restarts. If I call the same UDF function again
> in a new connection (after the Postgres restart), it works fine.
>
> I put some prints in the Postgres source code and found that
> dsa_allocate() tries to use area->control (the dsa_area_control object),
> which points to the wrong address; after restarting it points to the
> right address and hence works fine after the restart.
>
> I'm totally confused and stuck at this point. Please help me in solving
> this.
>
> PS: It is working fine on Mac; I'm facing this behaviour only on Linux
> systems.
>
> I have attached a zip of my extension code along with a screenshot of
> the pgclient and a log file with debug prints for better understanding.
> *The logfile is edited to add some comments for better understanding.
>
> Please help me in solving this.

Hi Mahi

I didn't try your code but I see a few different problems here. Every backend is creating a new dsa area, and then storing the pointer to it in shared memory instead of attaching from other backends using the handle, and there are synchronisation problems. That isn't going to work. Here's what I think you might want to try:

1. In BufferShmemHook, acquire and release AddinShmemInitLock while initialising "Mahi_Shared_Data" (just like pgss_shmem_startup does), because any number of backends could be starting up at the same time and would step on each other's toes here.

2. When ShmemInitStruct returns, check the value of 'found'. If it's false, then this backend is the very first one to attach to this bit of (traditional) shmem. So it should create the DSA area and store the handle in the traditional shmem. Because we hold AddinShmemInitLock we know that no one else can be doing that at the same time. Before even trying to create the DSA area, you should probably memset the whole thing to zero so that if you fail later, the state isn't garbage. If 'found' is true, then we know it's already all set up (or zeroed out), so instead of creating the DSA area it should attach to it using the published handle.

3. Whether you are the backend that created it or a backend that attached to it, I think you'll need to store the dsa_area in a global variable for your UDFs to access. Note that the dsa_area object will be different in each backend: there is no point in storing that address itself in shared memory, as you have it, as you certainly can't use it in any other backend. In other words, each backend that attached has its own dsa_area object that it can use to access the common dynamic shared memory area.

4. After creating, in this case I think you should call dsa_pin(area), so that it doesn't go away when there are no backends attached (ie because there are no backends running) (if I understand correctly that you want this DSA area to last as long as the whole cluster).

By the way, in _PG_init() where you have RequestAddinShmemSpace(100000000) I think you want RequestAddinShmemSpace(sizeof(shared_data)).

The key point is: only one backend should use LWLockNewTrancheId() and dsa_create(), and then make the handle available to others; all the other backends should use dsa_attach(). Then they'll all be attached to the same dynamic shared memory area and can share data.
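[Editor's note: steps 1-4 above can be sketched roughly as follows, modelled on pg_stat_statements' startup pattern. This is an untested illustration, not a drop-in implementation: it reuses names from the snippet quoted earlier in the thread (PreviousShmemHook, shmemData, shared_data, my_tranche_id), and it needs the PostgreSQL server headers and a running server to build and run.]

```c
/* Per-backend global: each process has its own dsa_area object. */
static dsa_area *area = NULL;

static void
BufferShmemHook(void)
{
	bool		found;

	if (PreviousShmemHook)
		PreviousShmemHook();

	/* Step 1: serialise initialisation, as pgss_shmem_startup does. */
	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);

	shmemData = ShmemInitStruct("Mahi_Shared_Data",
								sizeof(shared_data), &found);
	if (!found)
	{
		/* Step 2: the first process in creates the area... */
		memset(shmemData, 0, sizeof(shared_data));
		area = dsa_create(my_tranche_id());

		/* Step 4: keep the area alive even with no backends attached. */
		dsa_pin(area);

		/* ...and publishes only the handle, never the pointer. */
		shmemData->shared_area_handle = dsa_get_handle(area);
	}
	else
	{
		/* Step 2, other side: every later process attaches by handle. */
		area = dsa_attach(shmemData->shared_area_handle);
	}

	/* Keep the mapping for the rest of this process's lifetime. */
	dsa_pin_mapping(area);

	LWLockRelease(AddinShmemInitLock);

	/* Step 3: UDFs in this backend use the 'area' global above. */
}
```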

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
