Re: Regarding Postgres Dynamic Shared Memory (DSA)

From: Mahi Gurram <teckymahi(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Regarding Postgres Dynamic Shared Memory (DSA)
Date: 2017-06-16 13:17:37
Message-ID: CAGg=Gue94VZj1Hb37RBB0TDgzSSY-7sq=gSuqHdRdSoxh+3FCQ@mail.gmail.com
Lists: pgsql-hackers

Hi Thomas,

Thanks for your response and your suggestions for changing the code.

I have now modified my code as you suggested. The dsa_area pointer is no
longer in shared memory; it is a global variable. I also implemented all
your other code suggestions, but unfortunately no luck: I still see the same
behaviour. Please refer to the attachment for the modified code.

I have some doubts about your response. Please clarify.

> I didn't try your code but I see a few different problems here. Every
> backend is creating a new dsa area, and then storing the pointer to it
> in shared memory instead of attaching from other backends using the
> handle, and there are synchronisation problems. That isn't going to
> work. Here's what I think you might want to try:

Actually, I'm not creating a dsa_area for every backend. I'm creating it only
once (in BufferShmemHook).
* I put prints in my _PG_init and BufferShmemHook functions to confirm
this.

As far as I know, the _PG_init of a shared library/extension is called only
once (during startup) by the postmaster process, and all the postgres backends
are forked child processes of the postmaster.

Since the backends are the postmaster's child processes and are created
*after* the shared memory (dsa_area) has been created and attached, each
backend/child process will receive the shared memory segment in its address
space, and as a result no shared memory operations like dsa_attach are
required to access/use the dsa data.

Please correct me if I'm wrong.

> 3. Whether you are the backend that created it or a backend that
> attached to it, I think you'll need to store the dsa_area in a global
> variable for your UDFs to access. Note that the dsa_area object will
> be different in each backend: there is no point in storing that
> address itself in shared memory, as you have it, as you certainly
> can't use it in any other backend. In other words, each backend that
> attached has its own dsa_area object that it can use to access the
> common dynamic shared memory area.

In the case of forked processes, the OS actually does share the pages
initially, because fork implements copy-on-write semantics. This means
that, provided none of the processes modifies the pages, they both point to
the same address and the same data.
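
A minimal sketch, outside PostgreSQL and using only POSIX calls, of what
copy-on-write implies after fork: the child starts with the parent's values,
but a write to ordinary (private) memory on either side is invisible to the
other, while a write through a MAP_SHARED mapping is visible to both. The
names cow_demo, private_counter and shared_counter are made up for this
illustration and are not part of any PostgreSQL API.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS under strict -std= */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ordinary global: each process gets its own copy-on-write copy after fork. */
static int private_counter = 1;

/*
 * Fork a child that increments both the private global and an int living in
 * a MAP_SHARED mapping, then report what the parent observes afterwards.
 * Returns 0 on success, -1 on failure.
 */
int
cow_demo(int *private_seen, int *shared_seen)
{
    int    *shared_counter;
    pid_t   pid;
    int     status = -1;

    assert(private_seen != NULL && shared_seen != NULL);

    shared_counter = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared_counter == MAP_FAILED)
        return -1;

    private_counter = 1;
    *shared_counter = 1;

    pid = fork();
    if (pid == 0)
    {
        /* Child: both values are 1 here, inherited from the parent. */
        private_counter++;      /* lands in the child's private page copy */
        (*shared_counter)++;    /* lands in memory the parent also maps */
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) != pid)
        status = -1;

    *private_seen = private_counter;
    *shared_seen = *shared_counter;
    munmap(shared_counter, sizeof(int));

    return (pid > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After cow_demo(&p, &s) returns 0, p is still 1 and s is 2 in the parent: the
child's write to the global stayed in the child, and only the write through
the shared mapping crossed the process boundary.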

Based on the above theory, assume I have created the dsa_area object in the
postmaster process (_PG_init) and it is a global variable; then all the
backends/forked processes are able to access/share the same dsa_area object
and its members.

Hence, theoretically, the code should work without any issues. But I'm not
sure why it is not working as expected :(

I tried debugging by adding prints, and observed the following:
1. The dsa_area_control address differs between the postmaster process and
the backends.
2. After restarting, they seem to be the same, and hence it works after
that.

> 2017-06-16 18:08:50.798 IST [9195] LOG: ++++ Inside Postmaster Process, after dsa_create() +++++
> 2017-06-16 18:08:50.798 IST [9195] LOG: *the address of dsa_area_control is 0x7f50ddaa6000*
> 2017-06-16 18:08:50.798 IST [9195] LOG: *the dsa_area_handle is 1007561696*
> 2017-06-16 18:11:01.904 IST [9224] LOG: ++++ Inside UDF function in forked process +++++
> 2017-06-16 18:11:01.904 IST [9224] LOG: *the address of dsa_area_control is 0x1dac910*
> 2017-06-16 18:11:01.904 IST [9224] LOG: *the dsa_area_handle is 0*
> 2017-06-16 18:11:01.907 IST [9195] LOG: server process (PID 9224) was terminated by signal 11: Segmentation fault
> 2017-06-16 18:11:01.907 IST [9195] DETAIL: Failed process was running: select test_dsa_data_access(1);
> 2017-06-16 18:11:01.907 IST [9195] LOG: terminating any other active server processes
> 2017-06-16 18:11:01.907 IST [9227] FATAL: the database system is in recovery mode
> 2017-06-16 18:11:01.907 IST [9220] WARNING: terminating connection because of crash of another server process
> 2017-06-16 18:11:01.907 IST [9220] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2017-06-16 18:11:01.907 IST [9220] HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2017-06-16 18:11:01.907 IST [9195] LOG: all server processes terminated; reinitialising
> 2017-06-16 18:08:50.798 IST [9195] LOG: ++++ Inside Postmaster Process, after dsa_create() +++++
> 2017-06-16 18:11:01.937 IST [9195] LOG: *the address of dsa_area_control is 0x7f50ddaa6000*
> 2017-06-16 18:11:01.937 IST [9195] LOG: *the dsa_area_handle is 1833840303*
> 2017-06-16 18:11:01.904 IST [9224] LOG: ++++ Inside UDF function in forked process +++++
> 2017-06-16 18:12:24.247 IST [9239] LOG: *the address of dsa_area_control is 0x7f50ddaa6000*
> 2017-06-16 18:12:24.247 IST [9239] LOG: *the dsa_area_handle is 1833840303*

I may be wrong in my understanding, and I might be missing something :(

Please help me sort this out. I really appreciate all your help :)

PS: On Mac, it works fine as expected. I'm facing this issue only on
Linux systems. FYI, I'm working with postgres 10.1 beta.

Thanks & Best Regards,
- Mahi

On Thu, Jun 15, 2017 at 5:00 PM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:

> On Thu, Jun 15, 2017 at 6:32 PM, Mahi Gurram <teckymahi(at)gmail(dot)com> wrote:
> > Followed the same as per your suggestion. Refer the code snippet below:
> >
> >> void
> >> _PG_init(void){
> >>     RequestAddinShmemSpace(100000000);
> >>     PreviousShmemHook = shmem_startup_hook;
> >>     shmem_startup_hook = BufferShmemHook;
> >> }
> >> void BufferShmemHook(){
> >>     dsa_area *area;
> >>     dsa_pointer data_ptr;
> >>     char *mem;
> >>     area = dsa_create(my_tranche_id());
> >>     data_ptr = dsa_allocate(area, 42);
> >>     mem = (char *) dsa_get_address(area, data_ptr);
> >>     if (mem != NULL){
> >>         snprintf(mem, 42, "Hello world");
> >>     }
> >>     bool found;
> >>     shmemData = ShmemInitStruct("Mahi_Shared_Data",
> >>                                 sizeof(shared_data),
> >>                                 &found);
> >>     shmemData->shared_area = area;
> >>     shmemData->shared_area_handle = dsa_get_handle(area);
> >>     shmemData->shared_data_ptr = data_ptr;
> >>     shmemData->head=NULL;
> >> }
> >
> >
> > I wrote one UDF, which is called by one of the client connections and
> > tries to use the same dsa. But unfortunately it behaves strangely.
> >
> > The first call to my UDF throws a segmentation fault, and postgres
> > quits and auto-restarts. If I call the same UDF again in a new
> > connection (after the postgres restart), it works fine.
> >
> > I put some prints in the postgres source code and found that
> > dsa_allocate() is trying to use area->control (the dsa_area_control
> > object), which points to the wrong address; after restarting, it points
> > to the right address, and hence it works fine after the restart.
> >
> > I'm totally confused and stuck at this point. Please help me in solving
> > this.
> >
> > PS: It works fine on Mac; I'm facing this behaviour only on Linux
> > systems.
> >
> > I have attached a zip of my extension code along with a screenshot of the
> > pgclient and a log file with debug prints for better understanding.
> > *The logfile is edited to add some comments for better understanding.
> >
> > Please help me in solving this.
>
> Hi Mahi
>
> I didn't try your code but I see a few different problems here. Every
> backend is creating a new dsa area, and then storing the pointer to it
> in shared memory instead of attaching from other backends using the
> handle, and there are synchronisation problems. That isn't going to
> work. Here's what I think you might want to try:
>
> 1. In BufferShmemHook, acquire and release AddinShmemInitLock while
> initialising "Mahi_Shared_Data" (just like pgss_shmem_startup does),
> because any number of backends could be starting up at the same time
> and would step on each other's toes here.
>
> 2. When ShmemInitStruct returns, check the value of 'found'. If it's
> false, then this backend is the very first one to attach to this bit
> of (traditional) shmem. So it should create the DSA area and store
> the handle in the traditional shmem. Because we hold
> AddinShmemInitLock we know that no one else can be doing that at the
> same time. Before even trying to create the DSA area, you should
> probably memset the whole thing to zero so that if you fail later, the
> state isn't garbage. If 'found' is true, then we know it's already
> all set up (or zeroed out), so instead of creating the DSA area it
> should attach to it using the published handle.
>
> 3. Whether you are the backend that created it or a backend that
> attached to it, I think you'll need to store the dsa_area in a global
> variable for your UDFs to access. Note that the dsa_area object will
> be different in each backend: there is no point in storing that
> address itself in shared memory, as you have it, as you certainly
> can't use it in any other backend. In other words, each backend that
> attached has its own dsa_area object that it can use to access the
> common dynamic shared memory area.
>
> 4. After creating, in this case I think you should call
> dsa_pin(area), so that it doesn't go away when there are no backends
> attached (ie because there are no backends running) (if I understand
> correctly that you want this DSA area to last as long as the whole
> cluster).
>
> By the way, in _PG_init() where you have
> RequestAddinShmemSpace(100000000) I think you want
> RequestAddinShmemSpace(sizeof(shared_data)).
>
> The key point is: only one backend should use LWLockNewTrancheId() and
> dsa_create(), and then make the handle available to others; all the
> other backends should use dsa_attach(). Then they'll all be attached
> to the same dynamic shared memory area and can share data.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
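
The key point in the quoted reply (one process creates the area and publishes
a handle; every other process attaches through that handle and gets its own
local mapping) can be illustrated outside PostgreSQL. The sketch below is
only an analogy built on POSIX mmap over a temporary file: the file path
stands in for the dsa_handle, area_create and area_attach are hypothetical
stand-ins for dsa_create and dsa_attach, and none of it uses the real DSA API.

```c
#define _DEFAULT_SOURCE         /* for mkstemp/ftruncate under strict -std= */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define AREA_SIZE 4096

/* Map an open file shared and read-write; returns NULL on failure. */
static char *
map_fd(int fd)
{
    char *base = mmap(NULL, AREA_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    return base == MAP_FAILED ? NULL : base;
}

/* Hypothetical stand-in for dsa_create: fills in `handle` (a path template). */
static char *
area_create(char *handle)
{
    int   fd = mkstemp(handle);
    char *base = NULL;

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, AREA_SIZE) == 0)
        base = map_fd(fd);
    close(fd);
    if (base == NULL)
        unlink(handle);
    return base;
}

/* Hypothetical stand-in for dsa_attach: builds a fresh local mapping. */
static char *
area_attach(const char *handle)
{
    int   fd = open(handle, O_RDWR);
    char *base;

    if (fd < 0)
        return NULL;
    base = map_fd(fd);
    close(fd);
    return base;
}

/*
 * The creator writes a message; a child attaches by handle only and answers
 * in place. Copies what the creator then sees into `reply`; returns 0 on
 * success, -1 on failure.
 */
int
handle_demo(char *reply, size_t reply_len)
{
    char  handle[] = "/tmp/area_demo_XXXXXX";
    char *mine;
    pid_t pid;
    int   status = -1;

    assert(reply != NULL && reply_len > 0);
    mine = area_create(handle);
    if (mine == NULL)
        return -1;
    strcpy(mine, "Hello world");

    pid = fork();
    if (pid == 0)
    {
        /* Child ignores the inherited pointer and attaches via the handle. */
        char *theirs = area_attach(handle);

        if (theirs == NULL || strcmp(theirs, "Hello world") != 0)
            _exit(1);
        strcpy(theirs, "Hello from child");
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) != pid)
        status = -1;

    snprintf(reply, reply_len, "%s", mine);
    munmap(mine, AREA_SIZE);
    unlink(handle);
    return (pid > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Each process holds its own base pointer (mine, theirs), which is meaningful
only locally; what they share is the handle and the bytes behind it. That is
the same division the reply describes between a per-backend dsa_area object
and the published handle.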

Attachment          Content-Type       Size
test_dsa_new.zip    application/zip    9.9 KB
