Re: server-side extension in c++

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Igor <igor(at)carcass(dot)ath(dot)cx>, pgsql-general(at)postgresql(dot)org
Subject: Re: server-side extension in c++
Date: 2010-06-02 16:20:23
Message-ID: 4C0684C7.8020002@postnewspapers.com.au
Lists: pgsql-general

On 2/06/2010 11:49 PM, Peter Geoghegan wrote:
> On 2 June 2010 13:36, Craig Ringer<craig(at)postnewspapers(dot)com(dot)au> wrote:
>>>
>>> Really? That seems like an *incredibly* arduous requirement.
>>> Intuitively, I find it difficult to believe. After all, even though
>>> using longjmp in C++ code is a fast track to undefined behaviour, I
>>> would have imagined that doing so in an isolated C module with a well
>>> defined interface, called from C++ would be safe.
>>
>> Not necessarily. It's only safe if setjmp/longjmp calls occur only
>> within the C code without "breaking" call paths involving C++.
>
> It isn't obvious to me that your suggestion that C++ functions that
> invoke jumping pg code use only POD types, but manipulate C++ types
> through pointers helps much, or at all. RAII/SBRM is just another
> memory management strategy (albeit a very effective, intuitive one).
> It's basically equivalent to the compiler generating calls to a
> constructor when an object is instantiated, and to a destructor when
> the object goes out of scope.

... and use of longjmp completely breaks scoping rules, but doesn't
inherently violate other program flow expectations.

> So, how your concern fundamentally
> differs from the general case where we're managing resources (but not
> through memory contexts/palloc) explicitly, and risk being cut off
> before control flow reaches our (implicit or explicit) destructor call
> isn't clear, except perhaps that RAII gives clients what may be a
> false sense of security. Sure, one is technically undefined behaviour
> while the other isn't, but the end result is probably identical - a
> memory leak.

Except that Pg, via palloc, offers a way to clean up a whole memory
context. Ensuring you delete your C++ object graph (probably via a few
opaque pointers you pass around in the C code) when a MemoryContext is
deleted isn't hard. palloc's MemoryContextMethods->delete_context
provides just what's required. It's no different to what you do in a
normal extension written in C, except that your deleteMyObject(somePtr)
call happens to be an extern "C" function written in C++ that deletes
the pointer. No biggie.
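
A minimal sketch of that pattern follows. It is standalone and does not
use Pg's headers: FakeContext and fake_context_delete are hypothetical
stand-ins for a memory context with a delete hook, and ObjectGraph,
create_my_object and delete_my_object are made-up names for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// C++ side: an ordinary object graph that the C code never sees directly.
struct ObjectGraph {
    std::vector<std::string> items;
    static int live_count;             // instance counter, for illustration only
    ObjectGraph()  { ++live_count; }
    ~ObjectGraph() { --live_count; }
};
int ObjectGraph::live_count = 0;

extern "C" {
    // C-callable constructor: hands back an opaque pointer.
    void *create_my_object(void) {
        try { return new ObjectGraph(); }
        catch (...) { return nullptr; }  // never let an exception escape into C
    }

    // C-callable destructor: the only place the graph is deleted.
    void delete_my_object(void *ptr) {
        delete static_cast<ObjectGraph *>(ptr);
    }

    // Hypothetical stand-in for a memory context with a cleanup hook.
    // This is NOT Pg's MemoryContext API.
    struct FakeContext {
        void (*on_delete)(void *);
        void *arg;
    };

    // Runs the hook once, the way a context's delete method would.
    void fake_context_delete(FakeContext *cxt) {
        if (cxt->on_delete != nullptr)
            cxt->on_delete(cxt->arg);
        cxt->on_delete = nullptr;
        cxt->arg = nullptr;
    }
}
```

The point of the design is that cleanup is driven by the context being
deleted, not by C++ scope exit, so it does not depend on control flow
reaching the end of any C++ block.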

You can't do that if you're relying on smart pointers, refcounting,
std::auto_ptr, etc., because longjmp breaks them: dtors won't get
called when they should, you'll think objects are still referenced when
they aren't, and things generally fail.

It's even worse if you're relying on stack-based objects with dtors for
lock management or the like.
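
One way to keep such objects out of harm's way is to make sure the frame
that calls the jumping code holds only POD locals. A rough sketch under
stated assumptions: error_jmp and c_code_that_may_jump are hypothetical
stand-ins for Pg's error recovery and for any Pg call that may longjmp,
and the other names are invented for the example.

```cpp
#include <cassert>
#include <csetjmp>
#include <string>

static std::jmp_buf error_jmp;   // stand-in for the error-recovery jump target

// Stand-in for C code that may bail out with longjmp.
extern "C" void c_code_that_may_jump(int fail) {
    if (fail)
        std::longjmp(error_jmp, 1);
}

// Every object with a destructor lives and dies inside this function,
// so it is already destroyed before any longjmp can happen.
static int compute_length(const char *text) {
    std::string s(text);
    s += "!";
    return static_cast<int>(s.size());
}

// Only POD locals in this frame, so a longjmp through it skips no dtors.
static void do_work(const char *text, int fail, int *out_len) {
    int len = compute_length(text);
    *out_len = len;
    c_code_that_may_jump(fail);
}

// Returns 0 on normal completion, 1 if the C code jumped out.
int run(const char *text, int fail, int *out_len) {
    if (setjmp(error_jmp) != 0)
        return 1;
    do_work(text, fail, out_len);
    return 0;
}
```

If std::string s were instead a local of do_work, the jump would pass over
a live object with a non-trivial destructor, which is exactly the case the
C++ standard leaves undefined.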

> Yes, but my point was that if that occurs above the C++ code, it will
> never be affected by it. We have to longjmp() *over* C++ code before
> we have a problem.

Sure, as per the example I posted.

> Re-implement global operator new() and friends in terms of palloc and
> pfree. This sort of thing is often done for C++ application
> frameworks.

... and regularly causes headaches :S

> It makes me queasy that by doing this, we're resorting to undefined
> behaviour in terms of the C++ standard (destructors are never called)
> as a matter of routine.

Well, if it was done. I really, really wouldn't want to do it for just
those reasons - I've never liked placement new, overriding operator
new(), etc.

It's not too tricky to just free your C++ object graph when a
MemoryContext is deleted, as MemoryContexts have their own
dtor-equivalents that're reliably called by Pg irrespective of
setjmp/longjmp-based program flow. Why make it more complicated than it
has to be? This way your dtors get called reliably at destruction.

That said, if I was to do that in code I was writing, I'd build a pool
allocator based on a memory context that handed out palloc'd chunks...
and I'd just give up on destructors for those objects.
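
A sketch of what such a pool could look like, under loud assumptions: the
caller-supplied buffer stands in for a palloc'd chunk, Pool and Point are
hypothetical names, and nothing here touches Pg's real API. The
static_assert is one way to make "give up on destructors" explicit by
refusing types that need one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <utility>

// Bump allocator over a buffer it does not own. In a real extension the
// buffer would be a palloc'd chunk, reclaimed when its context goes away.
class Pool {
public:
    Pool(void *buffer, std::size_t capacity)
        : base_(static_cast<char *>(buffer)), used_(0), capacity_(capacity) {}

    // Returns aligned storage from the buffer, or nullptr if it is exhausted.
    void *allocate(std::size_t size, std::size_t align) {
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(base_) + used_;
        std::size_t pad = (align - addr % align) % align;
        std::size_t remaining = capacity_ - used_;
        if (pad > remaining || size > remaining - pad)
            return nullptr;
        void *p = base_ + used_ + pad;
        used_ += pad + size;
        return p;
    }

    // Constructs a T in pool storage. No destructor will ever run for it,
    // so only trivially destructible types are accepted.
    template <typename T, typename... Args>
    T *make(Args &&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "pool objects never have their destructors run");
        void *p = allocate(sizeof(T), alignof(T));
        return p ? new (p) T(std::forward<Args>(args)...) : nullptr;
    }

    std::size_t used() const { return used_; }

private:
    char *base_;
    std::size_t used_;
    std::size_t capacity_;
};

// Example payload type: has a constructor but a trivial destructor.
struct Point {
    int x;
    int y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};
```

Because Pool itself owns nothing, abandoning it mid-function (for example
when a longjmp unwinds past it) leaks nothing as long as the underlying
buffer is released by whatever owns it.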

http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.10
http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.14

> What do you think? I suppose that such
> undefined behaviour is absolutely intolerable. It's not a serious
> suggestion, just something that I think is worth pointing out.

That stuff is cool, but rarely worth the complexity because it breaks
pretty basic assumptions about how things work. I prefer to just keep my
C and C++ code cleanly separated where possible, and stick to a very
simple subset of C++ where I can't keep them separate.

--
Craig Ringer
