Re: Shared memory

From: Thomas Hallgren <thomas(at)tada(dot)se>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PL/Java Development <Pljava-dev(at)gborg(dot)postgresql(dot)org>
Subject: Re: Shared memory
Date: 2006-03-28 15:48:00
Message-ID: 44295AB0.5060203@tada.se
Lists: pgsql-hackers pljava-dev

Hi Simon,
Thanks for your input. All good points. I actually did some work using Java stored
procedures on DB2 a while back but I had managed to forget (or repress :-) ) all about the
FENCED/NOT FENCED stuff. The current discussion definitely puts it in a different
perspective. I think PL/Java has a pretty good 'NOT FENCED' implementation, as do many
other PLs, but no PL has yet come up with a FENCED solution.

This FENCED/NOT FENCED terminology would be a good way to differentiate between the two
approaches. Any chance of that syntax making it into the PostgreSQL grammar, should the need
arise?

Some more comments inline:

Simon Riggs wrote:
> Just some thoughts from afar: DB2 supports in-process and out-of-process
> external function calls (UDFs) that it refers to as UNFENCED and FENCED
> procedures. For Java only, IBM have moved to supporting *only* FENCED
> procedures for Java functions, i.e. having a single JVM for all
> connections.
>
Are you sure about this? As I recall, a FENCED stored procedure executed in a remote JVM
of its own. A parameter controlled whether a new JVM was instantiated for each stored
procedure call or a single JVM was kept for the duration of the session. The former would
yield really horrible performance but keep memory utilization at a minimum. The latter would
give more acceptable performance but waste more memory (on par with PL/Java today).

> Each connection's Java function runs as a thread on a
> single dedicated JVM-only process.
>
If that were true, then different threads could share dirty session data. I wanted to do that
using DB2 but found it impossible. That was a while back though.

> That approach definitely does increase the invocation time, but it
> significantly reduces the resources associated with the JVM, as well as
> allowing memory management to be more controllable (bliss...). So the
> overall picture could be more CPU and memory resources for each
> connection in the connection pool.
>
My very crude measurements indicate that the overhead of using a separate JVM is between
6 and 15MB of real memory per connection. Today, you get about 10MB/$ and servers configured
with 4GB RAM or more are not uncommon.

I'm not saying that the overhead doesn't matter. Of course it does. But the time when you
needed to be extremely conservative with memory usage has passed. It might be far less
expensive to buy some extra memory than to invest in SMP architectures to minimize IPC overhead.

My point is, even fairly large app-servers (using connection pools with up to 200
simultaneous connections) can run using relatively inexpensive boxes such as an AMD64 based
server with 4GB RAM and show very good throughput with the current implementation.
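
As a rough back-of-the-envelope check of the figures above, the arithmetic can be sketched in a few lines of Java. The class and method names are made up for illustration; the inputs (200 connections, 6 to 15MB per JVM, 10MB/$, 4GB RAM) are the crude estimates quoted in this message, not measurements of any particular system.

```java
// Hypothetical helper: estimates the memory cost of one JVM per connection.
class JvmOverheadEstimate {
    // Total extra memory in MB when every connection has its own JVM.
    static int totalMb(int connections, int mbPerJvm) {
        return connections * mbPerJvm;
    }

    // Fraction of the server's RAM consumed by that overhead.
    static double shareOfRam(int totalMb, int serverRamMb) {
        return (double) totalMb / serverRamMb;
    }

    // Approximate purchase cost of that memory in dollars.
    static int costDollars(int totalMb, int mbPerDollar) {
        return totalMb / mbPerDollar;
    }

    public static void main(String[] args) {
        int connections = 200;        // pool size mentioned in the message
        int serverRamMb = 4 * 1024;   // 4GB server
        int mbPerDollar = 10;         // "about 10MB/$"
        for (int mbPerJvm : new int[] {6, 15}) {
            int total = totalMb(connections, mbPerJvm);
            System.out.printf("%dMB per JVM: %dMB total, %.0f%% of RAM, about $%d%n",
                    mbPerJvm, total,
                    100 * shareOfRam(total, serverRamMb),
                    costDollars(total, mbPerDollar));
        }
    }
}
```

With these inputs the overhead comes to between 1200MB and 3000MB, i.e. roughly 29% to 73% of a 4GB machine, or about $120 to $300 worth of RAM at 10MB/$.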

> If you have a few small Java functions centralisation would not be good,
> but if you have a whole application architecture with many connections
> executing reasonable chunks of code then this can be a win.
>
One thing to remember is that a 'chunk of code' that executes in a remote JVM and uses
JDBC will be hit by the IPC overhead on each interaction over the JDBC connection. I.e. the
overhead is not just limited to the actual call of the UDF, it's also imposed on all
database accesses that the UDF makes in turn.
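
To make that point concrete, here is a minimal cost model, assuming every boundary crossing (the UDF call itself and each JDBC interaction it makes) pays the same fixed IPC round-trip cost. The class name, the method and the 50 microsecond figure are hypothetical and chosen only for illustration; real costs depend on the IPC mechanism and the data transferred.

```java
// Hypothetical model of the IPC overhead for a UDF running in a remote JVM.
class IpcOverheadModel {
    // One round trip for the UDF invocation itself, plus one round trip
    // for every JDBC interaction the UDF performs against the backend.
    static long totalIpcMicros(int jdbcInteractions, long microsPerRoundTrip) {
        return (1L + jdbcInteractions) * microsPerRoundTrip;
    }

    public static void main(String[] args) {
        long assumedRoundTripMicros = 50; // assumption, not a measured value
        for (int jdbcCalls : new int[] {0, 10, 100}) {
            System.out.println(jdbcCalls + " JDBC interactions: "
                    + totalIpcMicros(jdbcCalls, assumedRoundTripMicros)
                    + " microseconds of IPC overhead");
        }
    }
}
```

Under this model the overhead grows linearly with the number of JDBC interactions, so a function that does a fair amount of JDBC pays far more than the single call overhead.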

> In that environment we used Java for major database functions, with SQL
> functions for small extensions.
>
My guess is that those major database functions did a fair amount of JDBC. Am I right?

> Also the Java invocation time we should be celebrating is that by having
> Java in the database the Java<->DB time is much less than it would be if
> we had a Java stack sitting on another server.
>

I think the cases when you have a Tomcat or JBoss sitting on the same physical server as the
actual database are very common. One major reason being that you don't want network overhead
between the middle tier and the backend. Moving logic into the database instead of keeping
it in the middle tier is often done to get rid of the last hurdle, the overhead of IPC.

Regards,
Thomas Hallgren
