Re: pl/Ruby, deprecating plPython and Core

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Thomas Hallgren <thhal(at)mailblocks(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, David Fetter <david(at)fetter(dot)org>
Subject: Re: pl/Ruby, deprecating plPython and Core
Date: 2005-08-17 18:32:58
Message-ID: B8708B2C-7071-45FB-8BCC-C3665C5C2E4B@fastcrypt.com
Lists: pgsql-hackers


On 17-Aug-05, at 12:40 PM, Thomas Hallgren wrote:

> Andrew Dunstan wrote:
>
>> Dave Cramer wrote:
>>
>>> As there are two Java procedural languages available for
>>> PostgreSQL, Josh asked for an explanation as to their
>>> differences.
>>> They are quite similar in that both of them run the function in
>>> a Java VM, and the code is pre-compiled. Neither attempts to
>>> compile the code.
>>>
>>> The biggest difference is how they connect to the java VM.
>>>
>>> PL/Java uses the Java Native Interface (JNI) and does a direct
>>> call into the Java VM from the language handler.
>>>
>>> PL-J uses a network protocol to connect to a java VM.
>>>
>>>
>>> There are advantages and disadvantages to both approaches.
>>>
>>> + JNI is simpler, doesn't require a protocol, or an application
>>> container to manage the User Defined Functions
>>> - JNI requires that the vm runs on the server machine, and a
>>> separate vm be instantiated for every connection that calls a
>>> function.
>>> This is mitigated somewhat in java 1.5, by sharing data,
>>> however this may or may not be a Sun only feature ( does anyone
>>> know );
>>> either way a separate vm is required for each connection.
>>> - startup time for the vm on the first call for the connection.
>>> - Possible ( not as likely any more ) for the java VM to take
>>> the server down.
>>>
>>> Using a network protocol, as PL-J does, has the following
>>> ( basically the opposite of the JNI (dis)advantages )
>>>
>>> + The java VM does not have to run on the server.
>>> + Only one vm per server
>>> - More complex, requires a micro kernel application server to
>>> manage the UDFs, currently Loom ( http://loom.codehaus.org/ )
>>>
>>>
>>>
> I think Dave misses a couple of important points.
>
> 1. Speed. One major reason for moving code from the middle tier
> down to the database is that you want to execute the code close to
> the actual persistence mechanisms in order to minimize network
> traffic and maximize throughput.
I think until there are actual benchmarks, there are too many
variables here to suggest one is faster than the other. The overhead
of having multiple Java VMs is not easily estimated. Even with a
connection pool, consider the memory footprint of even 10 Java VMs.
>
> 2. A growing percentage of db-clients utilize some kind of
> connection pool (an overwhelming number of the Java clients
> certainly do), which minimizes the problem with startup times.
>
> 3. Transaction visibility. A function that in turn issues new SQL
> calls must do that within the scope of the caller transaction. A
> remote process must hence call back into its caller. PL/Java has
> its own JDBC driver that interacts directly with SPI.
PL-J maintains transaction visibility; it has its own JDBC driver as
well. The protocol between the language handler and the Java portion
is based upon the FE/BE protocol, which made it easy to use pg's JDBC
driver with some modification.
>
> 4. Isolation. Using separate VMs, instabilities in the VM can only
> affect one single connection. One VM can be debugged or monitored
> without affecting the others. No data can be inadvertently moved
> between connections, etc.
Loom deals with data integrity; debugging would have to be done over a
remote debug connection, which can attach to any thread.
>
> I try to shed more light on the pros and cons here:
> http://gborg.postgresql.org/project/pljava/genpage.php?jni_rationale
>
>
>> That's a pretty good explanation and ought to be published more
>> widely. It's almost a pity that we couldn't have one project with
>> a server setting saying how we want it to run.
>>
> There are a couple of reasons that make me a bit reluctant to join
> the projects:
>
> PL/Java has no dependencies at all besides a Java Runtime
> Environment (or GCJ). PL/J requires a fair amount of other modules
> just to compile.
PL-J requires one other module, which the build environment will
fetch automatically to compile.
>
> PL/Java is at release 1.1 and has a community of users. To my
> knowledge, PL/J has not reached its first release yet.
>
> PL/Java and PL/J use completely different approaches and share
> almost no code. The code that we do share (public interfaces, mainly
> for trigger management) is published at the Maven repository at
> ibiblio.org.
>
> I think it's better to keep the two projects separate. But I also
> think that it is extremely important that we ensure that the user
> experience is similar for both projects so that there's nothing to
> prevent a server setting that decides which one to use provided
> both are present.
>
> Kind regards,
> Thomas Hallgren
