Remote PL/Java, Summary

From: Thomas Hallgren <thomas(at)tada(dot)se>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Remote PL/Java, Summary
Date: 2006-04-01 10:03:24
Message-ID: 442E4FEC.9050304@tada.se
Lists: pgsql-hackers

Hi all,
And thanks for the very good input regarding a remote alternative to PL/Java
(thread titled "Shared Memory"). I'm convinced that such an alternative
would be a great addition to PL/Java and increase the number of users.
The work to create such a platform that has the stability and quality of
today's PL/Java is significant (I really do think it is a
production-grade product today). So significant in fact, that I'm
beginning to think of a third alternative. An alternative that would
combine the performance of using in-process calls with the benefits of
sharing a JVM. The answer is of course to make the backend multi-threaded.

This question has been debated before and always promptly rejected. One
major reason is of course that it will not bring any benefits over the
current multi-process approach on a majority of the platforms where
PostgreSQL is used. A process-switch is just as fast as a thread-switch
on Linux-based systems. Over the last year, however, something has
happened that certainly speaks in favor of multi-threading. PostgreSQL
is getting widely adopted on Windows. On Windows, a process-switch is at
least 5 times more expensive than a thread-switch. In order to do
appropriate locking, PostgreSQL is forced to do a fair amount of
switching during transaction processing, so the gain in using a
multi-threaded approach on Windows is probably significant. The same is
true for other OSes where process-switching is relatively expensive.

There are other benefits as well. PostgreSQL would no longer need shared
memory and semaphores, and a lot more resources could be shared between
backend processes. The one major drawback of a multi-threaded approach
(the one that's been the main argument for the defenders of the current
approach) is vulnerability. If one thread is messing things up, then the
whole system will be brought to a halt (on the other hand, that can be
said about the current shared-memory approach as well). The cure for
this is to have a system that, to the extent possible, prevents this
from happening. How would that be possible? Well, such systems are
widely used today. Huge companies use them in mission critical
applications all over the world. They are called Virtual Machines. Two
types in particular are gaining more and more ground. The .NET based CLR
and the Java VM.

Although there's an Open Source initiative called Mono that implements
the CLR, I still don't see it as a viable alternative for creating a
production-grade multi-platform database. Microsoft's CLR is of course
confined to Microsoft platforms. The Java VMs are however a different
matter altogether. And with the java.nio.channels package that was
introduced in Java 1.4 and the java.util.concurrent package from Java
5.0, Java has taken major steps forward in being a very feasible
platform for a database implementation. There's actually nothing
stopping you from doing a high-performance MVCC system using Java today.
A SQL parser would be based on JavaCC technology (the grammar is already
written although it needs small adjustments to comply with the
PostgreSQL dialect). Lots of technology is there out-of-the-box such as
regular expressions, hash-maps, linked lists, etc. Not to forget an
exceptionally great threading system, now providing atomic operations,
semaphores, copy-on-write arrays etc. In short, everything that a
database implementor could ever wish for.
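
To make the java.util.concurrent point a little more concrete, here is
a minimal sketch of how those building blocks might fit together in a
toy versioned cell. It is only an illustration under stated
assumptions: the class VersionedCell, its methods, and the writer limit
of 4 are all hypothetical names and values, nothing here comes from
PostgreSQL or PL/Java, and the visibility rule ignores commit status,
so it is far simpler than real MVCC.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not PostgreSQL or PL/Java code.
public class VersionedCell {
    // One version of a value, tagged with the id of the creating transaction.
    static final class Version {
        final long xmin;
        final String value;
        Version(long xmin, String value) {
            this.xmin = xmin;
            this.value = value;
        }
    }

    // Atomic operation: hands out transaction ids without an explicit lock.
    private static final AtomicLong NEXT_XID = new AtomicLong(1);

    // Semaphore: caps concurrent writers (the limit of 4 is an assumption).
    private static final Semaphore WRITERS = new Semaphore(4);

    // Copy-on-write array: readers iterate over a stable snapshot
    // while writers append new versions.
    private final List<Version> versions = new CopyOnWriteArrayList<>();

    static long beginTransaction() {
        return NEXT_XID.getAndIncrement();
    }

    void write(long xid, String value) {
        WRITERS.acquireUninterruptibly();
        try {
            versions.add(new Version(xid, value));
        } finally {
            WRITERS.release();
        }
    }

    // Returns the newest value created by a transaction id <= snapshotXid,
    // or null if no such version exists. Commit status is deliberately ignored.
    String read(long snapshotXid) {
        String visible = null;
        long best = -1;
        for (Version v : versions) {
            if (v.xmin <= snapshotXid && v.xmin > best) {
                best = v.xmin;
                visible = v.value;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        long t1 = beginTransaction();
        cell.write(t1, "first");
        long t2 = beginTransaction();
        cell.write(t2, "second");
        // A reader holding the older snapshot still sees the older version.
        System.out.println(cell.read(t1));
        System.out.println(cell.read(t2));
    }
}
```

The copy-on-write list favors workloads with many readers and few
writers, since every append copies the backing array; a real version
store would need something considerably more careful.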

The third alternative for PL/Java, an approach that gets more viable
every minute I think about it, is to implement the PostgreSQL backend
completely in Java. I'm involved in the development of one of the
commercial JVMs. I know that an enormous amount of resources is
constantly devoted to performance optimizations. The days when a complex
system written in C or C++ could outperform a JVM have passed. A static
optimizer can only do so well. A JVM that collects heuristics,
communicates with the CPU about cache usage etc., can be a great deal
smarter about how the final machine code will be optimized, and
re-optimized should the conditions change. It would be great if
PostgreSQL could benefit from all this research.

If a commercial JVM is perceived as a problem, then combine^h^h^hpile
the code with GNU gcj instead of gcc like today.

The list of advantages can be made a mile long. There's no point in
listing everything here. From my own standpoint, I'm of course thinking
first and foremost about the advantages with PL/Java. It will become
by far the most efficient PL of them all. Other languages, for which no
good Java implementation exists (I'm thinking Jython for Python, etc.),
can be implemented using JNI. The most common functions used by, say,
PL/Perl could probably be implemented as callbacks into the Java domain
in order to make the changes in the respective PL minimal.

Opinions? Suggestions?

Kind Regards,
Thomas Hallgren
