Re: PostgreSQL in the press again

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-advocacy(at)postgresql(dot)org
Subject: Re: PostgreSQL in the press again
Date: 2004-11-14 07:05:21
Message-ID: m3u0rskj0u.fsf@knuth.knuth.cbbrowne.com
Lists: pgsql-advocacy

Oops! scrappy(at)postgresql(dot)org ("Marc G. Fournier") was seen spray-painting on a wall:
> On Sat, 13 Nov 2004, Thomas Hallgren wrote:
>
>> Joshua D. Drake wrote:
>>> Yes but I believe even you would agree that there are programming
>>> languages that are better for certain tasks than others. The use
>>> of java as a replication engine for PostgreSQL seems,
>>> well... incorrect.
>>
>> Marc G. Fournier wrote:
>>> We definitely concur with that, which is why we are re-writing it
>>> ... going to Java, as Andrew has mentioned, was *not* a design
>>> decision that we made, but was made for us :(
>>>
>> Now I get really curious. Why would Java be a bad choice for a
>> replication engine? I would consider it an excellent choice,
>> provided of course that the people tasked with the implementation
>> had the right skills. C-JDBC for instance, is written in Java.
>
> Everyone obviously has their opinion, but in mine, Java just has
> toooooo large of a memory foot print ... I don't know enough about
> Java to know if this is something that is restricted to how
> eRServer/Java was coded or not, but by default, the damn thing takes
> something like 300Mb of RAM for just the engine :(

The problem with Java is twofold:

1. Naive system implementations wind up gratuitously using a lot of
memory.

2. The garbage collection system makes it particularly difficult to
be aware of how the "memory life cycle" works. Which helps keep
developers naive for somewhat longer...

In the case of eRServer, the way the snapshot system was constructed
led to "gratuitous memory use," and that's not an obvious result of
either 1. or 2.

Someone could have made a C-based version of ERS that, by using
similar implementation strategies, would also use "gratuitously large"
amounts of memory.

In contrast, Slony-I happens to be _immensely_ more frugal in its use
of memory. That is a matter of design, not of the language used. The
strategy is to hold in memory only a buffer (more or less) of the data
that is being copied. If there's a replication set
consisting of 80GB of data, you don't need to hold it all in RAM; you
just need to buffer a few hundred KB of it so that you're streaming
large enough blocks across the network to let the network connections
be used efficiently. If the strategies of Slony-I had been
implemented in Java, the memory footprint would still be relatively
small. The fact that Java has heftier libraries than C means that
Java apps will be somewhat bigger than C ones.
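
To illustrate the buffering idea (this is only a rough sketch, not
Slony-I's actual code): the copy loop below never holds more than one
chunk in memory, however large the source is. The function name
stream_copy, the 256KB chunk size, and the in-memory source and sink
are all assumptions made for the example; a real replication set would
be read from a database connection and written to a socket.

```python
import io

# "A few hundred KB", per the text above; the exact figure is an assumption.
CHUNK_SIZE = 256 * 1024


def stream_copy(source, sink, chunk_size=CHUNK_SIZE):
    """Copy source to sink one chunk at a time.

    Returns (total_bytes, largest_chunk) so the caller can confirm that
    no more than chunk_size bytes were ever held by the loop at once.
    """
    total = 0
    largest = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        sink.write(chunk)
        total += len(chunk)
        largest = max(largest, len(chunk))
    return total, largest


if __name__ == "__main__":
    # Stand-in for a table dump; the BytesIO objects exist only so the
    # sketch runs without a database or a network.
    source = io.BytesIO(b"x" * (5 * 1024 * 1024))
    sink = io.BytesIO()
    print(stream_copy(source, sink))
```

The working set of the loop is bounded by chunk_size rather than by
the size of the data, which is the property that matters here; the
same loop written in Java or C would have the same bound.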

But I wouldn't raise any "red flags" if a "Slony-Java" process
consumed 25MB whilst the C version only consumed 8MB. Those are both
small enough sizes that they're not going to challenge JVM maximum
memory sizes. On a couple of occasions, I saw eRServer "blow up" due to
the JVM not being configured to have enough memory space, and could
foresee situations where you couldn't set memory space high enough
:-(.

I'd expect a C++-based system to fall somewhere in between. Between
exception handling, templates, and such, C++ adds a bit of "gratuitous
bloat," but not quite so much as in Java. (Unless you use STL Way
Lots, but that's another story :-).)

But in all of this, the things that cause the _real_ bloat are
pessimal algorithmic design choices. The things that _fix_ bloat are
algorithmic changes, not changes of language.

The things to "hate" about Java aren't about any of this. It's more
like:

- Java runs, in a "supportable" manner, on way fewer platforms than
PostgreSQL

- If you pick libraries that are functional enough to be useful,
then you likely have to get a Sun JDK with pretty proprietary
licensing

- Due to licensing complexities, it's WAY more complex to deploy
Java-based apps than C-based apps. The average Linux or BSD
distribution contains hundreds if not thousands of apps
deployed in C; doing the same for Java has proved more than
troublesome.
--
output = reverse("gro.mca" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/linux.html
"Using Java as a general purpose application development language is
like going big game hunting armed with Nerf weapons."
-- Author Unknown
