From: Benjamin Adida <ben(at)mit(dot)edu>
To: Tim Perdue <tperdue(at)valinux(dot)com>
Cc: <pgsql-hackers(at)hub(dot)org>
Subject: Re: Article on MySQL vs. Postgres
Date: 2000-07-05 14:48:01
Message-ID: B588C2E1.683C%ben@mit.edu
Lists: pgsql-hackers

on 7/4/00 8:30 PM, Tim Perdue at tperdue(at)valinux(dot)com wrote:

> This is for what most people do with PHP and databases - run
> semi-critical medium-traffic sites. Anyone running a mission-critical
> site would have to look elsewhere for true robustness. I would not at
> this time recommend any serious, life-threatening app run on either
> database.

To the person who owns the web site, data is always critical. Does
www.yahoo.com store "life-threatening" information? Not really, but if you
lose your yahoo.com email, the "oh sorry, our database doesn't support
transactions" excuse doesn't cut it.

> I took a real-world page from our site
> <http://sourceforge.net/forum/forum.php?forum_id=1> and made it portable
> to both databases. Of course, I could not import the "body" of the
> message into postgres because of the 8k limitation, so the body had to
> be dropped from both databases.
>
> The "nested" view of this page requires joins against three tables and
> some recursion to show submessages.

Some recursion? That is interesting. Do you mean multiple queries to the
database? I don't see any reason to have multiple queries to the database to
show nested messages in a forum. Using stored procedures to create sort keys
at insertion or selection time is the efficient way to do this. Ah, but
MySQL doesn't have stored procedures.
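To make the sort-key idea concrete, here is a minimal sketch (using SQLite as a stand-in database, and an invented `messages` schema, not SourceForge's real one) of a materialized-path sort key built at insertion time -- the work a stored procedure would do server-side. One ORDER BY then returns the whole nested thread in a single query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "sort_key TEXT, body TEXT)")

def insert_message(parent_id, body):
    # Build the sort key at insertion time: the parent's key plus this
    # message's zero-padded id. A stored procedure would do this in-database.
    cur = conn.execute(
        "INSERT INTO messages (parent_id, body) VALUES (?, ?)",
        (parent_id, body))
    msg_id = cur.lastrowid
    if parent_id is None:
        key = f"{msg_id:06d}"
    else:
        parent_key = conn.execute(
            "SELECT sort_key FROM messages WHERE id = ?",
            (parent_id,)).fetchone()[0]
        key = f"{parent_key}.{msg_id:06d}"
    conn.execute("UPDATE messages SET sort_key = ? WHERE id = ?", (key, msg_id))
    return msg_id

root = insert_message(None, "top post")
child = insert_message(root, "reply")
insert_message(None, "second top post")
insert_message(child, "reply to reply")

# One query, fully nested order; indentation depth is just sort_key.count(".")
rows = conn.execute(
    "SELECT sort_key, body FROM messages ORDER BY sort_key").fetchall()
for key, body in rows:
    print("  " * key.count(".") + body)
```

The point is that display becomes one SELECT, not one query per subthread.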

>> Did you use connection pooling (a lot
>
> I used persistent connections, yes. Without them, Postgres' showing was
> far poorer, with mysql showing about 2x the performance.

Well, there must be some issue with your setup, because 10 requests per
second on Postgres on reads alone is far from what I've seen on much wimpier
boxes than yours. Maybe I should look some more into how pconnect really
handles connection pooling; I have heard bad things that need to be
verified.

> I'd really love to see a case where a real-world page view requires 4x
> the queries on MySQL. If you are doing subselects like that on a website
> in real-time you've got serious design problems and postgres would
> fold-up and quit under the load anyway.

I believe the "design problems" come up if you need subselects and you're
using MySQL. I've used Illustra/Informix, Oracle, and now Postgres to build
database-backed web sites, and subselects are a vital part of any
somewhat-complex web app. How exactly do subselects constitute a design
problem in your opinion?
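As an illustration (with an invented users/posts schema, run here against SQLite), this is the kind of one-statement question a subselect answers -- "which users have posted in this forum?" -- that otherwise becomes a join rewrite or a query per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, forum_id INTEGER);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO posts VALUES (1, 1, 10), (2, 3, 10), (3, 2, 20);
""")

# One query with a subselect: users who have posted in forum 10.
names = [r[0] for r in conn.execute(
    "SELECT name FROM users "
    "WHERE id IN (SELECT user_id FROM posts WHERE forum_id = ?)",
    (10,))]
print(names)  # ['alice', 'carol']
```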

> Specifically, what is the problem with my "intuition"? All I did in the
> prior message was report my results and ask for feedback before I post
> it.

Your intuition is that Postgres will be slower because it is slower than
MySQL at reads. I contend that:
- Postgres 7.0 is much faster at reads than the numbers you've shown.
I've seen it be much faster on smaller boxes.
- The slowdown you're seeing is probably due in no small part to the
implementation of pconnect(), the number of times it actually connects vs.
the number of times it goes to the pool, how large that pool gets, etc...
- The write-inefficiencies of MySQL will, on any serious web site, cut
performance so significantly that it is simply not workable. I'm thinking of
the delayed updates on Slashdot, the 20-25 second page loads on SourceForge
for permission updating and such...

> Personally, I check every query in my PHP code. On the rare occasion
> that it fails, I show an error and get out. Even with postgres, I have
> always checked success or failure of a query and shown an appropriate
> error. Never in two years of programming PHP/postgres have I ever used
> commit/rollback, and I have written some extremely complex web apps
> (sourceforge being a prime example). Geocrawler.com runs on postgres and
> again, I NEVER saw any need for any kind of rollback at all.

Geez. So you never have two inserts or updates you need to perform at once?
*ever*? What happens if your second one fails? Do you manually attempt to
backtrack on the changes you've made?
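Here is a minimal sketch of the failure mode (SQLite standing in for the database, with an invented `accounts` table): two related writes where the second one fails. Inside a transaction, rollback undoes the first write too; with check-every-query-by-hand, you are left manually reversing a half-applied change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(src, dst, amount):
    try:
        # First write succeeds: debit the source account.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src))
        # Second write: credit the destination. If no row matched,
        # treat it as a failure.
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst))
        if cur.rowcount != 1:
            raise sqlite3.IntegrityError("no such account: " + dst)
        conn.commit()
    except sqlite3.Error:
        conn.rollback()  # the debit is undone along with everything else

transfer("alice", "nobody", 30)  # second write fails
balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(balance)  # 100 -- the debit was rolled back, not half-applied
```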

> The statelessness of the web pretty much obviates the needs for
> locks/rollbacks as each process is extremely quick and runs from start
> to finish instantly. It's not like the old days where you pull data down
> into a local application, work on it, then upload it again.
>
> Only now, with some extremely complex stuff that we're doing on
> SourceForge would I like to see locks and rollbacks (hence my recent
> interest in benchmarking and comparing the two). Your average web
> programmer will almost never run into that in the short term.

This is simply false. If you're not using commit/rollbacks, you're either
cutting back on the functionality of your site, creating potential error
situations by the dozen, or you've got some serious design issues in your
system. Commit/Rollback is not an "advanced" part of building web sites. It
is a basic building block.

Telling your "average web programmer" to ignore transactions is like telling
your programmers not to free memory in your C programs because, hey, who
cares, you've got enough RAM for small programs, and they can learn to clean
up memory when they build "real" systems!

Of all things, this is precisely the type of thinking that crushes the
credibility of the open-source community. Enterprise IT managers understand
in great detail the need for transactions. Web sites actually need *more*
reliable technology, because you don't have that stateful session: you
sometimes need to recreate rollback mechanisms across pages by having
cleanup processes. Building this on a substrate that doesn't support the
basic transaction construct is impossible and irresponsible.

> Your own strong biases are shown in your message. I do this stuff
> because I'm curious and want to find out for myself. Most readers will
> find it interesting as I did. Few will switch from MySQL to postgres or
> vice versa because of it.

My bias? Well, my company doesn't have a vested interest in promoting
Postgres or MySQL. Before I started using Postgres, I looked into MySQL.
You're right if you think my evaluation didn't take too long. If I have
preferences, they're based purely on engineering decisions. That's not the
same as "my company just publicly endorsed MySQL, and check it out, we think
MySQL is better than Postgres."

Note that I am *not* saying that you're doing this on purpose, I'm just
saying that you're going to have a really hard time proving your
objectivity.

> Another clarification: PHPBuilder is owned by internet.com, a competitor
> of VA Linux/Andover.

PHP folks have a bias, too: PHP was built with MySQL in mind, it even ships
with MySQL drivers (and not Postgres). PHP's mediocre connection pooling
limits Postgres performance.

I'm happy to continue this discussion, but here's what I've noticed from
having had this argument many many times: if you don't believe that
transactions are useful or necessary, that subselects and enforced foreign
key constraints are hugely important, then this discussion will lead
nowhere. We simply begin with different assumptions.

I only suggest that you begin your evaluation article by explaining:
- your assumptions
- the fact that the page you used for benchmarking was originally built
for MySQL, and thus makes no use of more advanced Postgres features.

-Ben
