Re: Article on MySQL vs. Postgres

From: Tim Perdue <tperdue(at)valinux(dot)com>
To: Benjamin Adida <ben(at)mit(dot)edu>
Cc: pgsql-hackers(at)hub(dot)org
Subject: Re: Article on MySQL vs. Postgres
Date: 2000-07-05 00:30:51
Message-ID: 396281BB.CD6F1E94@valinux.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Benjamin Adida wrote:
> Jan said that each tool has its value, and that's true. I recommend you
> define your evaluation context before you write this. Is this for running a
> serious mission-critical web site? Is it for logging web site hits with
> tolerance for data loss and a need for doing simple reporting?

This is for what most people do with PHP and databases - run
semi-critical medium-traffic sites. Anyone running a mission-critical
site would have to look elsewhere for true robustness. I would not at
this time recommend any serious, life-threatening app run On either
database.

> > Performace/Scalability:
> >
> > MySQL was About 50-60% faster in real-world web serving, but it crumbles
> > under a real load. Postgres on the other hand scaled 3x higher than
> > MySQL before it started to crumble on the same machine. Unfortunately,
> > Postgres would probably still lose on a high-traffic website because
> > MySQL can crank out the pages so much faster, number of concurrent
> > connections is hard to compare. MySQL also seems to make better use of
> > multiple-processor machines like the quad-xeon I tested on. Postgres
> > never saturated all 4 processors as MySQL did.
>
> What kind of queries did you perform?

I took a real-world page from our site
<http://sourceforge.net/forum/forum.php?forum_id=1> and made it portable
to both databases. Of course, I could not import the "body" of the
message into postgres because of the 8k limitation, so the body had to
be dropped from both databases.

The "nested" view of this page requires joins against three tables and
some recursion to show submessages.

The test was conducted with "ab" (apache benchmark software) using
varying numbers of concurrent connections and 1000 total page views.

The "10% inserts" test is most realistic, as about 10% of all page views
in a discussion forum involve posting to the database. I used a
random-number generator in the PHP script to insert a row into the table
10% of the time. If you look at the results, you'll see that MySQL was
actually harmed somewhat more by the writes than postgres was.

Here are the actual results I saw on my quad-xeon machine:

postgres:

concurrency w/pconnects:
10 cli - 10.27 pg/sec 333.69 kb/s
20 cli - 10.24 pg/sec 332.86 kb/s
30 cli - 10.25 pg/sec 333.01 kb/s
40 cli - 10.0 pg/sec 324.78 kb/s
50 cli - 10.0 pg/sec 324.84 kb/s
75 cli - 9.58 pg/sec 311.43 kb/s
90 cli - 9.48 pg/sec 307.95 kb/s
100 cli - 9.23 pg/sec 300.00 kb/s
110 cli - 9.09 pg/sec 295.20 kb/s
120 cli - 9.28 pg/sec 295.02 kb/s (2.2% failure)

concurrency w/10% inserts & pconnects:
30 cli - 9.97 pg/sec 324.11 kb/s
40 cli - 10.08 pg/sec 327.40 kb/s
75 cli - 9.51 pg/sec 309.13 kb/s

MySQL:

Concurrency Tests w/pconnects:
30 cli - 16.03 pg/sec 521.01 kb/s
40 cli - 15.64 pg/sec 507.18 kb/s *failures
50 cli - 15.43 pg/sec 497.88 kb/s *failures
75 cli - 14.70 pg/sec 468.64 kb/s *failures
90 - mysql dies
110 - mysql dies
120 - mysql dies

Concurrency Tests w/o pconnects:
10 cli - 16.55 pg/sec 537.63 kb/s
20 cli - 15.99 pg/sec 519/51 kb/s
30 cli - 15.55 pg/sec 505.19 kb/s
40 cli - 15.46 pg/sec 490.01 kb/s 4.7% failure
50 cli - 15.59 pg/sec 482.24 kb/s 8.2% failure
75 cli - 17.65 pg/sec 452.08 kb/s 36.3% failure
90 cli - mysql dies

concurrency w/10% inserts & pconnects:
20 cli - 16.37 pg/sec 531.79 kb/s
30 cli - 16.15 pg/sec 524.64 kb/s
40 cli - 22.04 pg/sec 453.82 kb/sec 37.8% failure

> Did you use connection pooling (a lot

I used persistent connections, yes. Without them, Postgres' showing was
far poorer, with mysql showing about 2x the performance.

> of PHP apps don't, from what I've seen)? How does the performance get
> affected when a query in Postgres with subselects has to be split into 4
> different queries in MySQL?

I'd really love to see a case where a real-world page view requires 4x
the queries on MySQL. If you are doing subselects like that on a website
in real-time you've got serious design problems and postgres would
fold-up and quit under the load anyway.

> Postgres is process-based, each connection
> resulting in one process. If you use connection pooling with at least as
> many connections as you have processors, you should see it scale quite well.
> In fact, for serious load-testing, you should have 10-15 pooled connections.
>
> I *strongly* question your intuition on Postgres running web sites. MySQL's

Specifically, what is the problem with my "intuition"? All I did in the
prior message was report my results and ask for feedback before I post
it.

> write performance is very poor, which forces excessive caching (see sites
> like Slashdot) to prevent updates from blocking entire web site serving.
> Yes, the BDB addition might be useful. Let's see some performance tests
> using BDB tables.

I wouldn't use BDB tables as MySQL 3.23.x isn't stable and I wouldn't
use it until it is.

> > Postgres is undoubtably the long-run winner in stability, whereas MySQL
> > will freak out or die when left running for more than a month at a time.
> > But if you ever do have a problem with postgres, you generally have to
> > nuke the database and recover from a backup, as there are no known tools
> > to fix index and database corruption. For a long-running postgres
> > database, you will occasionally have to drop indexes and re-create them,
> > causing downtime.
>
> Dropping indexes and recreating them does not cause downtime. I've run a
> couple of postgres-backed web sites for months on end with no issues. I've
> survived a heavy slashdotting on my dual Pentium II-400, with Postgres
> WRITES and READS on every Slashdot-referred hit, resulting in perfectly
> respectable serving times (less than 3-4 seconds to serve > 20K of data on
> each hit). No caching optimization of any kind on the app layer. And I'd
> forgotten to vacuum my database for a few days.

Not sure why you're arguing with this as this was a clear win for
postgres.

> Do you run on a magic power grid that
> never fails?

Reality is that postgres is as likely - or more likely - to wind up with
corrupted data than MySQL. I'm talking physical corruption where I have
to destroy the database and recover from a dump. Just a couple months
ago I sent a message about "Eternal Vacuuming", in which case I had to
destroy and recover a multi-gigabyte database.

Further, I have had situations where postgres actually had DUPLICATE ids
in a primary key field, probably due to some abort or other nasty
situation in the middle of a commit. How did I recover from That? Well,
I had to run a count(*) next to each ID and select out the rows where
there was more than one of each "unique" id, then reinsert those rows
and drop and rebuild the indexes and reset the sequences.

I've only been using MySQL for about a year (as compared to 2 years for
postgres), but I have never seen either of those problems with MySQL.

> Do you never have code-related error conditions that require
> rolling back a series of database edits?

Personally, I check every query in my PHP code. On the rare occasion
that it fales, I show an error and get out. Even with postgres, I have
always checked success or failure of a query and shown an appropriate
error. Never in two years of programming PHP/postgres have I ever used
commit/rollback, and I have written some extremely complex web apps
(sourceforge being a prime example). Geocrawler.com runs on postgres and
again, I NEVER saw any need for any kind of rollback at all.

The statelessness of the web pretty much obviates the needs for
locks/rollbacks as each process is extremely quick and runs from start
to finish instantly. It's not like the old days where you pull data down
into a local application, work on it, then upload it again.

Only now, with some extremely complex stuff that we're doing on
SourceForge would I like to see locks and rollbacks (hence my recent
interest in benchmarking and comparing the two). Your average web
programmer will almost never run into that in the short term.


> One quick point: while you may well be personally unbiased, VA Linux just
> endorsed and funded MySQL. SourceForge uses MySQL. How do you expect to
> convince readers that you're being objective in this comparison?

Your own strong biases are shown in your message. I do this stuff
because I'm curious and want to find out for myself. Most readers will
find it interesting as I did. Few will switch from MySQL to postgres or
vice versa because of it.

Another clarification: PHPBuilder is owned by internet.com, a competitor
of VA Linux/Andover.

Tim

--
Founder - PHPBuilder.com / Geocrawler.com
Lead Developer - SourceForge
VA Linux Systems
408-542-5723

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Bitmead 2000-07-05 00:33:14 Re: Re: [HACKERS] Revised Copyright: is this more palatable?
Previous Message Tom Lane 2000-07-05 00:25:53 Re: Memory-context slinging