Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Date: 2004-09-13 09:21:42
Message-ID: m3brgajzh5.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

A long time ago, in a galaxy far, far away, mcotner(at)yahoo(dot)com (Mark Cotner) wrote:
> Agreed, I did some preliminary testing today and am very impressed.
> I wasn't used to running analyze after a data load, but once I did
> that everything was snappy.

Something worth observing is that this is true for _any_ of the
database systems supporting a "cost-based" optimization system,
including Oracle and DB2.

When working with SAP R/3 Payroll, on one project, we found that when
the system was empty of data, the first few employee creates were
quick enough, but it almost immediately got excruciatingly slow. One
of the DBAs told the Oracle instance underneath to collect statistics
on the main table, and things _immediately_ got snappy again. But it
didn't get snappy until the conversion folk had run the conversion
process for several minutes, to the point to which it would get
painfully slow :-(. There, with MILLIONS of dollars worth of license
fees being paid, across the various vendors, it still took a fair bit
of manual fiddling.

MySQL(tm) is just starting to get into cost-based optimization; in
that area, they're moving from where the "big DBs" were about 10 years
ago. It was either version 7 or 8 where Oracle started moving to
cost-based optimization, and (as with the anecdote above) it took a
release or two for people to get accustomed to the need to 'feed' the
optimizer with statistics. This is a "growing pain" that bites users
with any database where this optimization gets introduced. It's
worthwhile, but is certainly not costless.

I expect some forseeable surprises will be forthcoming for MySQL AB's
customers in this regard...

> My best results from MySQL bulk inserts was around 36k rows per
> second on a fairly wide table. Today I got 42k using the COPY
> command, but with the analyze post insert the results were similar.
> These are excellent numbers. It basically means we could have our
> cake(great features) and eat it too(performance that's good enough
> to run the app).

In the end, performance for inserts is always fundamentally based on
how much disk I/O there is, and so it should come as no shock that
when roughly the same amount of data is getting laid down on disk,
performance won't differ much on these sorts of essentials.

There are a few places where there's some need for cleverness; if you
see particular queries running unusually slowly, it's worth doing an
EXPLAIN or EXPLAIN ANALYZE on them, to see how the query plans are
being generated. There's some collected wisdom out here on how to
encourage the right plans.

There are also unexpected results that are OK. We did a system
upgrade a few days ago that led to one of the tables starting out
totally empty. A summary report that looks at that table wound up
with a pretty wacky looking query plan (compared to what's usual)
because the postmaster knew that the query would be reading in
essentially the entire table. You'd normally expect an index scan,
looking for data for particular dates. In this case, it did a "scan
the whole table; filter out a few irrelevant entries" plan.

It looked wacky, compared to what's usual, but it ran in about 2
seconds, which was way FASTER than what's usual. So the plan was
exactly the right one.

Telling the difference between the right plan and a poor one is a bit
of an art; we quite regularly take a look at query plans on this list
to figure out what might be not quite right. If you find slow ones,
make sure you have run ANALYZE on the tables recently, to be sure that
the plans are sane, and you may want to consider posting some of them
to see if others can point to improvements that can be made.
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://linuxfinances.info/info/linuxdistributions.html
"I can't believe my room doesn't have Ethernet! Why wasn't it wired
when the house was built?"
"The house was built in 1576."
-- Alex Kamilewicz on the Oxford breed of `conference American.'

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Damien Dougan 2004-09-13 11:38:05 Help with extracting large volumes of records across related tables
Previous Message Mark Cotner 2004-09-13 07:57:52 Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables