
From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Date: 2004-09-13 02:29:00
Message-ID: m3acvulx5f.fsf@wolfe.cbbrowne.com
Lists: pgsql-performance

The world rejoiced as Mischa Sandberg <ischamay(dot)andbergsay(at)activestateway(dot)com> wrote:
> Mark Cotner wrote:
>> Requirements:
>> Merge table definition equivalent. We use these
>> extensively.

> Looked all over mysql.com etc, and afaics merge table is indeed
> exactly a view of a union-all. Is that right?

> PG supports views, of course, as well (now) as tablespaces, allowing
> you to split tables/tablesets across multiple disk systems. PG is
> also pretty efficient in query plans on such views, where (say) you
> make one column a constant (identifier, sort of) per input table.
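
For the archives, such a union-all view might look like this (a
minimal sketch; the table and column names are invented):

    CREATE TABLE measurements_2004_08 (ts timestamp, node_id integer, value numeric);
    CREATE TABLE measurements_2004_09 (ts timestamp, node_id integer, value numeric);

    CREATE VIEW measurements AS
        SELECT '2004-08'::text AS part, ts, node_id, value
          FROM measurements_2004_08
        UNION ALL
        SELECT '2004-09', ts, node_id, value
          FROM measurements_2004_09;

    -- the constant "part" column is what lets the planner prune
    -- non-matching members, as described above:
    SELECT count(*) FROM measurements WHERE part = '2004-09';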

The thing that _doesn't_ work well with these sorts of UNION views is
self-joins. Supposing the view has 10 member tables, a self-join
leads to a 100-way join, which is not particularly pretty.

I'm quite curious as to how MySQL(tm) copes with this, though the
question may be moot; they may not support self-joins on merge tables
at all...

>> Um, gonna sound silly, but the web interface has to remain "snappy"
>> under load. I don't see this as a major concern since you don't
>> require table locking.

> Agreed. It's more in your warehouse design, and intelligent bounding
> of queries. I'd say PG's query analyzer is a few years ahead of
> MySQL for large and complex queries.

The challenge comes in if the application has had enormous amounts of
effort put into it to attune it exactly to MySQL(tm)'s feature set.

The guys working on RT/3 have found this a challenge; they had rather
a lot of dependencies on MySQL's case-insensitive string comparisons,
which caused considerable grief.
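
(For anyone else making that port: the usual PG-side workaround is to
compare via lower() and back it with a functional index. A sketch,
with invented names:)

    CREATE TABLE users (id serial PRIMARY KEY, email text);

    -- MySQL compares strings case-insensitively by default; PG has
    -- to be asked for it explicitly:
    SELECT * FROM users WHERE lower(email) = lower('Someone@Example.COM');

    -- a functional index keeps that predicate from forcing a
    -- sequential scan:
    CREATE INDEX users_email_lower_idx ON users (lower(email));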

> On the other hand, if you do warehouse-style loading (Insert, or PG
> COPY, into a temp table; and then 'upsert' into the perm table), I
> can guarantee 2500 inserts/sec is no problem.

The big wins are thus:

1. Group plenty of INSERTs into a single transaction.

2. Better still, use COPY to cut parsing costs plenty more.

3. Adding indexes _after_ the COPY is a further win. (A sketch of
all three follows.)
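
To make that concrete (file path and names are made up):

    CREATE TABLE fact (ts timestamp, node_id integer, value numeric);

    -- (1) and (2): one transaction, one COPY, instead of thousands
    -- of individually-committed INSERT statements
    -- (COPY reads a server-side file; psql's \copy works from the client)
    BEGIN;
    COPY fact FROM '/tmp/batch.dat' WITH DELIMITER ',';
    COMMIT;

    -- (3): build indexes only after the bulk load is done
    CREATE INDEX fact_ts_idx ON fact (ts);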

Another possibility is to do clever things with stored procs; load
incoming data using the above optimizations, and then run stored
procedures to use some more or less fancy logic to put the data where
it's ultimately supposed to be. Having the logic running inside the
engine is the big optimization.
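
A minimal PL/pgSQL sketch of that idea, reusing the "fact" table from
the sketch above plus an invented staging table; the update-then-insert
pair stands in for the 'upsert', since there's no single-statement form
of it:

    CREATE TABLE fact_staging (ts timestamp, node_id integer, value numeric);

    CREATE OR REPLACE FUNCTION merge_staging() RETURNS integer AS '
    DECLARE
        n integer;
    BEGIN
        -- rows already in fact: update in place
        UPDATE fact SET value = s.value
          FROM fact_staging s
         WHERE fact.ts = s.ts AND fact.node_id = s.node_id;

        -- everything else: insert
        INSERT INTO fact (ts, node_id, value)
        SELECT s.ts, s.node_id, s.value
          FROM fact_staging s
         WHERE NOT EXISTS (SELECT 1 FROM fact f
                            WHERE f.ts = s.ts AND f.node_id = s.node_id);
        -- n = rows newly inserted
        GET DIAGNOSTICS n = ROW_COUNT;

        -- start the next batch clean (assumes one loader at a time;
        -- concurrent upserts would need locking)
        DELETE FROM fact_staging;
        RETURN n;
    END;
    ' LANGUAGE plpgsql;

COPY each batch into fact_staging, then SELECT merge_staging(); and
all of the logic runs inside the engine.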
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','linuxfinances.info').
http://linuxfinances.info/info/spreadsheets.html
Rules of the Evil Overlord #198. "I will remember that any
vulnerabilities I have are to be revealed strictly on a need-to-know
basis. I will also remember that no one needs to know."
<http://www.eviloverlord.com/>
