Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

From: Mark Cotner <mcotner(at)yahoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Date: 2004-09-13 07:57:52
Message-ID: 20040913075752.22635.qmail@web41501.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

See comments . . . thanks for the feedback.

'njoy,
Mark

--- Christopher Browne <cbbrowne(at)acm(dot)org> wrote:

> The world rejoiced as Mischa Sandberg
> <ischamay(dot)andbergsay(at)activestateway(dot)com> wrote:
> > Mark Cotner wrote:
> >> Requirements:
> >> Merge table definition equivalent. We use these
> >> extensively.
>
> > Looked all over mysql.com etc, and afaics merge
> table is indeed
> > exactly a view of a union-all. Is that right?
>
> > PG supports views, of course, as well (now) as
> tablespaces, allowing
> > you to split tables/tablesets across multiple disk
> systems. PG is
> > also pretty efficient in query plans on such
> views, where (say) you
> > make one column a constant (identifier, sort of)
> per input table.
>
> The thing that _doesn't_ work well with these sorts
> of UNION views are
> when you do self-joins. Supposing you have 10
> members, a self-join
> leads to a 100-way join, which is not particularly
> pretty.
>
> I'm quite curious as to how MySQL(tm) copes with
> this, although it may
> not be able to take place; they may not support
> that...
>
> >> Um, gonna sound silly, but the web interface has
> to remain "snappy"
> >> under load. I don't see this as a major concern
> since you don't
> >> require table locking.
>
> > Agreed. It's more in your warehouse design, and
> intelligent bounding
> > of queries. I'd say PG's query analyzer is a few
> years ahead of
> > MySQL for large and complex queries.
>
> The challenge comes in if the application has had
> enormous amounts of
> effort put into it to attune it exactly to
> MySQL(tm)'s feature set.
>
> The guys working on RT/3 have found this a
> challenge; they had rather
> a lot of dependancies on its case-insensitive string
> comparisons,
> causing considerable grief.
>

Not so much, I've tried to be as agnostic as possible.
Much of the more advanced mining that I've written is
kinda MySQL specific, but needs to be rewritten as
stored procedures anyway.

> > On the other hand, if you do warehouse-style
> loading (Insert, or PG
> > COPY, into a temp table; and then 'upsert' into
> the perm table), I
> > can guarantee 2500 inserts/sec is no problem.
>
> The big wins are thus:
>
> 1. Group plenty of INSERTs into a single
> transaction.
>
> 2. Better still, use COPY to cut parsing costs
> plenty more.
>
> 3. Adding indexes _after_ the COPY are a further
> win.
>
> Another possibility is to do clever things with
> stored procs; load
> incoming data using the above optimizations, and
> then run stored
> procedures to use some more or less fancy logic to
> put the data where
> it's ultimately supposed to be. Having the logic
> running inside the
> engine is the big optimization.

Agreed, I did some preliminary testing today and am
very impressed. I wasn't used to running analyze
after a data load, but once I did that everything was
snappy.

My best results from MySQL bulk inserts was around 36k
rows per second on a fairly wide table. Today I got
42k using the COPY command, but with the analyze post
insert the results were similar. These are excellent
numbers. It basically means we could have our
cake(great features) and eat it too(performance that's
good enough to run the app).

Queries from my test views were equally pleasing. I
won't bore you with the details just yet, but
PostgreSQL is doing great. Not that you all are
surprised. ;)

> --
> wm(X,Y):-write(X),write('@'),write(Y).
> wm('cbbrowne','linuxfinances.info').
> http://linuxfinances.info/info/spreadsheets.html
> Rules of the Evil Overlord #198. "I will
> remember that any
> vulnerabilities I have are to be revealed strictly
> on a need-to-know
> basis. I will also remember that no one needs to
> know."
> <http://www.eviloverlord.com/>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Browne 2004-09-13 09:21:42 Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Previous Message mudfoot 2004-09-13 06:11:06 Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options