Re: SQL compatibility reminder: MySQL vs PostgreSQL

From: "Pierre C" <lists(at)peufeu(dot)com>
To: francois(dot)perou(at)free(dot)fr, "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, dpage(at)pgadmin(dot)org
Subject: Re: SQL compatibility reminder: MySQL vs PostgreSQL
Date: 2010-03-08 09:17:55
Message-ID: op.u88r35yfeorkce@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-advocacy pgsql-hackers


> My opinion is that PostgreSQL should accept any MySQL syntax and return
> warnings. I believe that we should access even innodb syntax and turn it
> immediately into PostgreSQL tables. This would allow people with no
> interest in SQL to migrate from MySQL to PostgreSQL without any harm.

A solution would be a SQL proxy (a la pgpool) with query rewriting.

> PHP developers don't have time to invest in learning deep SQL.

This is true, and it is a big problem IMHO. It results in lots of slow,
broken, insecure database designs.

ALL the web apps that I've done "CPR ressuscitation" on follow the same
schema :
- devs are database noobs
- generous use of MyISAM
- numerous queries, most of them unoptimized and/or useless
- use of Apache/mod_php instead of fastcgi
- sometimes, use of a huge slow bloated CMS/"framework" which issues even
more unoptimized and/or useless SQL queries
- site gains popularity
- huge traffic takes an unprepared team by surprise (never heard of stuff
like concurrency or scaling)
- site fails horribly

That said, I've got a 150.000+ members forum running on MySQL with sub 5
ms page times on a low-end server, it works if you do it right.

Most opensource PHP apps developers have to expend lots of efforts to work
on MyISAM that doesn't support foreign keys or constraints.
If those resources could be directed to useful work instead of wasted like
this, the result would be a lot better.
The irony is that even with all that effort, you can't make a web app work
without transactions, sooner or later your database integrity will fail.

My theory on this is simple :

- PHP is a very weak language, not suited to implementation of really
useful frameworks (unlike Python / Ruby)
example : Find an ORM for PHP that is as good as sqlalchemy. It does not
exist, because it is impossible to do.
-> really smart programmers dislike PHP because it is a pretty weak
language, so they all flee to Python, Ruby, etc
All big PHP applications turn into a huge "usine à gaz", impossible to
understand code, because of language weakness.
- really smart DBAs dislike MySQL (unless they have a nice paying job at
facebook or flickr)

So, it is very difficult to find good PHP developers, and especially with
database knowledge.

> IMHO, PostgreSQL has to be more flexible (in
> psychological terms) to understand MySQL user needs and answer them,
> just to give them a choice to migrate to PostgreSQL.

Problem is, as you mentioned above, most PHP developers don't know what
their "needs" are because they have little database expertise.

About stuff MySQL does that I would like postgres to implement, I'd focus
more on features, not syntax :

- some form of index-only scans or equivalent (visibility map would
probably suffice)
- some form of INSERT ON DUPLICATE KEY UPDATE or equivalent (merge...)
where the DB, not me, takes care of concurrency
- some way to "SELECT a,b,c,d GROUP BY a" when it can be determined that
it is equivalent to "GROUP BY a,b,c,d", ie a is UNIQUE NOT NULL
- index skip scans (well, MySQL doesn't really do index skip scans, but
since it can do index-only scans, it's an approximation)
- simpler syntax for DELETEs using JOINs

And while I'm at it, I'll add my pet feature :

An extremely fast form of temporary storage.

Table main is referenced by tables child1, child2, ... childN

- SELECT ... FROM main
WHERE (very complex condition involving gist coordinates search etc)
ORDER BY

Then I want the rows from child tables which reference those results.
If I add a lot of JOINs to my query, it's entirely possible that the (very
complex condition involving gist coordinates search etc) is mis-estimated
. This is generally not a problem since it usually uses bitmap index scans
which can survive lots of abuse. However it causes mis-planning of the
JOINs which is a problem.

Besides, some of the child tables have few rows, but lots of columns, so
it complicates the query and returns many times the same data, which the
ORM doesn't care about since it would rather instanciate 1 object per
referenced table row instead of 1 object per main table row.

I would like to do :

CREATE TEMP TABLE foo AS SELECT ... FROM main
WHERE (very complex condition involving gist coordinates search etc);

ANALYZE foo;
SELECT * FROM foo ORDER BY ...
SELECT c.* FROM foo JOIN child1 ON (...)
SELECT c.* FROM foo JOIN child2 ON (...)

etc

This splits the query into much easier to manage fragments, and the
results are easier to use, too.
I can store in the application only 1 object per child table row.
But I can't do this because it causes an update of system catalogs (slow,
iowait, and bloat).

Basically it would be nice to have "something" (temp table, cursor, CTE,
tuplestore, whatever) that can hold a short-lived result set, can be used
like a table, can have accurate statistics, and can be used in several
queries, without disk writes.

Note this would completely solve the set-returning functions stats problem
since you could store and analyze the function result in an efficient way.

In response to

Responses

Browse pgsql-advocacy by date

  From Date Subject
Next Message Marko Tiikkaja 2010-03-08 09:38:47 Re: SQL compatibility reminder: MySQL vs PostgreSQL
Previous Message Paragon Corporation 2010-03-08 04:16:50 Re: [HACKERS] SQL compatibility reminder: MySQL vs PostgreSQL

Browse pgsql-hackers by date

  From Date Subject
Next Message Tim Bunce 2010-03-08 09:36:32 Re: Core dump running PL/Perl installcheck with bleadperl [PATCH]
Previous Message David Christensen 2010-03-08 06:39:36 Re: Explicit psqlrc