Re: PGSQL or other DB?

From: Erik Jones <ejones(at)engineyard(dot)com>
To: durumdara <durumdara(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PGSQL or other DB?
Date: 2009-01-31 09:13:16
Message-ID: C1AD69EA-D5AB-46B7-B17E-974395A796A5@engineyard.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Jan 30, 2009, at 11:37 AM, durumdara wrote:

> The main viewpoints:
> - quick (re)connect - because mod_python basically not store the
> database connections persistently

mod_python is not a database adaptor. Put another way, mod_python
does not make database connections. If you're using Python then to
connect to Postgres you'll most likely be using psycopg2 to connect to
and query your database. I'm not sure what your previous database
experience is but it sounds like you may be thinking in terms of PHP's
persistent connections and on that I'll just say that PHP's persistent
connections are not true persistent connections from your top-level
application's perspective. If you want connection pooling then there
are options available, pgpool-II and pgbouncer are probably the two
most commonly used ones today.

> - fast queries

Stop and think about that one for a moment. Do you really expect to
go to a specific database's mailing list(s) with that requirement
expecting anything other than "We got you covered!"

> - easy IDE to use (like pgadmin)

I think you just answered your own question.

> - the db server can handle many databases (webdb_nnn where nnn is
> an integer)

I know people hate hearing this answer but that's the wrong question.
What matters isn't how many databases can be run (and, if you ever run
into a database server in this day and age where it does, run) but
rather how many concurrent connections you expect as well as both the
total data set size and how much of that data set will be constantly
worked with, i.e. needs to be in memory. Let's say you have 10 5G
databases with each averaging 10% of their total size in use (in
memory, hopefully) at any given moment. That's entirely different
from 10 10G databases with each averaging 10% in memory and both are
different from running 200 databases each with 5% in use at any given
moment. All of that being said, a properly configured Postgres
provisioned for the workload installation built with a little
knowledge will scale with the best of 'em.

> - I can add/modify a table, or a field to a table without "full
> lock" on the table (like DBISAM restructure). Like in FireBird,
> where the "add field" change only the table description. I don't
> know that PG supports this way of the DB modifying.

Nope. PostgreSQL is an all or nothing transactional database. I'd
never heard of DBISAM before you mentioned it and have never used
Firebird. After doing a little reading it turns out that if you plan
to use transactions at all (which is very likely given even just the
little you've described about the applications you're building) then
you should realize that altering tables is not compatible with
transactions and doing so will automatically commit any open
transactions on the table. Looking into Firebird I couldn't find how
it handles (or doesn't) that at all I but I did see that it will
happily let you add a new not null column with no default to a table
by writing nulls for the new attribute for any existing columns. That
already makes me queasy.

> - Quick and easy backup/restore system

Well, again, that depends. I believe pgAdmin has some nice menu
options to handle simple cases (others can either add to or correct me
there) but as to the core Postgres distribution there's still a bit of
ambiguity arising from cases of "there's more than one way to do it"
even for basic tasks as well as some inconsistent UI aspects to the
existing tools (see the pg_dump/pg_restore/psql comments in the recent
Pet Peeves thread).

> Another important thing that I don't understand (what as I saw) that
> the PostGreSQL is store the databases in one, unseparatable file
> set, in a directory named data.
> In another databases, like DBISAM, FireBird, MySQL, the databases
> are separated to another directories/files.

> This "one datadir" is seems to be not too good for us. We used
> DBISAM in our clients, and many times when we got some filesystem
> error, we can simply recover the tables - from the files.

Way off. While I didn't look into the storage layouts for DBISAM or
Firebird I can tell you that in any MySQL server instance all of the
tables will be under (to some depth) the same top level data directory
directory with each database in each own directory under that and
table files for each database under those. I think you may be able to
use symlinks MyISAM table files but InnoDB tables will croak if you
try to do that. What's more, by default InnoDB does *not* use
separate files per table and instead stores all InnoDB tables in one
file with it's own internal file system. Admittedly, I think most
people turn on the innodb_file_per_table option such that each table
will get it's own data and index files but even then all of a given
table's data indexes will reside in the same file. Each of those
InnoDB table data+index files is referred to as an InnoDB tablespace
(but that's really different from what most people expect a tablespace
to be, see next paragraph). One consequence of that is that adding an
index to a table requires that the entire table and all of it's
indexes be rewritten.

PostgreSQL also follows the 1 directory per-database setup. Each
database directory is located directly underneath the base/ direcotry
in the main Postgres One big difference is that table data and index
data are in separate files and multiple indexes for a given table are
all in separate files. This can be viewed as the low-level embodiment
of the fact that in PostgreSQL indexes are not part of the tables they
are for but rather separate, but tightly coupled, database objects.
With this scheme adding a new index, which doesn't even require the
table to be locked any more for new indexes, only needs to write the
new index file, not rewrite the table data and previously existing
indexes as well. What's more, Postgres allows you to create real
tablespaces so that you can place individual persistent database
objects (databases, tables, indexes, and some constraints) on separate
storage.

>
> When we want to backup or restore one database, we can do it in two
> way: a.) archive all files b.) make sql dump from database.

Both options are available with Postgres although, as has been noted
by others, option 'a' requires other tools in order to back up a
consistent data set.

> If a file get corrupted in a database, then we can restore the datas
> from files, and this filesystem error causes problems only for this
> database, not for all.

> I very fear from to keep all databases in one place, because if they
> are corrupted, possible more of them injured (if they are not
> separated).
> I cannot make filesystem based (hard) copy from one db (only SQL
> dump enabled).

Again, I'm not sure what you mean. There are different kinds of
corruption that can be found ion files. Database files could be
considered corrupt if the database screws something up and writes bad
data to a file. That would be localized to the files it wrote to in
the bad operation (and is extremely rare with Postgres) whereas what I
think of as filesystem corruption often doesn't see file boundaries.

> Ok, I saw that pgsql supports "tablespaces", but as I saw, this
> function can hold only table datas in the another directory, and not
> the full database can separated with them.

No, you can put entire databases in/on different tablespaces.

> Because I don't used one PGSQL with many databases (up to 200), I
> don't know, what happening, and which cases possible. But I think
> you have many experience with it. Please share it with me!
>
> Please help me, because we need to determine which DB to use.
>
> I started the usage of the PG in prev. month, and I liked it except
> the way of the data storage (one data dir).
>
> I tried the MySQL before I tried PG. InnoDB is seems to be "forcing
> transaction system on MyISAM". And boolean data type is missing
> (solved with enum?).
> I don't like it all, but it is seems to be fast with little tables,
> and it is separate the database files to another directories which
> thing I like. Possible it have many limitations what I don't saw in
> first time.

Well, if you don't want transaction then stay away from Postgres and
realize that Postgres actually gives you more built-in support for
placing your data at different locations than MySQL.

> Please help me, which DB is good for us, and how to configure, and
> use PGSQL with these database-set which we need to use.

Just realize that when choosing a complex technology solution like a
database there really is no "yes" or "no" answer or "here's the right
way to do it" and "here's the wrong way". You need to define your
requirements and part of that is asking questions. For example, 'how
to configure..' -- if you seriously ask a question like that here
you're going to get more questions (what kind of IO subsystem? how
many concurrent connections do you need to support? how much memory do
you have? how large is our data set? etc...) than you will answers.

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gregory Stark 2009-01-31 11:13:13 Re: Pet Peeves?
Previous Message Octavio Alvarez 2009-01-31 04:53:26 Re: Pet Peeves?