From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: large dataset with write vs read clients
Date: 2010-10-10 06:43:12
Message-ID: 4CB16080.1050406@postnewspapers.com.au
Lists: pgsql-performance

On 10/10/2010 5:35 AM, Mladen Gogala wrote:
> I have a logical problem with asynchronous commit. The "commit" command
> should instruct the database to make the outcome of the transaction
> permanent. The application should wait to see whether the commit was
> successful or not. Asynchronous behavior in the commit statement breaks
> the ACID rules and should not be used in a RDBMS system. If you don't
> need ACID, you may not need RDBMS at all. You may try with MongoDB.
> MongoDB is web scale: http://www.youtube.com/watch?v=b2F-DItXtZs

That argument makes little sense to me.

Because you can afford a clearly defined and bounded loosening of the
database's durability guarantee - you know about, and accept, the
possible loss of the last x seconds of work if your OS crashes or your
UPS fails - it somehow follows that you don't need durability
guarantees at all? Let alone all that atomic commit silliness,
transaction isolation, or the guarantee of a consistent on-disk state?
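For what it's worth, PostgreSQL lets you make exactly that bounded
trade-off, and per-transaction rather than globally. A minimal sketch
using psycopg2 (the "samples" table and connection string are made up):
only durability is relaxed, and only for this one transaction. A crash
can lose the last few hundred milliseconds of commits, but it can never
corrupt the database or break atomicity, consistency or isolation.

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
cur = conn.cursor()

# SET LOCAL applies only to the current transaction. psycopg2 opens a
# transaction implicitly on the first execute(), so this is safe here.
cur.execute("SET LOCAL synchronous_commit = off")
cur.execute("INSERT INTO samples (ts, value) VALUES (now(), %s)", (42,))

# commit() returns as soon as the commit is queued; the WAL flush
# happens in the background (within about 3 * wal_writer_delay).
conn.commit()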

Some of the other flavours of non-SQL database, both those that've been
around forever (PICK/UniVerse, Berkeley DB, Caché, etc.) and those
that're new and fashionable (Cassandra, CouchDB, etc.), provide some
ACID properties anyway. If you don't need or want an SQL interface to
your database, you don't have to throw out all that other database-y
goodness - so long as you haven't been drinking too much of the NoSQL
kool-aid.

There *are* situations in which it's necessary to switch to
distributed, eventually-consistent databases with non-traditional
approaches to data management. It's awfully nice not to have to,
though: going that way can force you into a lot of wheel reinvention
when it comes to querying, analysing and reporting on your data.

FWIW, a common approach in this sort of situation has historically been
- accepting that RDBMSs aren't great at continuous fast loading of
individual records - to log the records in batches to a flat file,
Berkeley DB, etc. as a staging point. You periodically rotate that file
out and bulk-load its contents into the RDBMS for analysis and
reporting. This doesn't have to be hourly - rotating every minute is
usually quite reasonable, and still gives your database a much easier
time without forcing you to modify your app to batch inserts into
transactions or anything like that.
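A rough sketch of that staging pattern, again with psycopg2. The
"samples" table, spool paths and once-a-minute cadence are placeholders,
and error handling, locking and partial-line concerns are left out:

import os
import psycopg2

STAGING = "/var/spool/myapp/current.csv"  # hypothetical spool path
LOADING = "/var/spool/myapp/loading.csv"

def append_record(ts, value):
    # Fast path for the write-heavy clients: append one line to a flat
    # file instead of hitting the database at all.
    with open(STAGING, "a") as f:
        f.write("%s,%s\n" % (ts, value))

def rotate_and_load():
    # Slow path, run periodically (say, once a minute from cron):
    # rotate the file out, then bulk-load it with COPY, which is far
    # cheaper than row-at-a-time INSERTs.
    os.rename(STAGING, LOADING)  # atomic on POSIX
    conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
    cur = conn.cursor()
    with open(LOADING) as f:
        cur.copy_expert("COPY samples (ts, value) FROM STDIN WITH CSV", f)
    conn.commit()
    conn.close()
    os.remove(LOADING)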

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/
