Re: performance of insert/delete/update

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Tim Gardner <tgardner(at)codeHorse(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance of insert/delete/update
Date: 2002-11-26 00:23:57
Message-ID: Pine.LNX.4.33.0211251652200.8723-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Mon, 25 Nov 2002, Tim Gardner wrote:

> >The funny thing it, they've often avoided transactions because they
> >figured they'd be slower than just inserting the rows, and you kinda have
> >to make them sit down first before you show them the performance increase
> >from putting all those inserts into a single transaction.
> >
> >No offense meant, really. It's just that you seemed to really doubt that
> >putting things into one transaction helped, and putting things into one
> >big transaction if like the very first postgresql lesson a lot of
> >newcomers learn. :-)
>
> Scott,
>
> I'm new to postgresql, and as you suggested, this is
> counter-intuitive to me. I would have thought that having to store
> all the inserts to be able to roll them back would take longer. Is
> my thinking wrong or not relevant? Why is this not the case?

Your thinking on this is wrong, and it is counter-intuitive to think that
a transaction would speed things up. Postgresql is very different from
other databases.

Postgresql was designed from day one as a transactional database. Which
is why it was so bothersome that an Oracle marketroid recently was telling
the .org folks why they shouldn't use Postgresql because it didn't have
transactions. Postgresql may have a few warts here and there, but not
supporting transactions has NEVER been a problem for it.

There are two factors that make Postgresql so weird in regards to
transactions. One it that everything happens in a transaction (we won't
mention truncate for a while, it's the only exception I know of.)

The next factor that makes for fast inserts of large amounts of data in a
transaction is MVCC. With Oracle and many other databases, transactions
are written into a seperate log file, and when you commit, they are
inserted into the database as one big group. This means you write your
data twice, once into the transaction log, and once into the database.

With Postgresql's implementation of MVCC, all your data are inserted in
real time, with a transaction date that makes the other clients ignore
them (mostly, other read committed transactions may or may not see them.)

If there are indexes to update, they are updated in the same "invisible
until committed" way.

All this means that your inserts don't block anyone else's reads as well.

This means that when you commit, all postgresql does is make them visible.

In the event you roll back a transaction, the tuples are all just marked
as dead and they get ignored.

It's interesting when you work with folks who came from other databases.
My coworker, who's been using Postgresql for about 2 years now, had an
interesting experience when he first started here. He was inserting
something like 10,000 rows. He comes over and tells me there must be
something wrong with the database, as his inserts have been running for 10
minutes, and he's not even halfway through. So I had him stop the
inserts, clean out the rows (it was a new table for a new project) and
wrap all 10,000 inserts into a transaction. What had been running for 10
minutes now ran in about 30 seconds.

He was floored.

Well, good luck on using postgresql, and definitely keep in touch with the
performance and general mailing lists. They're a wealth of useful info.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message scott.marlowe 2002-11-26 00:30:00 Re: performance of insert/delete/update
Previous Message Rod Taylor 2002-11-26 00:20:03 Re: performance of insert/delete/update

Browse pgsql-performance by date

  From Date Subject
Next Message scott.marlowe 2002-11-26 00:30:00 Re: performance of insert/delete/update
Previous Message Rod Taylor 2002-11-26 00:20:03 Re: performance of insert/delete/update