Re: performance of insert/delete/update

From: "Rich Scott" <rich_scott(at)fastmail(dot)fm>
To: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: "Wei Weng" <wweng(at)kencast(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance of insert/delete/update
Date: 2002-11-26 00:43:39
Message-ID: 20021126004339.46BB12FD1F@server3.fastmail.fm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


I have seen a number of real-world situations where bundling inserts into
transactions made a considerable difference - sometimes as much as a 100x
speed-up, and not just in Postgresql databases, but also commercial
systems
(my experience is in Oracle & Sybase). I've often used an idiom of
building
up rows until I hit some high-water mark, and then insert those rows in
one
fell swoop - it's almost always measurably faster than one-at-a-time.

Sidebar: a number of years ago, while contracting at a regional telephone
company,
I became peripherally enmired in a gigantic billing-system-makeover
fiasco.
Upon initial deployment, the system was so slow at processing that it was
taking about 30 hours for each day of billing data. After a week or so,
when it became apparent that fundamental Cash Flow was threatened, there
were
multi-hour conference calls, in which various VPs called for massive h/w
upgrades and/or lawsuits against Oracle. An astute cohort of mine asked
to
see some of the code, and found out that the original developers (at the
telco)
had created a bloated and slow control system in C++, using semaphores or
somesuch,
to *serialize* inserts/updates/deletes, and so they had gigantic
home-built
queues of insert jobs. Not only were they not bundling updates in
transactions,
they were only ever doing one transaction at a time. (Apparently, they
never
learned RDBMS fundamentals.) He told them to rip out all that code, and
let
Oracle (like any other decent RDBMS) handle the update ordering. The
resultant
speed-up factor was several hundred times.

-R

On Mon, 25 Nov 2002 15:59:16 -0700 (MST), "scott.marlowe"
<scott(dot)marlowe(at)ihs(dot)com> said:
> On Mon, 25 Nov 2002, Josh Berkus wrote:
>
> > Scott,
> >
> > > It's quite easy to test if you have a database with a large table to play
> > > with, use pg_dump to dump a table with the -d switch (makes the dump use
> > > insert statements.) Then, make two versions of the dump, one which has a
> > > begin;end; pair around all the inserts and one that doesn't, then use psql
> > > -e to restore both dumps. The difference is HUGE. Around 10 to 20 times
> > > faster with the begin end pairs.
> > >
> > > I'd think that anyone who's used postgresql for more than a few months
> > > could corroborate my experience.
> >
> > Ouch!
> >
> > No need to get testy about it.
> >
> > Your test works as you said; the way I tried testing it before was different.
> > Good to know. However, this approach is only useful if you are doing
> > rapidfire updates or inserts coming off a single connection. But then it is
> > *very* useful.
>
> I didn't mean that in a testy way, it's just that after you've sat
> through
> a fifteen minute wait while a 1000 records are inserted, you pretty
> quickly switch to the method of inserting them all in one big
> transaction. That's all.
>
> Note that the opposite is what really gets people in trouble. I've seen
> folks inserting rather large amounts of data, say into ten or 15 tables,
> and their web servers were crawling under parallel load. Then, they put
> them into a single transaction and they just flew.
>
> The funny thing it, they've often avoided transactions because they
> figured they'd be slower than just inserting the rows, and you kinda have
> to make them sit down first before you show them the performance increase
> from putting all those inserts into a single transaction.
>
> No offense meant, really. It's just that you seemed to really doubt that
> putting things into one transaction helped, and putting things into one
> big transaction if like the very first postgresql lesson a lot of
> newcomers learn. :-)
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ron Johnson 2002-11-26 01:41:03 Re: performance of insert/delete/update
Previous Message scott.marlowe 2002-11-26 00:41:09 Re: performance of insert/delete/update