Drop indexes inside transaction?

From: Steve Lane <slane(at)moyergroup(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Drop indexes inside transaction?
Date: 2004-02-06 03:48:35
Message-ID: BC486AB3.9862%slane@moyergroup.com
Lists: pgsql-admin

Hello all:

Question here about how best to optimize a large data load. The load is ad
hoc, so it needs to be able to run during production hours.

We have a piece of software written in a desktop RAD environment (FileMaker
Pro). Our users are using this tool to generate data that we need to get
into postgresql. We're sending it through a middleware layer written in PHP.

A single data load will generate from 10K-100K rows.

Initially we did everything procedurally through PHP. 100K inserts, each one
called through several layers of PHP abstraction. Bad idea.

Current idea is to have PHP dump the data to a file, and suck the whole file
in at once somehow. So far, so good: PHP can create the file in 6 minutes
for 100K rows. That's actually acceptable.

Now we want to use COPY to bring the data in. The target table has 6
indexes. Without indexes, naturally, we can load 80K rows in 2 seconds. With
indexes, 46 seconds. (Oddly, ONE index contributes 40+ seconds of that, even
though they're all similar, single-column indexes. But that's another story.)
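For concreteness, the load step is essentially the following (table, column,
and file names here are placeholders, not our real schema):

```sql
-- Bulk load via COPY instead of row-by-row INSERTs.
-- COPY ... FROM reads a file on the server; the equivalent client-side
-- command in psql is \copy.
COPY load_target (col1, col2, col3)
    FROM '/tmp/batch_load.dat'
    WITH DELIMITER E'\t';
```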

Normally, in a batch-like environment, I'd feel free to drop the indexes,
load, reindex. That's perfectly fast. But the environment needs to be live,
and those indexes are vital to a reporting engine that can be hit at any
time. So we can't just drop them, even briefly.

So I hit on the idea of doing the same thing, but inside a transaction. In
theory that should affect no one else. To my delight, the transaction
drop-copy-reindex ran in 7 seconds.
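In sketch form, what we ran was the following (index and table names are
placeholders; the real table has 6 indexes):

```sql
-- Drop, load, and rebuild the indexes inside a single transaction.
BEGIN;
DROP INDEX idx_target_col1;
-- ... likewise for the other five indexes ...
COPY load_target FROM '/tmp/batch_load.dat' WITH DELIMITER E'\t';
CREATE INDEX idx_target_col1 ON load_target (col1);
-- ... likewise recreate the other five indexes ...
COMMIT;
```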

I guess I'm just wondering how that's possible. I hate to sound like a
superstitious goof, but it sounds too good to be true. At best, I figured I'd
pay the whole penalty at COMMIT time -- that it would be fast up to
that point, and then of course need to do exactly the same work as the
transactionless version, as far as reconciling the indexes (or whatever the
more accurate technical term is).

So: is this too good to be true? Or is this actually a good way to do this?

Any other observations on the whole process? Is there a better or different
approach, or other things we should consider?

Any and all thoughts are welcome.

-- sgl

=======================================================
Steve Lane

Vice President
The Moyer Group
14 North Peoria St Suite 2H
Chicago, IL 60607

Voice: (312) 433-2421 Email: slane(at)moyergroup(dot)com
Fax: (312) 850-3930 Web: http://www.moyergroup.com
=======================================================
