Re: performance problem

From: "Rick Gigger" <rick(at)alpinenetworking(dot)com>
To: "Mike Mascari" <mascarm(at)mascari(dot)com>
Cc: "PgSQL General ML" <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance problem
Date: 2003-11-20 22:20:53
Message-ID: 01d701c3afb4$90b68280$0700a8c0@trogdor
Lists: pgsql-admin pgsql-general

> If you examine the behavior of pg_dump output, you'll notice that it
> doesn't build indexes until after the COPY command has completed the
> data import. It's a waste of CPU cycles and disk bandwidth to update
> indexes upon every insert. Your script should:
>
> 1) Drop all indexes on the relation
> 2) Use COPY if possible instead of INSERT to import the data
> 3) Recreate the indexes
> 4) Use UPDATE to update as necessary
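
Something like this, I take it (the table, index, and file names below
are just made-up examples):

DROP INDEX items_name_idx;                    -- 1) drop the index
COPY items FROM '/tmp/items.dat';             -- 2) bulk load; in 7.2 the
                                              --    file must supply every column
CREATE INDEX items_name_idx ON items (name);  -- 3) rebuild the index
UPDATE items SET status = 'active'            -- 4) fix up rows afterwards
    WHERE status IS NULL;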

I never thought of dropping the indexes in the middle of the script and
re-adding them. I guess that would be fine as long as nothing else that
needed the index was running while it was gone. I don't think 7.2.x
supports COPY unless you are inserting all of the fields, which I don't
want to do, although I suppose I could just insert all of the defaults. Is
that correct?
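
(I vaguely recall that 7.3 adds an optional column list to COPY, which
would lift that restriction; something like the line below, with the same
made-up table:

COPY items (name, price) FROM '/tmp/items.dat';

but I haven't verified that.)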

> > I am using 7.2.4. Has this improved in later versions? I'm not
> > concerned, since this is a very rare thing to need to do and it's
> > obviously possible to work around, but it would be nice if postgres
> > could figure things like that out on its own. (It certainly would have
> > saved me a lot of time and confusion last night at about 3 am.) Is
> > there a way to force the use of a specific index on a query?
>
> You can force the use of an index scan by turning sequential scans off:
>
> SET ENABLE_SEQSCAN TO OFF;
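
To keep that from affecting other queries, it can be scoped to a single
statement and then reset (the SELECT is a made-up example; the setting
strongly discourages sequential scans rather than forbidding them
outright):

SET ENABLE_SEQSCAN TO OFF;
SELECT * FROM items WHERE name = 'widget';  -- planner now favors the index
SET ENABLE_SEQSCAN TO ON;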

Thanks.

> But the problem is that the statistics didn't match the data. You
> could have:
>
> 1) Used INSERTs to insert the data into a relation with an index
> 2) Executed ANALYZE <foo> to update the statistics
> 3) Performed the UPDATE
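
In a script that might look something like this (made-up table again;
unlike VACUUM, ANALYZE is allowed inside a transaction block):

BEGIN;
INSERT INTO items (name, price) VALUES ('widget', 10);
-- ... more INSERTs ...
ANALYZE items;  -- refresh the planner's statistics mid-transaction
UPDATE items SET price = price * 2 WHERE name = 'widget';
COMMIT;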

That would have been the easiest thing. I wanted to do that, but I thought
you had to do a VACUUM (which I couldn't do inside the transaction)
together with the ANALYZE. I didn't realize that ANALYZE was a command all
by itself.

> After the UPDATE, you'll still have dead tuples (the original rows),
> and so you should occasionally run VACUUM to mark them as dead, or
> VACUUM FULL to reclaim the dead space, or VACUUM FULL ANALYZE to also
> update the relation's statistics.
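
For reference, the three variants side by side (made-up table name; none
of them can run inside a transaction block):

VACUUM items;               -- mark dead tuples as reusable space
VACUUM FULL items;          -- also reclaim the dead space on disk
VACUUM FULL ANALYZE items;  -- additionally update the statistics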

I do them all nightly with cron on all my important databases. I just had
a problem with needing the ANALYZE to happen in the middle of the
transaction in this one special case.

> I would just execute the steps I outlined above with COPY and not
> worry about ANALYZEs and VACUUMs in a script. People often run VACUUM
> ANALYZE in a 1/day cron job and VACUUM FULL ANALYZE in a 1/week cron
> job. Then, of course, there's REINDEX...

Doing the VACUUM FULL ANALYZE doesn't take long if I do it once a day, so
I just do that.
