Re: Running update in chunks?

From: Richard Huxton <dev(at)archonet(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Running update in chunks?
Date: 2013-01-21 11:12:39
Message-ID: 50FD22A7.4070203@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 21/01/13 10:30, Tim Uckun wrote:
>> Can you try a couple of things just to check timings. Probably worth EXPLAIN
>> ANALYSE.
>>
>> SELECT count(*) FROM cars.imports i JOIN cars.models md ON i.model_id =
>> md.id;
>
> Takes about 300 ms
>
>> CREATE TEMP TABLE tt AS SELECT i.* FROM cars.imports i JOIN cars.models md
>> ON i.model_id = md.id;
> Takes about 300 ms
OK - so writing all the data takes very under one second but updating
the same amount takes 50 seconds.

The only differences I can think of are WAL logging (transaction log)
and index updates (the temp table has no indexes).

1. Try CREATE TABLE rather than CREATE TEMP TABLE - if that's still
quick then it's not the time taken to write WAL.
2. Run the update query against your new tt table and see how long that
takes.
3. Add indexes and repeat (in particular I'd be suspicious of the gin
index on "data")

My guess is that it's the time taken to update the "data" index - gin
indexes can be slow to rebuild (although 50 seconds seems *very* slow).
If so there are a few options:
1. Split the table and put whatever this "data" is into an import_data
table - assuming it doesn't change often.
2. Try a fill-factor of 50% or less - keeping the updates on the same
data page as the original might help
3. Drop the gin index before doing your bulk update and rebuild it at
the end. This is a common approach with bulk-loading / updates.

Oh - I'm assuming you're only updating those rows whose id has changed -
that seemed to be the suggestion in your first message. If not, simply
adding "AND make_id <> md.make_id" should help. Also (and you may well
have considered this) - for a normalised setup you'd just have the
model-id in "imports" and look up the make-id through the "models" table.

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Albe Laurenz 2013-01-21 11:39:00 Re: pg_Restore
Previous Message Tim Uckun 2013-01-21 10:30:06 Re: Running update in chunks?