Re: update 600000 rows

From: "H(dot) Hall" <hhall1001(at)reedyriver(dot)com>
To: okparanoid(at)free(dot)fr
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: update 600000 rows
Date: 2007-12-17 17:53:48
Message-ID: 4766B7AC.8080303@reedyriver.com
Lists: pgsql-performance

Note: I am resending this because the first never appeared after 40hrs.
HH

okparanoid(at)free(dot)fr wrote:
> Hello
>
> I have a Python script that updates 600000 rows in one table of my Postgres
> database from a CSV file, and the transaction takes 5 hours...
>
> I'm on Debian Etch with a Postgres 8.1 server on a 64-bit quad bi-Opteron machine.
>
> I have deactivated all indexes except the primary key, which is not updated
> since it is also the reference column of the update.
>
> When I run this script the server is not used by any other user.
>
> First, when I run htop I see that the memory used is never more than 150 MB.
> In that case I don't understand why setting the shmall and shmmax kernel
> parameters to 16 GB of memory (the server has 32 GB) speeds up the
> transaction a lot compared to shmall and shmmax of (only) 2 GB?!
>
> The script runs in only one transaction and pauses from time to time to give
> Postgres time to write data to disk.
>
> If the data were written only at the end of the transaction, would the
> performance be better? In production I want data regularly written to disk to
> prevent loss of data, but is there any point in writing temporary data to disk
> in the middle of a transaction???
>
> I'm a complete noob at Postgres and database configuration, and help is
> welcome.
>
> thanks

You will get a huge improvement in time if you use batch updates instead
of updating a row at a time. See:

http://www.postgresql.org/docs/8.2/interactive/populate.html
and
http://www.postgresql.org/docs/8.2/interactive/sql-begin.html
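
For what it's worth, here is a minimal sketch of that approach in Python,
assuming psycopg2 and made-up names (database mydb, table mytable with columns
id and val, file update.csv): COPY the CSV into a temporary staging table, then
apply one set-based UPDATE in a single transaction instead of 600000 individual
UPDATE statements.

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()

    # Staging table matching the CSV layout; it disappears when the session ends.
    cur.execute("CREATE TEMP TABLE staging (id integer, val text)")

    # COPY loads the whole file in one step, far faster than row-by-row statements.
    f = open("update.csv")
    cur.copy_from(f, "staging", sep=",", columns=("id", "val"))
    f.close()

    # One set-based UPDATE joined on the primary key instead of one UPDATE per row.
    cur.execute("""
        UPDATE mytable
           SET val = s.val
          FROM staging s
         WHERE mytable.id = s.id
    """)

    conn.commit()
    cur.close()
    conn.close()

Even if you keep per-row UPDATEs, wrapping them all in one explicit
BEGIN/COMMIT (as the second link describes) avoids paying for a commit on
every single row.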

You will also get a big improvement if you can turn fsync off during the
update. See:
http://www.postgresql.org/docs/8.2/interactive/runtime-config-wal.html
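
Note that fsync is a server-wide setting in postgresql.conf (fsync = off,
picked up on a reload), not something the script can change per session. Be
sure to turn it back on once the load is finished: running with fsync off
risks unrecoverable data corruption if the machine crashes in the middle.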

You also need to vacuum the table after doing that many updates: since pg
does a delete and insert for each update, there will be a lot of dead rows
(holes) left behind.
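
If the script itself should trigger the vacuum, a sketch (again assuming
psycopg2 and the hypothetical table name mytable): VACUUM cannot run inside a
transaction block, so the connection must be in autocommit mode first.

    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect("dbname=mydb")
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    # ANALYZE as well, so the planner has fresh statistics after the mass update.
    cur.execute("VACUUM ANALYZE mytable")
    cur.close()
    conn.close()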

Cheers
HH
