Re: update 600000 rows

From: Steve Crawford <scrawford@pinpointresearch.com>
To: okparanoid@free.fr
Cc: pgsql-performance@postgresql.org
Subject: Re: update 600000 rows
Date: 2007-12-15 01:16:56
Message-ID: 47632B08.50607@pinpointresearch.com
Lists: pgsql-performance

okparanoid@free.fr wrote:
> Hello
>
> I have a Python script that updates 600000 rows in one table from a CSV
> file in my Postgres database, and the transaction takes 5 hours...
>
>
Let's see if I guessed correctly.

Your Python script is stepping through a 600,000-row file and updating
information in a table (of unknown rows/columns) by issuing 600,000
individual updates, all wrapped in one big transaction. If so, that
means you are doing 600,000/(3,600 * 5) ≈ 33 queries/second. In that
case, I'd first investigate simply loading the CSV data into a
temporary table, creating appropriate indexes, and running a single
query to update your other table.
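
If it helps, here's roughly what that would look like with psycopg2. I'm
inventing the schema - a "prices" table keyed by "id", and a CSV called
"updates.csv" with a header row - so adjust to whatever you actually have:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
    cur = conn.cursor()

    # Stage the CSV with one COPY instead of 600,000 UPDATEs.
    cur.execute("CREATE TEMP TABLE staging (id integer, price numeric)")
    with open("updates.csv") as f:
        cur.copy_expert("COPY staging FROM STDIN WITH CSV HEADER", f)

    # Index the join column and update the planner's statistics.
    cur.execute("CREATE INDEX staging_id_idx ON staging (id)")
    cur.execute("ANALYZE staging")

    # One set-based UPDATE replaces the per-row round trips.
    cur.execute("""
        UPDATE prices
           SET price = staging.price
          FROM staging
         WHERE prices.id = staging.id
    """)

    conn.commit()
    cur.close()
    conn.close()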

> First, when I run htop I see that memory usage never exceeds 150 MB.
> In that case I don't understand why setting the shmall and shmmax
> kernel parameters to 16 GB of memory (the server has 32 GB) speeds up
> the transaction a lot compared to setting shmall and shmmax to (only)
> 2 GB?!
>
Are you saying that you did this and performance improved, or are you
wondering if it would?

> The script runs in a single transaction and pauses from time to time
> to give Postgres time to write the data to disk.
>
This doesn't make sense. If the transaction completes successfully,
then PostgreSQL has committed the data to disk (unless you have done
something non-standard and not recommended, like turning off fsync). If
you are adding pauses between updates, don't do that - it will only slow
you down. If the full transaction doesn't complete, all the updates will
be thrown away anyway; if it does complete, they were committed.
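
To make that concrete, a per-row loop without the pauses would look
something like this (again with made-up table/column names, and still
slower than the temp-table approach above):

    import csv
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
    cur = conn.cursor()

    with open("updates.csv") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row_id, price in reader:
            cur.execute("UPDATE prices SET price = %s WHERE id = %s",
                        (price, row_id))
            # No time.sleep() here: nothing is durable until commit,
            # so pausing mid-transaction only stretches out the 5 hours.

    conn.commit()  # this is where the data actually hits disk
    cur.close()
    conn.close()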
> If the data were only written at the end of the transaction, would
> performance be better? In production I want the data regularly written
> to disk to prevent loss of data, but is there any point in writing
> temporary data to disk in the middle of a transaction?
>
>
See above. Actual disk I/O is handled by the server. PostgreSQL is good
at the "D" in ACID. If your transaction completes, the data has been
written to disk. Guaranteed.

Cheers,
Steve
