Re: about multiprocessingmassdata

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: about multiprocessingmassdata
Date: 2012-04-04 16:34:47
Message-ID: 4F7C7827.7080705@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 4.4.2012 17:52, superman0920 wrote:
> Hi list!
>
> i have a table which has 8500000 rows records. i write a java program to
> update these records.
> i use 100 threads to update the records. For example, thread-1 update
> 1~85000 records; thread-2 update 85001~170000 and so on.
> The update sql's aim is remove the space in the column and it is simple:
> update poi set py=replace(py,' ','') where id=?;

That's a very naive approach. It's very likely each thread will do an
index scan for each update (to evaluate the 'id=?' condition. And that's
going to cost you much more than you gain because index scans are quite
CPU and I/O intensive.

Simply update the whole table by

UPDATE poi SET py = replace(py, ' ','');

Have you actually tried how this performs or did you guess 'it's
definitely going to be very slow so I'll use multiple threads to make
that faster'?

If you really need to parallelize this, you need to do that differently
- e.g. use 'ctid' to skip to update a whole page like this:

UPDATE poi SET py = replace(py, ' ','')
WHERE ctid >= '(n,0)'::tid AND ctid < '(n+1,0)'::tid AND;

where 'n' ranges between 0 and number of pages the table (e.g. in pg_class).

But try the simple UPDATE first, my guess is it's going to be much
faster than you expect.

Tomas

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2012-04-04 16:42:14 Re: H800 + md1200 Performance problem
Previous Message Scott Marlowe 2012-04-04 16:31:28 Re: H800 + md1200 Performance problem