Re: Long Running Update

From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Long Running Update
Date: 2011-06-24 11:45:15
Message-ID: 4E0478CB.8080002@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

24.06.11 14:16, Harry Mantheakis написав(ла):
>
> > EXPLAIN the statement
>
> Here is the EXPLAIN result:
>
> ----------------------------------------------------------------------
> QUERY PLAN
> ----------------------------------------------------------------------
> Hash Join (cost=2589312.08..16596998.47 rows=74558048 width=63)
> Hash Cond: (table_A.id = table_B.id)
> -> Seq Scan on table_A(cost=0.00..1941825.05 rows=95612705 width=47)
> -> Hash (cost=1220472.48..1220472.48 rows=74558048 width=20)
> -> Seq Scan on table_B(cost=0.00..1220472.48 rows=74558048 width=20)
> ----------------------------------------------------------------------
>
> The documentation says the 'cost' numbers are 'units of disk page
> fetches'.
>
> Do you, by any chance, have any notion of how many disk page fetches
> can be processed per second in practice - at least a rough idea?
>
> IOW how do I convert - guesstimate! - these numbers into (plausible)
> time values?
No chance. This are "virtual values" for planner only.
If I read correctly, your query should go into two phases: build hash
map on one table, then update second table using the map. Not that this
all valid unless you have any constraints (including foreign checks,
both sides) to check on any field of updated table. If you have, you'd
better drop them.
Anyway, this is two seq. scans. For a long query I am using a tool like
ktrace (freebsd) to get system read/write calls backend is doing. Then
with catalog tables you can map file names to relations
(tables/indexes). Then you can see which stage you are on and how fast
is it doing.
Note that partially cached tables are awful (in FreeBSD, dunno for
linux) for such a query - I suppose this is because instead on
sequential read, you get a lot of random reads that fools prefetch
logic. "dd if=table_file of=/dev/null bs=8m" helps me a lot. You can see
it it helps if CPU time goes up.

Best regards, Vitalii Tymchyshyn

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Harry Mantheakis 2011-06-24 12:39:16 Re: Long Running Update
Previous Message Claudio Freire 2011-06-24 11:39:54 Re: Long Running Update