slow table updates

From: Reece Hart <rkh(at)gene(dot)COM>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>, SF PostgreSQL <sfpug(at)postgresql(dot)org>
Subject: slow table updates
Date: 2003-07-23 00:40:01
Message-ID: 1058920801.7281.67.camel@tallac
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance sfpug

I'm trying to update a table but it's taking a very long time. I would
appreciate any tips folks may have about ways to speed it up.

The table is paprospect2, as below:

\d paprospect2
Column | Type | Modifiers
-------------+---------+-------------------------------------------------------------------
pfeature_id | integer | not null default nextval('unison.pfeature_pfeature_id_seq'::text)
pseq_id | integer | not null
pftype_id | integer | not null
start | integer |
stop | integer |
confidence | real |
run_id | integer | not null
[snip 13 integer and real columns]
run_id_new | integer |

Indexes: paprospect2_redundant_alignment unique btree (pseq_id, "start", stop, run_id, pmodel_id),
p2thread_p2params_id btree (run_id),
p2thread_pmodel_id btree (pmodel_id)
Foreign Key constraints: pftype_id_exists FOREIGN KEY (pftype_id) REFERENCES pftype(pftype_id) ON UPDATE CASCADE ON DELETE CASCADE,
p2thread_pmodel_id_exists FOREIGN KEY (pmodel_id) REFERENCES pmprospect2(pmodel_id) ON UPDATE CASCADE ON DELETE CASCADE,
pseq_id_exists FOREIGN KEY (pseq_id) REFERENCES pseq(pseq_id) ON UPDATE CASCADE ON DELETE CASCADE
Triggers: p2thread_i_trigger

The columns pfeature_id..confidence and run_id_new (in red) are from an
inherited table. Although the inheritance itself is probably not
relevant here (correction welcome), I suspect it may be relevant that
all existing rows were written before the table definition included
run_id_new. p2thread_i_trigger is defined fires on insert only (not
update).

paprospect2 contains ~40M rows. The goal now is to migrate the data to
the supertable-inherited column with

update paprospect2 set run_id_new=run_id;

The update's been running for 5 hours (unloaded dual 2.4 GHz Xeon w/2GB
RAM, SCSI160 10K drive). There are no other jobs running. Load is ~1.2
and the update's using ~3-5% of the CPU.

$ ps -ostime,time,pcpu,cmd 28701
STIME TIME %CPU CMD
12:18 00:07:19 2.3 postgres: admin csb 128.137.116.213 UPDATE

This suggests that the update is I/O bound (duh) and vmstat supports
this:

$ vmstat 1
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 0 11288 94632 3558960 0 0 14 6 12 21 1 0 6
0 1 0 0 12044 94632 3558956 0 0 0 972 332 16 0 1 99
0 1 0 0 11092 94632 3558932 0 0 16 4420 309 25 0 2 97
0 1 0 0 11456 94636 3558928 0 0 0 980 326 23 0 1 99
1 0 0 0 12340 94636 3558924 0 0 16 532 329 14 0 0 100
0 1 0 0 12300 94636 3558916 0 0 0 1376 324 16 1 0 99
0 1 0 0 12252 94636 3558904 0 0 16 1888 325 18 0 0 99
0 1 0 0 11452 94636 3558888 0 0 16 2864 324 23 1 1 98
0 1 0 0 12172 94636 3558884 0 0 0 940 320 12 0 1 99
0 1 0 0 12180 94636 3558872 0 0 16 1840 318 22 0 1 99
0 1 0 0 11588 94636 3558856 0 0 0 2752 312 16 1 2 97

Presumably the large number of blocks written (bo) versus blocks read
(bi) reflects an enormous amount of bookkeeping that has to be done for
MVCC, logging, perhaps rewriting a row for the new definition (a guess
-- I don't know how this is handled), indicies, etc. There's no swapping
and no processes are waiting. In short, it seems that this is ENTIRELY
an I/O issue. Obviously, faster drives will help (but probably only by
small factor).

Any ideas how I might speed this up? Presumably this is all getting
wrapped in a transaction -- does that hurt me for such a large update?

Thanks,
Reece

Bonus diversionary topic: In case it's not obvious, the motivation for
this is that the subtable (paprospect2) contains a column (run_id) whose
definition I would like to migrate to the inherited table (i.e., the
'super-table'). Although postgresql permits adding a column to a
supertable with the same name as an extant column in a subtable, it
appears that such "merged definition" columns do not have the same
properties as a typical inherited column. In particular, dropping the
column from the supertable does not drop it from the subtable (but
renaming it does change both names). Hmm.

--
Reece Hart, Ph.D. rkh(at)gene(dot)com, http://www.gene.com/
Genentech, Inc. 650/225-6133 (voice), -5389 (fax)
Bioinformatics and Protein Engineering
1 DNA Way, MS-93 http://www.in-machina.com/~reece/
South San Francisco, CA 94080-4990 reece(at)in-machina(dot)com, GPG: 0x25EC91A0

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Bruce Momjian 2003-07-23 04:24:29 Re: using ssl some of the time
Previous Message Charles Hornberger 2003-07-22 23:09:43 using ssl some of the time

Browse pgsql-performance by date

  From Date Subject
Next Message Rod Taylor 2003-07-23 00:53:56 Re: One table or many tables for data set
Previous Message Castle, Lindsay 2003-07-23 00:34:41 One table or many tables for data set

Browse sfpug by date

  From Date Subject
Next Message Richard Huxton 2003-07-23 08:49:09 Re: slow table updates
Previous Message Josh Berkus 2003-07-21 23:08:27 Need business user quotes for 7.4