Skip site navigation (1) Skip section navigation (2)

slow table updates

From: Reece Hart <rkh(at)gene(dot)COM>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>,"pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>,SF PostgreSQL <sfpug(at)postgresql(dot)org>
Subject: slow table updates
Date: 2003-07-23 00:40:01
Message-ID: 1058920801.7281.67.camel@tallac (view raw or flat)
Thread:
Lists: pgsql-adminpgsql-performancesfpug
I'm trying to update a table but it's taking a very long time. I would
appreciate any tips folks may have about ways to speed it up. 

The table is paprospect2, as below:

        \d paprospect2
           Column    |  Type   |                             Modifiers
        -------------+---------+-------------------------------------------------------------------
         pfeature_id | integer | not null default nextval('unison.pfeature_pfeature_id_seq'::text)
         pseq_id     | integer | not null
         pftype_id   | integer | not null
         start       | integer |
         stop        | integer |
         confidence  | real    |
         run_id      | integer | not null
         [snip 13 integer and real columns]
         run_id_new  | integer |
        
        Indexes: paprospect2_redundant_alignment unique btree (pseq_id, "start", stop, run_id, pmodel_id),
                 p2thread_p2params_id btree (run_id),
                 p2thread_pmodel_id btree (pmodel_id)
        Foreign Key constraints: pftype_id_exists FOREIGN KEY (pftype_id) REFERENCES pftype(pftype_id) ON UPDATE CASCADE ON DELETE CASCADE,
                                 p2thread_pmodel_id_exists FOREIGN KEY (pmodel_id) REFERENCES pmprospect2(pmodel_id) ON UPDATE CASCADE ON DELETE CASCADE,
                                 pseq_id_exists FOREIGN KEY (pseq_id) REFERENCES pseq(pseq_id) ON UPDATE CASCADE ON DELETE CASCADE
        Triggers: p2thread_i_trigger
        

The columns pfeature_id..confidence and run_id_new (in red) are from an
inherited table. Although the inheritance itself is probably not
relevant here (correction welcome), I suspect it may be relevant that
all existing rows were written before the table definition included
run_id_new. p2thread_i_trigger is defined fires on insert only (not
update).

paprospect2 contains ~40M rows. The goal now is to migrate the data to
the supertable-inherited column with

        update paprospect2 set run_id_new=run_id;


The update's been running for 5 hours (unloaded dual 2.4 GHz Xeon w/2GB
RAM, SCSI160 10K drive). There are no other jobs running. Load is ~1.2
and the update's using ~3-5% of the CPU.

        $ ps -ostime,time,pcpu,cmd 28701
        STIME     TIME %CPU CMD
        12:18 00:07:19  2.3 postgres: admin csb 128.137.116.213 UPDATE

This suggests that the update is I/O bound (duh) and vmstat supports
this:

        $ vmstat 1
           procs                      memory    swap          io     system         cpu
         r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
         0  1  0      0  11288  94632 3558960   0   0    14     6   12    21   1   0   6
         0  1  0      0  12044  94632 3558956   0   0     0   972  332    16   0   1  99
         0  1  0      0  11092  94632 3558932   0   0    16  4420  309    25   0   2  97
         0  1  0      0  11456  94636 3558928   0   0     0   980  326    23   0   1  99
         1  0  0      0  12340  94636 3558924   0   0    16   532  329    14   0   0 100
         0  1  0      0  12300  94636 3558916   0   0     0  1376  324    16   1   0  99
         0  1  0      0  12252  94636 3558904   0   0    16  1888  325    18   0   0  99
         0  1  0      0  11452  94636 3558888   0   0    16  2864  324    23   1   1  98
         0  1  0      0  12172  94636 3558884   0   0     0   940  320    12   0   1  99
         0  1  0      0  12180  94636 3558872   0   0    16  1840  318    22   0   1  99
         0  1  0      0  11588  94636 3558856   0   0     0  2752  312    16   1   2  97


Presumably the large number of blocks written (bo) versus blocks read
(bi) reflects an enormous amount of bookkeeping that has to be done for
MVCC, logging, perhaps rewriting a row for the new definition (a guess
-- I don't know how this is handled), indicies, etc. There's no swapping
and no processes are waiting. In short, it seems that this is ENTIRELY
an I/O issue. Obviously, faster drives will help (but probably only by
small factor).

Any ideas how I might speed this up? Presumably this is all getting
wrapped in a transaction -- does that hurt me for such a large update?
 
Thanks,
Reece


Bonus diversionary topic: In case it's not obvious, the motivation for
this is that the subtable (paprospect2) contains a column (run_id) whose
definition I would like to migrate to the inherited table (i.e., the
'super-table'). Although postgresql permits adding a column to a
supertable with the same name as an extant column in a subtable, it
appears that such "merged definition" columns do not have the same
properties as a typical inherited column. In particular, dropping the
column from the supertable does not drop it from the subtable (but
renaming it does change both names). Hmm.

-- 
Reece Hart, Ph.D.                       rkh(at)gene(dot)com, http://www.gene.com/
Genentech, Inc.                         650/225-6133 (voice), -5389 (fax)
Bioinformatics and Protein Engineering
1 DNA Way, MS-93                        http://www.in-machina.com/~reece/
South San Francisco, CA  94080-4990     reece(at)in-machina(dot)com, GPG: 0x25EC91A0

Responses

pgsql-admin by date

Next:From: Bruce MomjianDate: 2003-07-23 04:24:29
Subject: Re: using ssl some of the time
Previous:From: Charles HornbergerDate: 2003-07-22 23:09:43
Subject: using ssl some of the time

sfpug by date

Next:From: Richard HuxtonDate: 2003-07-23 08:49:09
Subject: Re: slow table updates
Previous:From: Josh BerkusDate: 2003-07-21 23:08:27
Subject: Need business user quotes for 7.4

pgsql-performance by date

Next:From: Rod TaylorDate: 2003-07-23 00:53:56
Subject: Re: One table or many tables for data set
Previous:From: Castle, LindsayDate: 2003-07-23 00:34:41
Subject: One table or many tables for data set

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group