Re: Massive table (500M rows) update nightmare

From: Eduardo Morras <emorras(at)s21sec(dot)com>
To: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Massive table (500M rows) update nightmare
Date: 2010-01-08 11:19:38
Message-ID: 20100108112431.46277558B18@s21sec.com
Lists: pgsql-performance

At 08:17 07/01/2010, Carlo Stonebanks wrote:
>Our DB has an audit table which is 500M rows and growing. (FYI the objects being audited are grouped semantically, not individual field values).
>
>Recently we wanted to add a new feature and we altered the table to add a new column. We are backfilling this varchar(255) column by writing a TCL script to page through the rows (where every update is an UPDATE ... WHERE id >= x AND id < x+10, and a commit is performed after every 1000 update statements, i.e. every 10000 rows.)
>
>We have 10 columns, six of which are indexed. Rough calculations suggest that this will take two to three weeks to complete on an 8-core CPU with more than enough memory.
>
>As a ballpark estimate - is this sort of performance for 500M updates what one would expect of PG given the table structure (detailed below), or should I dig deeper to look for performance issues?
>
>As always, thanks!
>
>Carlo

You can dump the table with pg_dump and get a file like the one below (only a few columns shown here, not all of them). Note that each data line holds one row in TSV (tab-separated values) format, followed by a LF.

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

CREATE TABLE mdx_core.audit_impt
(
audit_impt_id serial NOT NULL,
impt_session integer,
impt_version character varying(255),
);

COPY mdx_core.audit_impt (audit_impt_id, impt_session, impt_version) FROM stdin;
1	1	tateti
2	32	coll
\.

You can add a new column by hand-editing this dump: add the new column to the CREATE TABLE statement, add its name to the COPY column list, and append a tab plus the data at the end of each data line (before the LF). Note that the table name below is different from the original.
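The editing step above is mechanical, so it can be scripted instead of done by hand. A minimal sketch follows; the column name "source_table", the constant backfill value "mdx_import", and the in-memory list of dump lines are illustrative assumptions, not part of the original post.

```python
# Sketch: post-process a pg_dump plain-text dump to add a new column.
# Assumptions (illustrative): the new column is "source_table" and every
# row gets the same backfill value "mdx_import".

def add_column(dump_lines, new_col, value):
    """Append `new_col` to the COPY column list and a tab-separated
    `value` to every data row, up to the '\\.' terminator."""
    out = []
    in_copy = False
    for line in dump_lines:
        if line.startswith("COPY ") and line.rstrip().endswith("FROM stdin;"):
            # Add the new column name to the COPY column list.
            line = line.replace(") FROM stdin;", f", {new_col}) FROM stdin;")
            in_copy = True
        elif in_copy:
            if line.rstrip("\n") == "\\.":
                in_copy = False  # end of the COPY data section
            else:
                # Tab + data before the trailing LF, as described above.
                line = line.rstrip("\n") + "\t" + value + "\n"
        out.append(line)
    return out

dump = [
    "COPY mdx_core.audit_impt (audit_impt_id, impt_session, impt_version) FROM stdin;\n",
    "1\t1\ttateti\n",
    "2\t32\tcoll\n",
    "\\.\n",
]
for line in add_column(dump, "source_table", "mdx_import"):
    print(line, end="")
```

For a real 500M-row dump you would stream the file line by line rather than hold it in memory, but the transformation per line is the same.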

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE mdx_core.audit_impt2
(
audit_impt_id serial NOT NULL,
impt_session integer,
impt_version character varying(255),
source_table character varying(255)
);
COPY mdx_core.audit_impt2 (audit_impt_id, impt_session, impt_version, source_table) FROM stdin;
1	1	tateti	tentown
1	32	coll	krof
\.

After this, add the indexes and constraints as usual.

HTH

--------------------------------
Eduardo Morrás González
Dept. I+D+i e-Crime Vigilancia Digital
S21sec Labs
Tlf: +34 902 222 521
Móvil: +34 555 555 555
www.s21sec.com, blog.s21sec.com

Unless otherwise indicated, this information is CONFIDENTIAL and contains
personal data that must be handled in accordance with current data
protection legislation. If you are not the original addressee of this
message, please note that you are not authorized to review, forward,
distribute, copy or print the information it contains, and we ask that you
delete it from your systems.

Before printing this message, consider whether it is really necessary. In
this way we contribute to preserving the environment.
