Re: Running update in chunks?

From: Tim Uckun <timuckun(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Richard Huxton <dev(at)archonet(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Running update in chunks?
Date: 2013-01-25 08:53:01
Message-ID: CAGuHJrMMD_2uQ+wQUwAOUyrp88Vh_O5S63oD=EKk+mvCpghxjw@mail.gmail.com
Lists: pgsql-general

Sorry I haven't been responsive for a little while.

I ran your script, but creating a new schema rather than using my
existing one. My timings were similar to yours (more or less), except
for the vacuums, which took roughly 147891 ms (about two and a half
minutes) here.
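
In case it helps narrow down where that vacuum time is going, this is
roughly what I plan to run next (just a sketch, using the cars.imports
table from your script):

vacuum verbose analyze cars.imports;

select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
from pg_stat_user_tables
where relname = 'imports';

The verbose output should show how many dead row versions each pass
removes, and pg_stat_user_tables should show whether autovacuum touched
the table between the timed runs.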

On Tue, Jan 22, 2013 at 3:21 PM, Kevin Grittner <kgrittn(at)mail(dot)com> wrote:
> Kevin Grittner wrote:
>
>> update imports set make_id = 0
>>
>> Query returned successfully: 98834 rows affected, 45860 ms execution time.
>
> For difficult problems, there is nothing like a self-contained test
> case, that someone else can run to see the issue. Here's a starting
> point:
>
> create extension if not exists hstore;
> create schema cars;
> drop table if exists cars.imports;
> CREATE TABLE cars.imports
> (
> id serial NOT NULL,
> target_id integer,
> batch_id integer,
> make_id integer,
> model_id integer,
> date timestamp without time zone,
> division_id integer,
> dealer_id integer,
> data hstore,
> created_at timestamp without time zone NOT NULL,
> updated_at timestamp without time zone NOT NULL,
> CONSTRAINT imports_pkey PRIMARY KEY (id)
> );
> CREATE INDEX index_imports_on_data ON cars.imports USING gin (data);
> CREATE INDEX index_imports_on_dealer_id ON cars.imports USING btree (dealer_id);
> CREATE INDEX index_imports_on_division_id ON cars.imports USING btree (division_id);
> CREATE INDEX index_imports_on_make_id ON cars.imports USING btree (make_id);
> CREATE INDEX index_imports_on_model_id ON cars.imports USING btree (model_id);
> insert into cars.imports (created_at, updated_at) select now(), now() from (select generate_series(1, 100000)) x;
> vacuum freeze analyze;
> \timing on
> update cars.imports set make_id = 0;
> vacuum analyze;
> update cars.imports set make_id = 0;
>
> ... and here's what I get when I run it on my desktop computer with
> ordinary disk drives and a completely default configuration:
>
> test=# create extension if not exists hstore;
> CREATE EXTENSION
> Time: 48.032 ms
> test=# create schema cars;
> CREATE SCHEMA
> Time: 8.150 ms
> test=# drop table if exists cars.imports;
> NOTICE: table "imports" does not exist, skipping
> DROP TABLE
> Time: 0.205 ms
> test=# CREATE TABLE cars.imports
> test-# (
> test(# id serial NOT NULL,
> test(# target_id integer,
> test(# batch_id integer,
> test(# make_id integer,
> test(# model_id integer,
> test(# date timestamp without time zone,
> test(# division_id integer,
> test(# dealer_id integer,
> test(# data hstore,
> test(# created_at timestamp without time zone NOT NULL,
> test(# updated_at timestamp without time zone NOT NULL,
> test(# CONSTRAINT imports_pkey PRIMARY KEY (id)
> test(# );
> NOTICE: CREATE TABLE will create implicit sequence "imports_id_seq" for serial column "imports.id"
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "imports_pkey" for table "imports"
> CREATE TABLE
> Time: 152.677 ms
> test=# CREATE INDEX index_imports_on_data ON cars.imports USING gin (data);
> CREATE INDEX
> Time: 6.391 ms
> test=# CREATE INDEX index_imports_on_dealer_id ON cars.imports USING btree (dealer_id);
> CREATE INDEX
> Time: 64.668 ms
> test=# CREATE INDEX index_imports_on_division_id ON cars.imports USING btree (division_id);
> CREATE INDEX
> Time: 65.573 ms
> test=# CREATE INDEX index_imports_on_make_id ON cars.imports USING btree (make_id);
> CREATE INDEX
> Time: 64.959 ms
> test=# CREATE INDEX index_imports_on_model_id ON cars.imports USING btree (model_id);
> CREATE INDEX
> Time: 64.906 ms
> test=# insert into cars.imports (created_at, updated_at) select now(), now() from (select generate_series(1, 100000)) x;
> INSERT 0 100000
> Time: 2516.559 ms
> test=# vacuum freeze analyze;
> VACUUM
> Time: 3357.778 ms
> test=# \timing on
> Timing is on.
> test=# update cars.imports set make_id = 0;
> UPDATE 100000
> Time: 2937.241 ms
> test=# vacuum analyze;
> VACUUM
> Time: 2097.426 ms
> test=# update cars.imports set make_id = 0;
> UPDATE 100000
> Time: 3935.939 ms
>
> Ubuntu 12.10
> i7-3770 CPU @ 3.40GHz with 16GB RAM
> Linux Kevin-Desktop 3.5.0-22-generic #34-Ubuntu SMP Tue Jan 8 21:47:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> A pair of WDC WD10EALX-009BA0 (15.01H51) 7200 RPM drives in RAID 1.
>
> PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 64-bit
>
> test=# SELECT name, current_setting(name), source
> test-# FROM pg_settings
> test-# WHERE source NOT IN ('default', 'override');
> name | current_setting | source
> ----------------------------+--------------------+----------------------
> application_name | psql | client
> client_encoding | UTF8 | client
> DateStyle | ISO, MDY | configuration file
> default_text_search_config | pg_catalog.english | configuration file
> lc_messages | en_US.UTF-8 | configuration file
> lc_monetary | en_US.UTF-8 | configuration file
> lc_numeric | en_US.UTF-8 | configuration file
> lc_time | en_US.UTF-8 | configuration file
> log_timezone | US/Central | configuration file
> max_connections | 100 | configuration file
> max_stack_depth | 2MB | environment variable
> shared_buffers | 32MB | configuration file
> TimeZone | US/Central | configuration file
> (13 rows)
>
> I did absolutely no tuning from the default configuration.
>
> So, what timings do you get if you run the identical script? Is
> there something you can do to the above script (maybe in terms of
> populating data) which will cause the performance you see?
>
> -Kevin
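
On your last question about populating data: one thing I can try is
filling in the hstore column so the rows are wider, in case that is
where the difference comes from. Something along these lines (just a
sketch; the keys and values here are invented, only the column names
come from your script):

insert into cars.imports (data, created_at, updated_at)
select hstore(array['vin', 'color', 'trim'],
              array[md5(random()::text), md5(random()::text), md5(random()::text)]),
       now(), now()
from generate_series(1, 100000);

With the gin index on data in place, the insert and the two full-table
updates may come out noticeably more expensive than with a null hstore
column, so it might reproduce more of what I'm seeing. I'll report the
timings when I've run it.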
