BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica

From: federico(at)brandwatch(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
Date: 2013-05-30 16:43:25
Message-ID: E1Ui5wj-0007tI-Cl@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 8192
Logged by: Federico Campoli
Email address: federico(at)brandwatch(dot)com
PostgreSQL version: 9.2.4
Operating system: Debian 6.0
Description:

/*

Description:

It seems on very large tables the concurrent update with vacuum (or
autovacuum),
when the slave is in hot standby mode, generates long loops in read on a
single wal segment during the recovery process.

This have two nasty effects.
A massive read IO peak and the replay lag increasing as the recovery process
hangs for long periods on a pointless loop.

PostgreSQL version: PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled
by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit

Steps to reproduce the error:
setup an hot standby server.
the error occurs with streaming replication enabled and disabled

*/
SET client_min_messages='debug2';
SET trace_sort='on';

--create a new empty table
DROP TABLE IF EXISTS t_vacuum;
CREATE table t_vacuum
(
i_id_row integer,
ts_time timestamp with time zone default now()
)
;
--disable the autovacuum, we need to run it manually
ALTER TABLE t_vacuum set (autovacuum_enabled='off');

--insert into the table enough data to have multiple file nodes
INSERT INTO t_vacuum
(
i_id_row
)
SELECT * FROM generate_series(1,40000000)
;

/*
start iotop on the hot standby monitoring the postgres:startup process and
eventually the wal receiver
for example iotop -p 31293
wait for all the wal files generated during the update to be applied
this does not apply on live servers of course but is useful to trap only the
IO activity generated by vacuum

If the streaming replication is on, this select will tell when the slave has
finished.

In any case is useful to check the replication lag during the vacuum.
SELECT
pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),
replay_location))
FROM
pg_stat_replication
;

*/

--in a new session and start an huge table update
UPDATE t_vacuum set ts_time=now() WHERE i_id_row<20000000;

--then vacuum the table
VACUUM VERBOSE t_vacuum;

--at some point the startup process will stuck recovering one single wal
file and
--the DISK READ column will show a huge IO for a while.
--if you monitor the wal receiver also you will notice stream will continue
without problems
--sooner or later the startup process will resume the recovery but meanwhile
the DISK READ is not zero
--the slave will accumulate replication lag

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message vlajos 2013-05-30 16:51:29 BUG #8193: A few cosmetic misspell fixes.
Previous Message gerald 2013-05-30 16:43:03 BUG #8191: Wrong bit conversion