Problem with multi-job pg_restore

From: Brian Weaver <cmdrclueless(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Problem with multi-job pg_restore
Date: 2012-05-01 13:27:55
Message-ID: CAAhXZGvgCAiFRXyViPDcOZQ0L7f27gfsDv3-yvpnvR8_JdcKJA@mail.gmail.com
Lists: pgsql-hackers

I think I've discovered an issue with multi-job pg_restore on a 700 GB
data file created with pg_dump. Before anyone points out that the
preferred procedure is to use the newest pg_dump to back up a database
before doing pg_restore, let me just say, "Yes, I'm aware of that advice
and unfortunately it just isn't an option."

Here is the dump file information (acquired via pg_restore -l):

; Archive created at Wed Mar 7 10:51:40 2012
; dbname: raritan
; TOC Entries: 756
; Compression: -1
; Dump Version: 1.11-0
; Format: CUSTOM
; Integer: 4 bytes
; Offset: 8 bytes
; Dumped from database version: 8.4.9
; Dumped by pg_dump version: 8.4.9
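
For anyone who wants to compare these header fields across several dump files, the listing above can be read mechanically. The following is only a minimal sketch: the function name parse_archive_header and the SAMPLE_LISTING constant are made up for illustration, and it assumes the "; key: value" comment layout shown above. The "Archive created at" line has no "key: value" shape and is deliberately skipped.

```python
# Sketch only: parse the "; key: value" header comments printed by
# `pg_restore -l`. The names below are hypothetical helpers, not part
# of PostgreSQL.

SAMPLE_LISTING = """\
; Archive created at Wed Mar 7 10:51:40 2012
; dbname: raritan
; TOC Entries: 756
; Compression: -1
; Dump Version: 1.11-0
; Format: CUSTOM
; Integer: 4 bytes
; Offset: 8 bytes
; Dumped from database version: 8.4.9
; Dumped by pg_dump version: 8.4.9
"""


def parse_archive_header(listing: str) -> dict:
    """Return the header fields of a pg_restore -l listing as a dict."""
    header = {}
    for line in listing.splitlines():
        # Header lines are comments; TOC entries do not start with ';'.
        if not line.startswith(";"):
            continue
        body = line.lstrip(";").strip()
        # Split on the first ": " only, so values keep any later colons.
        key, sep, value = body.partition(": ")
        if sep:
            header[key] = value
    return header


header = parse_archive_header(SAMPLE_LISTING)
print(header["Dumped from database version"], header["Format"])
```

All values come back as strings, so a numeric field such as "TOC Entries" would need an explicit int() conversion before comparison.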

The problem occurs during the restore when one of the bulk loads
(COPY) seems to get disconnected from the restore process. I captured
stdout and stderr from the pg_restore execution and there isn't a
single hint of a problem. When I looked at the log file in the
$PGDATA/pg_log directory, I found the following errors:

LOG: could not send data to client: Connection reset by peer
STATEMENT: COPY public.outlet_readings_rollup (id, outlet_id,
rollup_interval, reading_time, min_current, max_current,
average_current, min_active_power, max_active_power,
average_active_power, min_apparent_power, max_apparent_power,
average_apparent_power, watt_hour, pdu_id, min_voltage, max_voltage,
average_voltage) TO stdout;
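
Since the server log is large after a day-long run, entries like the one above can be pulled out with a small script. This is a rough sketch under assumptions: the function name find_client_send_failures and the sample text are hypothetical, the actual line layout depends on the server's log_line_prefix setting, and the script assumes the STATEMENT entry directly follows the message and that continuation lines carry no severity tag of their own.

```python
# Sketch only: find "could not send data to client" entries in a
# PostgreSQL server log and attach the STATEMENT that follows each one.
import re

# Severity tags that mark the start of a new log entry (assumed list).
SEVERITY = re.compile(
    r"\b(LOG|ERROR|FATAL|PANIC|WARNING|NOTICE|DETAIL|HINT|CONTEXT|STATEMENT):\s"
)


def find_client_send_failures(log_text: str) -> list:
    """Return (message, statement) pairs; statement is '' if none follows."""
    failures = []
    lines = log_text.splitlines()
    i = 0
    while i < len(lines):
        if "could not send data to client" not in lines[i]:
            i += 1
            continue
        message = lines[i].strip()
        parts = []
        j = i + 1
        if j < len(lines) and "STATEMENT:" in lines[j]:
            parts.append(lines[j].split("STATEMENT:", 1)[1].strip())
            j += 1
            # Multi-line statements continue until the next tagged entry.
            while j < len(lines) and lines[j].strip() and not SEVERITY.search(lines[j]):
                parts.append(lines[j].strip())
                j += 1
        failures.append((message, " ".join(parts)))
        i = j
    return failures


# Hypothetical sample; the table and column names are placeholders.
SAMPLE_LOG = (
    "LOG:  could not send data to client: Connection reset by peer\n"
    "STATEMENT:  COPY public.example_table (id,\n"
    "\tname) TO stdout;\n"
    "LOG:  checkpoint starting: time\n"
)

for message, statement in find_client_send_failures(SAMPLE_LOG):
    print(message)
    print("  ", statement)
```

Adding a timestamp and process ID to log_line_prefix would make it easier to match each hit against the timing of the pg_restore worker processes.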

I'm running PostgreSQL 9.1.3 on a CentOS 6 x86-64 build. I'm a
developer by trade so I'm good with building from the latest source
and using debugging tools as necessary. What I'm really looking for is
advice on how to maximize the information I get so that I can minimize
the number of times I have to run the restore. The restore process
takes at least a day to complete (discounting the disconnected COPY
process) and I don't have weeks to figure out what's going on.

Thanks

-- Brian
--

/* insert witty comment here */
