Wide area replication postgres 9.1.6 slon 2.1.2 large table failure.

From: Tory M Blue <tmblue(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Wide area replication postgres 9.1.6 slon 2.1.2 large table failure.
Date: 2013-01-12 05:49:21
Lists: pgsql-hackers
So I started this thread on the slon forum, and they mentioned that I/we
should ask here.

Postgres 9.1.4 slon 2.1.1
Postgres 9.1.6 slon 2.1.2


Node 1 is on a gig circuit and is the master (West Coast)

Node 2 is also on a gig circuit and is the slave (Georgia)

Symptoms: slon dies immediately after transferring the biggest table in the
set. This happens with 2 of 3 sets; the set that actually completes has no
large tables.

Set 1 has a table that takes just under 6000 seconds to copy, and set 2 has
a table that takes double that. In both cases the table copy itself
completes.

1224459-2013-01-11 14:21:10 PST CONFIG remoteWorkerThread_1: 5760.913
seconds to copy table "cls"."listings"
1224560-2013-01-11 14:21:10 PST CONFIG remoteWorkerThread_1: copy table
1224642-2013-01-11 14:21:10 PST CONFIG remoteWorkerThread_1: Begin COPY of
table "cls"."customers"
1224733-2013-01-11 14:21:10 PST ERROR  remoteWorkerThread_1: "select
"_admissioncls".copyFields(8);"  <--- this has the proper data
1224827:2013-01-11 14:21:10 PST WARN   remoteWorkerThread_1: data copy for
set 1 failed 1 times - sleep 15 seconds
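
To compare copy durations across sets, the timing lines can be pulled out of
the slon log. This is only a minimal sketch: copy_times is a hypothetical
helper name, and it assumes each log entry sits on a single line in the real
log file (the excerpt above is wrapped by the mail client).

```shell
#!/bin/sh
# Hypothetical helper: read slon log text on stdin and print
# "<seconds> <table>" for every "seconds to copy table" entry.
# Assumes one log entry per line, unlike the wrapped excerpt above.
copy_times() {
    sed -n 's/.*: \([0-9][0-9.]*\) seconds to copy table \(.*\)$/\1 \2/p'
}

# Example using the timing line from the log above, rejoined onto one line.
printf '%s\n' '2013-01-11 14:21:10 PST CONFIG remoteWorkerThread_1: 5760.913 seconds to copy table "cls"."listings"' | copy_times
```

Lines that do not contain a copy timing produce no output, so the whole log
can be piped through it.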

Now in terms of postgres, if I do a copy of the large table from node 1 to
node 2, it completes (in under 2 hours) without issue.

From Node 2:
-bash-4.1$ psql -h idb02 -d admissionclsdb -c "copy cls.listings to stdout"
| wc
     4199441 600742784 6621887401

This worked fine.
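
One caveat with the test above is that piping into wc hides psql's own exit
status. A minimal sketch of a wrapper that records both the exit status and
the wall-clock time follows. timed_run is a hypothetical name, and it assumes
a date command that supports +%s (true of GNU and BSD date).

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command with stdout discarded,
# then print its exit status and elapsed wall-clock seconds.
# Assumes `date +%s` is available (GNU and BSD date support it).
timed_run() {
    start=$(date +%s)
    "$@" > /dev/null
    status=$?
    end=$(date +%s)
    echo "status=$status elapsed=$((end - start))s"
    return "$status"
}

# Usage with the same host, database and table as the test above:
# timed_run psql -h idb02 -d admissionclsdb -c "copy cls.listings to stdout"
```

A nonzero status here would indicate that psql itself reported a problem
during the copy, which the wc output alone cannot show.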

I get no errors in the postgres logs, there is no network disconnect, and
since I can do a copy over the wire that completes, I'm at a loss.  I don't
know what to look at, what to look for or what to do.  Obviously this is
the wrong place for slon issues.

One of the slon developers stated:
"I wonder if there's something here that should get bounced over to
pgsql-hackers or such; we're poking at a scenario here where the use
of COPY to stream data between systems is proving troublesome, and
perhaps there may be meaningful opinions over there on that."

If a plain copy of the same table that sits at the end of a failed slon
attempt completes on its own, I'm just not sure what is going on.

Any suggestions are welcome. Please ask for more data; I can do anything to
the slave node. It's a bit tougher on the source, but I can arrange to make
changes to it if need be.

I just upgraded to 9.1.6 and slon 2.1.2 but prior tests were on 9.1.4 and
slon 2.1.1 and a mix of postgres 9.1.4 slon 2.1.1 and postgres 9.1.6 slon
2.1.1 (node 2)

The other difference is that node 1 is running Fedora 12 and node 2 is
running CentOS 6.2.

Thanks in advance
