Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections

From: Sean Laurent <sean(at)studyblue(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections
Date: 2011-10-11 22:38:26
Message-ID: CAK=aZ=kBGOYkxYjNppec4dTg6SocyD0NW6y-t8H57PaQrL+90Q@mail.gmail.com
Lists: pgsql-general

On Fri, Oct 7, 2011 at 12:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Sean Laurent <sean(at)studyblue(dot)com> writes:
> > We've been running into a particularly strange problem that I'm trying to
> > better understand. The super short version is that our application servers
> > lose their connection to the database when I run a backup during periods of
> > higher load and fail to reconnect.
>
> That's just weird.  It sounds like the "xfs_freeze" operation, or the
> snapshotting operation, is somehow interrupting network traffic.  I'd
> not expect such a thing on a normal server, but who knows what's
> connected to what in an Amazon EC2 instance?
>
> Anyway, I'd suggest trying to instrument something to prove or disprove
> that there's a networking failure involved.  It might be as simple as
> watching "ping" behavior ...

Agreed, it's very weird. EBS volumes are effectively network-attached
storage, so blaming network connectivity was my first inclination as
well. Unfortunately, it's definitely not a network failure:

- The AWS support team has not detected any network outages affecting
the EC2 instance or the EBS volumes anywhere near the times our
outages occurred.
- I can consistently ping the database instance from the application
servers while the problem is occurring.
- I can SSH into the database instance and access Postgres while the
problem is occurring.
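
One caveat on the checks above: ping only exercises ICMP, and SSH uses a
different port from the one the JDBC connections use. A timestamped TCP
probe against the Postgres port itself, run from an application server
during a backup, would show whether new connections to that port are
being accepted at the moment the application loses its sessions. The
following is only a minimal sketch of such a probe, not something we
ran. The host name in the usage note is hypothetical, and port 5432 is
just the Postgres default, assumed here rather than confirmed for our
setup.

```python
import socket
import time


def probe_tcp(host, port, timeout=2.0):
    """Attempt one TCP connection; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False, time.monotonic() - start


def watch(host, port, interval=1.0, count=10):
    """Probe repeatedly, print one timestamped line per attempt, return results."""
    results = []
    for _ in range(count):
        ok, elapsed = probe_tcp(host, port)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        status = "ok" if ok else "FAILED"
        print(f"{stamp} {host}:{port} {status} {elapsed * 1000:.1f} ms")
        results.append(ok)
        time.sleep(interval)
    return results
```

Something like watch("db.example.internal", 5432, interval=1.0, count=600)
started just before the freeze and snapshot would give a log to line up
against the application errors. Note that this only confirms the TCP
handshake completes; it does not authenticate or run a query, so it says
nothing about whether an established session can still do useful work.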

--
Sean Laurent
Director of Operations
StudyBlue, Inc.
