Re: Chronic performance issue with Replication Failover and FSM.

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Chronic performance issue with Replication Failover and FSM.
Date: 2012-08-30 07:54:38
Message-ID: CAAZKuFYwLBKrkq85xuT4O_M-5UWjr0WavxnVN6enuf7=cLax7A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 13, 2012 at 4:53 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> 4. On a high-UPDATE workload, this means that the replica assumes tables
> have no free space until it starts to build a new FSM or autovacuum
> kicks in on some of the tables, much later on.
>
> 5. If your hosting is such that you fail over a lot (such as on AWS),
> then this causes cumulative table bloat which can only be cured by a
> VACUUM FULL.
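
The failure mode in those steps can be sketched with a toy model (illustrative only: the page capacity and the single-hot-row workload are made-up simplifications, not PostgreSQL's actual storage logic):

```python
def pages_after_updates(n_updates, fsm_available, tuples_per_page=100):
    """Pages occupied after n_updates that each rewrite one hot row.

    Each UPDATE writes a new row version. With a free space map (FSM),
    the slot freed by the previous version is found and reused, so the
    table does not grow. Without one (as on a freshly promoted replica,
    before autovacuum rebuilds it), the backend sees every page as full
    and keeps extending the relation instead."""
    pages = 1
    tail_used = 1        # one live tuple on the first page
    free_slots = 0       # free space the FSM knows about
    for _ in range(n_updates):
        if fsm_available:
            free_slots += 1      # old version's slot is recorded as free
        if free_slots:
            free_slots -= 1      # new version reuses known free space
        elif tail_used < tuples_per_page:
            tail_used += 1       # new version goes on the tail page
        else:
            pages += 1           # tail full: extend the relation
            tail_used = 1
    return pages

print(pages_after_updates(10_000, fsm_available=True))   # 1 page: space reused
print(pages_after_updates(10_000, fsm_available=False))  # 101 pages: ~100x bloat
```

The point of the toy model is only that the bloat is proportional to the number of UPDATEs that run before the FSM is repopulated, which matches the cumulative growth described above.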

I'd like to revive this thread. Like other people, I thought this was
not a huge problem -- or at least maybe not directly from the mechanism
proposed -- but sometimes it's a pretty enormous one, and I've started
to notice it. I filed a bug report here
(http://archives.postgresql.org/pgsql-bugs/2012-08/msg00108.php, plots
in http://archives.postgresql.org/pgsql-performance/2012-08/msg00181.php),
but just today we promoted another system via streaming replication to
pick up the planner fix in 9.1.5 (did you know: that planner bug seems
to leave GIN FTS indexes unused in non-exotic cases, so queries fall
back to seqscans?), and then a 40MB GIN index bloated to two gigs on a
1.5GB table over the course of maybe six hours.

In addition, the thread on pgsql-performance that has the plot I
linked to indicates someone having the same problem with 8.3 after a
warm-standby promotion.

So I think there are some devils at work here, and I am not even sure
if they are hard to reproduce -- yet, people use standby promotion
("unfollow") on Heroku all the time and we have not been plagued
mightily by support issues involving such incredible bloating, so
there's something about the access pattern. In my two cases, the
number of UPDATEs vastly outstrips the actual number of
INSERTs/DELETEs of records (the ratio is probably 10000+ to 1), even
though neither of these systems would be close to what one could
consider a large or even medium-sized database in terms of TPS or
database size. In fact, the latter system bloated even though it
comfortably fits entirely in memory.

--
fdr
