Re: Logical decoding client has the power to crash the server

From: Meel Velliste <meel(at)fivetran(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logical decoding client has the power to crash the server
Date: 2017-09-21 04:09:37
Message-ID: CADCgt-JMrKycAcrKH9Dw6n-C5LZZUBJOZVfEU_ea3yZo9RLptA@mail.gmail.com
Lists: pgsql-general

Hi Michael,

Thank you, I appreciate your response. Now that you mention it, I realize
that I don't really care about dropping the oldest log entries. Mandatory
monitoring makes a lot of sense and dropping the entire slot would be
perfect when it consumes too much space.

The only problem with monitoring is that I may have no control over it. My
use case is complicated by the fact that there are three parties:
1) Our customer who has admin privileges on the database
2) Us with limited privileges
3) The database hosting provider who restricts access to the underlying OS
and file system

In this situation, neither we nor our customer has the power to install
the required monitoring of pg_xlog. The database hosting provider would
have to do it. In most cases (e.g. Amazon RDS) the hosting provider does
provide a way of monitoring overall disk usage, which may be good enough.
But I am thinking it would make sense for postgres to have default,
built-in monitoring that drops all the slots when pg_xlog gets too full
(based on some configurable limit). Otherwise everybody has to build their
own monitoring and I imagine 99% of them would want the same behavior.
Nobody wants their database to fail just because some client was not
reading the slot.
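
A minimal sketch of the decision logic such a circuit breaker might use, written here as an external Python check rather than anything built into the server. It assumes the caller has already fetched the current WAL position and the (slot_name, restart_lsn) rows from the pg_replication_slots view; the function names and the 512 MiB limit are hypothetical and only for illustration.

```python
# Sketch only: function names and the limit below are illustrative assumptions,
# not an existing PostgreSQL feature.

def lsn_to_bytes(lsn):
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def slots_over_limit(current_lsn, slots, limit_bytes):
    """Return the names of slots that hold back more than limit_bytes of WAL.

    slots is an iterable of (slot_name, restart_lsn) pairs, for example rows
    from: SELECT slot_name, restart_lsn FROM pg_replication_slots;
    restart_lsn may be None for a slot that has not reserved any WAL yet.
    """
    current = lsn_to_bytes(current_lsn)
    over = []
    for name, restart_lsn in slots:
        if restart_lsn is None:
            continue  # nothing retained on behalf of this slot
        retained = current - lsn_to_bytes(restart_lsn)
        if retained > limit_bytes:
            over.append(name)
    return over


# Example with made-up positions: one slot is 1 GiB behind, one is 16 MiB behind.
example_slots = [("stalled_slot", "0/C0000000"), ("healthy_slot", "0/FF000000")]
to_drop = slots_over_limit("1/00000000", example_slots, 512 * 1024 * 1024)
print(to_drop)
```

Each returned name could then be passed to pg_drop_replication_slot(), which only succeeds while the slot is not in active use. Whether dropping or merely alerting is the right reaction is a policy choice the sketch leaves open.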

In our case, if we lose access to the customer's database and they did not
install monitoring (even though we told them to), their disk will fill up
and they will blame us for crashing their database. It ends up being a
classic case of finger pointing between multiple parties. This has not
happened yet but I am sure it is just a matter of time. I would really like
to see a default, built-in circuit breaker in postgres to prevent this.

Another bit of context here is that the logical decoding is of secondary
importance to our customers, but their postgres database itself is
absolutely mission critical.

Thanks,

Meel

On Wed, Sep 20, 2017 at 12:43 AM Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Wed, Sep 20, 2017 at 3:14 PM, Meel Velliste <meel(at)fivetran(dot)com> wrote:
> > From what I understand about logical decoding, there is no limit to how
> > many log entries will be retained by the server if nobody reads them from
> > the logical slot. This means that a client that fails to read from the
> > slot has the power to bring down the master database because the server's
> > disk will get full at which point all subsequent write operations will
> > fail and even read operations will fail because they too need temporary
> > space. Even the underlying operating system may be affected as it too may
> > need temporary disk space to carry out its basic functions.
>
> Monitoring is a mandatory part of the handling of replication slots.
> One possible solution is to use a background worker that scans slots
> causing bloat in pg_xlog and to automatically get rid of them so as
> the primary is preserved from any crash. Note that advancing a slot is
> doable for a physical slot, but advancing a logical slot is trickier
> (not sure if that's doable actually but Andres can comment on that)
> because it involves being sure that the catalog_xmin is still
> preserved so as past logical changes can be looked at consistently.
> --
> Michael
>
