Re: Better handling of archive_command problems

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better handling of archive_command problems
Date: 2013-05-17 00:43:51
Message-ID: CA+Tgmobu5AkOoDv4iSkPd4-+jZ_+j74rvArQz2=yqQPyCvzDpQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 16, 2013 at 2:42 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, May 16, 2013 at 11:16 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Well, I think it IS a Postgres precept that interrupts should get a
>> timely response. You don't have to agree, but I think that's
>> important.
>
> Well, yes, but the fact of the matter is that it is taking high single
> digit numbers of seconds to get a response at times, so I don't think
> that there is any reasonable expectation that that be almost
> instantaneous. I don't want to make that worse, but then it might be
> worth it in order to ameliorate a particular pain point for users.

At times, like when the system is under really heavy load? Or at
times, like depending on what the backend is doing? We can't do a
whole lot about the fact that it's possible to beat a system to death
so that, at the OS level, it stops responding. Linux is unfriendly
enough to put processes into non-interruptible kernel wait states when
they're waiting on the disk, a decision that I suspect to have been
made by a sadomasochist. But if there are times when a system that is
not particularly heavily loaded fails to respond to cancels in under a
second, I would consider that a bug, and we should fix it.
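The flag-and-poll pattern at issue here can be sketched outside the server. This is an illustrative script, not server code: the names are mine, and the loop merely mirrors how a handler records an interrupt while the work loop checks it at safe points, analogous to the backend's CHECK_FOR_INTERRUPTS() calls.

```shell
#!/bin/sh
# Illustrative sketch (not PostgreSQL source): the signal handler only
# records that a cancel arrived; the work loop polls the flag between
# units of work, the way a backend polls at CHECK_FOR_INTERRUPTS() sites.
cancel=0
trap 'cancel=1' INT TERM

work_done=0
while [ "$work_done" -lt 10000 ]; do
    if [ "$cancel" -eq 1 ]; then
        echo "canceling after $work_done units" >&2
        exit 1
    fi
    work_done=$((work_done + 1))   # stand-in for one unit of real work
done
echo "completed $work_done units"
```

The point of the pattern is that responsiveness depends on the loop actually reaching its check; a loop that blocks indefinitely (say, in an uninterruptible kernel wait) never polls the flag, which is exactly the lock-up scenario being argued about.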

>>> There is a setting called zero_damaged_pages, and enabling it causes
>>> data loss. I've seen cases where it was enabled within postgresql.conf
>>> for years.
>>
>> That is both true and bad, but it is not a reason to do more bad things.
>
> I don't think it's bad. I think that we shouldn't be paternalistic
> towards our users. If anyone enables a setting like zero_damaged_pages
> (or, say, wal_write_throttle) within their postgresql.conf
> indefinitely for no good reason, then they're incompetent. End of
> story.

That's a pretty user-hostile attitude. Configuration mistakes are a
very common user error. If those configuration changes hose the system,
users expect to be able to change them back, hit reload, and get things back
on track. But you're proposing a GUC that, if set to a bad value,
will very plausibly cause the entire system to freeze up in such a way
that it won't respond to a reload request - or for that matter a fast
shutdown request. I think that's 100% unacceptable. Despite what you
seem to think, we've put a lot of work into ensuring interruptibility,
and it does not make sense to abandon that principle for this or any
other feature.
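The recovery path described above — change the setting back, then reload — can be sketched as follows. The temp file is a throwaway stand-in for postgresql.conf, and the reload commands named in the comment assume a live cluster; they are not exercised here.

```shell
#!/bin/sh
# Sketch of the "change it back and hit reload" recovery path. A temp
# file stands in for postgresql.conf; with a real cluster the reload
# step would be `pg_ctl reload -D "$PGDATA"` (or `SELECT pg_reload_conf();`
# from psql), both of which only work if backends stay responsive to signals.
conf=$(mktemp)
printf 'zero_damaged_pages = on\n' > "$conf"    # the mistaken setting

# Revert the dangerous setting to its safe default (off) ...
sed 's/^zero_damaged_pages = on$/zero_damaged_pages = off/' "$conf" > "$conf.new"
mv "$conf.new" "$conf"

line=$(grep '^zero_damaged_pages' "$conf")
echo "$line"
rm -f "$conf"
```

The sed-and-reload step is trivial; the argument in this thread is about the second half — a reload signal only helps if the server can still process it.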

> Would you feel better about it if the setting had a time-out? Say, the
> user had to explicitly re-enable it after one hour at the most?

No, but I'd feel better about it if you figured out a way to avoid
creating a scenario where it might lock up the entire database
cluster. I am convinced that it is possible to avoid that, and that
without such a safeguard this is not a feature worthy of inclusion in
PostgreSQL. Yeah, it's more work that way. But that's the difference
between "a quick hack that is useful in our shop" and "a
production-quality feature ready for a general audience".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
