From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org, Анна Крханбарова <annkpx(at)yandex-team(dot)ru>, Dmitriy Sarafannikov <dsarafan(at)yandex-team(dot)ru> |
Subject: | Re: Logging corruption error codes |
Date: | 2019-06-21 10:22:15 |
Message-ID: | 28DE958A-DB3B-4266-B960-596B0092FF8E@yandex-team.ru |
Lists: | pgsql-bugs |
> On 20 June 2019, at 22:09, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> On 2019-Jun-20, Andrey Borodin wrote:
>
>> Hi!
>>
>> We are fine-tuning our data corruption monitoring and found that many corruption cases do not report a proper error code.
>> This forces the automatic log analyzer to be far too smart a program.
>> We think that corruption error codes should be reported in cases where B-tree or TOAST code does not know how to interpret the data.
>> PFA a patch with the cases that we have found in logs and consider evidence of corruption.
>
> This is not totally insane -- other similar messages such as 'corrupted
> page pointers' in bufpage.c get the same errcode.
On master there is only
elog(ERROR, "incorrect index offsets supplied");
in bufpage.c. But this indicates misuse, not corrupted data on disk.
The others already use ERRCODE_DATA_CORRUPTED.
>
> I would like to have a separate marking for messages indicating a
> system-level permanent problem rather than user error ("table/column X
> does not exist"), retryable condition ("serializability violation"), or
> resource exhaustion ("out of memory", "too many clients"),
Good idea, but there must be standards that we have to comply with?
> but that's
> probably a separate patch: things like "could not open/read/write file"
> for a data file, or "xlog flush request XYZ not satisfied", and so on,
> which also indicate a kind of corruption.
I believe we should not report hardware problems as corruption. But this worries us (YC) too. Do you think that this problem deserves a patch?
If we introduce a new error code, this is, in a way, a new feature. Should I send it to pgsql-hackers?
> As you say, currently we have
> to have much too smart programs to weed out the serious errors that
> ought to show up in an alerting system from run-of-the-mill problems.
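To illustrate what a less "smart" analyzer could look like once the error codes are reliable, here is a minimal sketch in C that classifies a log line by its SQLSTATE alone. The function name classify_sqlstate and the AlertCategory enum are hypothetical and not part of PostgreSQL. The mapping covers only the kinds of errors mentioned above and assumes the five-character SQLSTATE has already been extracted from the log (for example via %e in log_line_prefix).

```c
#include <string.h>

/* Hypothetical alert categories for a log analyzer; not part of PostgreSQL. */
typedef enum
{
    ALERT_CORRUPTION,   /* XX001 data_corrupted, XX002 index_corrupted */
    ALERT_RESOURCES,    /* class 53, e.g. 53200 out_of_memory */
    ALERT_RETRYABLE,    /* class 40, e.g. 40001 serialization_failure */
    ALERT_USER_ERROR,   /* class 42, e.g. 42P01 undefined_table */
    ALERT_OTHER         /* everything else, including XX000 internal_error */
} AlertCategory;

/* Map a five-character SQLSTATE to an alert category. */
AlertCategory
classify_sqlstate(const char *sqlstate)
{
    if (sqlstate == NULL || strlen(sqlstate) != 5)
        return ALERT_OTHER;

    /* Corruption codes are matched exactly, not by class. */
    if (strcmp(sqlstate, "XX001") == 0 || strcmp(sqlstate, "XX002") == 0)
        return ALERT_CORRUPTION;

    /* The first two characters of a SQLSTATE identify its class. */
    if (strncmp(sqlstate, "53", 2) == 0)
        return ALERT_RESOURCES;
    if (strncmp(sqlstate, "40", 2) == 0)
        return ALERT_RETRYABLE;
    if (strncmp(sqlstate, "42", 2) == 0)
        return ALERT_USER_ERROR;

    return ALERT_OTHER;
}
```

A plain elog(ERROR, ...) is reported as XX000 (internal_error), so in this sketch it falls into ALERT_OTHER and the analyzer would still have to parse the message text to recognize corruption.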
Thanks!
Best regards, Andrey Borodin.