Re: emergency outage requiring database restart

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Oskari Saarenmaa <os(at)ohmu(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: emergency outage requiring database restart
Date: 2017-01-04 15:36:25
Message-ID: CAHyXU0w43BkN9R3FTSatRwGR0Qjjuy8m909yGtKtnUncrxq98Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 3, 2017 at 1:05 PM, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 11/7/16 5:31 PM, Merlin Moncure wrote:
>> Regardless, it seems like you might be on to something, and I'm
>> inclined to patch your change, test it, and roll it out to production.
>> If it helps or at least narrows the problem down, we ought to give it
>> consideration for inclusion (unless someone else can think of a good
>> reason not to do that, heh!).
>
> Any results yet?

Not yet unfortunately. I compiled the server with the change, but was
not able get $libdir working so that I could just do a binary swap
over my pgdg compiled package. If anyone has some pointers on how to
do that, I'd appreciated it.

Still getting checksum failures. Over the last 30 days, I see the
following. Since enabling checksums FWICT none of the damage is
permanent and rolls back with the transaction. So creepy!

[root(at)rcdylsdbmpf001 pg_log]# cat *.log | grep "page verification failed"
2016-12-05 10:17:48 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 61797 but expected 61798
2016-12-05 11:15:31 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 37750 but expected 37749
2016-12-05 11:15:58 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 44483 but expected 44482
2016-12-05 11:16:33 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 58926 but expected 58925
2016-12-05 11:17:08 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 38527 but expected 38528
2016-12-05 11:18:34 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 61932 but expected 61933
2016-12-05 11:18:55 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 23757 but expected 23758
2016-12-05 12:13:48 CST [rms(at)mpf2]: WARNING: page verification
failed, calculated checksum 44192 but expected 44225 at character 417
2016-12-08 14:18:37 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 36083 but expected 36082
2016-12-08 15:52:31 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 63414 but expected 63415 at
character 1096
2016-12-09 09:12:21 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 25781 but expected 25780
2016-12-09 09:13:20 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 63043 but expected 63044 at
character 4230
2016-12-12 08:57:45 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 31775 but expected 31771
2016-12-13 09:47:11 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 40802 but expected 40806
2016-12-15 12:49:04 CST [rms(at)mpf2]: WARNING: page verification
failed, calculated checksum 11625 but expected 11592 at character 417
2016-12-15 12:51:08 CST [rms(at)mpf2]: WARNING: page verification
failed, calculated checksum 51017 but expected 51018
2016-12-15 12:52:36 CST [rms(at)mpf2]: WARNING: page verification
failed, calculated checksum 51017 but expected 51018 at character 417
2016-12-16 12:16:31 CST [rms(at)mpf2]: WARNING: page verification
failed, calculated checksum 23580 but expected 23576
2016-12-20 13:59:33 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 45273 but expected 45285
2016-12-20 14:00:22 CST [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 10524 but expected 10525

note second database 'mpf2'. This is a new development. Example of
query that is jacking things is this:
2016-12-15 12:51:08 CST [rms(at)mpf2]: WARNING: page verification
failed, calculated checksum 51017 but expected 51018
2016-12-15 12:51:08 CST [rms(at)mpf2]: CONTEXT: SQL statement "
COPY (
SELECT 'DELETE FROM tblNAptCommonSample WHERE ReportPeriod = 201612;'
UNION ALL
SELECT format(
'INSERT INTO tblNAptCommonSample('
'ReportPeriod, Period, AdjustmentType, PlanType, MSA, MSASubMkt, '
'Sample, Occupancy, OccupancyChange, AverageRent,
AverageRentChange, RentSF, '
'RentSFChange)'
'VALUES('
'%s, %s, ''%s'', ''%s'', ''%s'', %s,'
'%s, %s, %s, %s, %s, %s,'
'%s)',
ReportPeriod, Period, AdjustmentType, PlanType, MSA, MSASubMkt,
c(Sample), c(Occupancy), c(OccupancyChange), c(AverageRent),
c(AverageRentChange), c(RentSF),
c(RentSFChange))
FROM tblNAptCommonSample
WHERE Period = 201612
AND MSA != '5610'
UNION ALL
SELECT 'go'
) TO '/tmp/tblnaptcommonsample.sql';
"
PL/pgSQL function writempf1history(integer) line 75 at EXECUTE

or this:
2016-12-15 12:52:36 CST [rms(at)mpf2]: WARNING: page verification
failed, calculated checksum 51017 but expected 51018 at character 417
2016-12-15 12:52:36 CST [rms(at)mpf2]: QUERY:
COPY (
SELECT 'DELETE FROM tbltwrexistingunits WHERE ReportPeriod = 201612;'
UNION ALL
SELECT format(
'INSERT INTO tbltwrexistingunits('
'ReportPeriod, market, submarketnum, yr_qtr, cmpltns, deletions, '
'existing, unadjexisting)'
'VALUES('
'%s, ''%s'', %s, ''%s'', %s, %s,'
'%s, %s)',
ReportPeriod, market, submarket, yr_qtr, c(cmpltns), c(deletions),
c(existing), c(unadjexisting))
FROM tbltwrexistingunits
WHERE ReportPeriod = 201612
AND market != '5610'
UNION ALL
SELECT 'go'
) TO '/tmp/tbltwrexistingunits.sql';

2016-12-15 12:52:36 CST [rms(at)mpf2]: CONTEXT: PL/pgSQL function
writempf1history(integer) line 109 at EXECUTE

This is another (much simpler) routine that:
1. writes out data to scratch file with COPY
2. uses pl/sh to pipe to sqsh

merlin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-01-04 15:46:28 Re: [PATCH] Reload SSL certificates on SIGHUP
Previous Message Robert Haas 2017-01-04 15:33:27 Re: increasing the default WAL segment size