Re: Mysterious DB reset

From: Thom Brown <thom(at)linux(dot)com>
To: Israel Brewster <israel(at)eraalaska(dot)net>
Cc: PGSQL Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Mysterious DB reset
Date: 2014-03-05 19:01:54
Message-ID: CAA-aLv5qD1ZgoWS8U3wT=6YAjS-p_3zDxRdXbz+WpJegxGnWWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 5 March 2014 18:22, Israel Brewster <israel(at)eraalaska(dot)net> wrote:

> I have a Postgresql 9.2.4 database containing real-time tracking data for
> our aircraft for the past week (theoretically). It is populated by two
> different processes: one that runs every few minutes, retrieving data from
> a number of sources and storing it in the DB, and one that has an "always
> on" connection to the DB streaming data into the database in realtime
> (often several records per second). To keep the database size manageable I
> have a cron job that runs every evening to delete all records that are more
> than a week old, after archiving a subset of them in permanent storage.
>
> This morning my boss e-mailed me, complaining that only a couple of
> aircraft were showing up in the list (SELECT distinct(tail) FROM data being
> the command that populates the list). Upon looking at the data I saw that
> it only went back to 4am this morning, rather than the week I was
> expecting. My first thought was "Oh, I must have a typo in my cleanup
> routine, such that it is deleting all records rather than only those a week
> old, and it's just that no one has noticed until now". So I looked at that,
> but changing the delete to a select appeared to produce the proper results,
> in that no records were selected:
>
> DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7
> days';
>
> Then I noticed something even more odd. My database has an id column,
> which is defined as a SERIAL. As we all know, a serial is a monotonically
> increasing number that is not affected by deletes. However, the oldest
> record in my database, from 4am this morning, had an id of 1. Even though I
> KNOW there was data in the system yesterday. Even if my DELETE command was
> wrong and deleted ALL records, that shouldn't have reset the SERIAL column
> to 1! I also know that I have not been in the database mucking around with
> the sequence value - to be completely honest, I don't even know the exact
> command to reset it - I'd have to google it if I wanted to.
>
> Also odd is that my cleanup script runs at 1am. I have records of there
> being new data in the database up to 3:51am, but the oldest record
> currently in the DB is from 4:45am (as specified by the default of now() on
> the column). So I know records were added after my delete command ran, but
> before this reset occurred.
>
> So my question is, aside from someone going in and mucking about in the
> wee hours of the morning, what could possibly cause this behavior? What
> sort of event could cause all data to be deleted from the table, and the
> sequence to be reset? Especially while there is an active connection?
> Thanks for any ideas, however wild or off the wall :-)
>

That is odd. Even if it were an unlogged table, and there was a crash, the
sequence wouldn't reset, and even if it was running in a very long-running
transaction held open by a buggy connection pooler, the sequence would
still progress as it's immune to the effects of transactions.

So if all the data went missing, and the sequence reset, the only thing I
can think of is:

Someone ran:

TRUNCATE data RESTART IDENTITY;

or someone restored the table structure from a backup that deleted the
original table.

Do you log DDL?

Was the table partitioned?

You should also really be on 9.2.7, although I can't think of any bug
that's been fixed which could be responsible for this issue.

--
Thom

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Brent Wood 2014-03-05 19:26:14 Re: Mysterious DB reset
Previous Message Alvaro Herrera 2014-03-05 19:01:50 Re: Mysterious DB reset