Re: Orphaned files in base/[oid]

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Orphaned files in base/[oid]
Date: 2017-08-15 14:03:41
Message-ID: CAN-RpxAAjKwCMb6iQ7uvOReZOXqTzPs3jT_3zo1xmWXBerWM7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 15, 2017 at 3:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Chris Travers <chris(dot)travers(at)adjust(dot)com> writes:
> > I wonder about a different solution. Would it be possible to special
> case
> > vacuum to check for and remove (or just move to where they can be
> removed)
> > files when vacuuming pg_class? At the point we are vacuuming pg_class,
> we
> > ought to be able to know that a relfilenode shouldn't be used anymore,
> > right?
>
> I don't think so. It's not clear to me whether you have in mind "scan
> pg_class, collect relfilenodes from all live tuples, then zap all files
> not in that set" or "when removing a dead tuple, zap the relfilenode
> it mentions". But neither one works. The first case has a race condition
> against new pg_class entries. As for the second, the existence of a dead
> tuple bearing relfilenode N isn't evidence that some other live tuple
> can't have relfilenode N.
>

Ah because if the file never made it on to disk the number could be
re-used.

>
> Another problem for the second solution is that in the case you're worried
> about (ie, PANIC due to out-of-WAL-space during relation's creating
> transaction), there's no very good reason to expect that the relation's
> pg_class tuple ever made it to disk at all.
>
> A traditional low-tech answer to this has been to keep the WAL on a
> separate volume from the main data store, so that it's protected from
> out-of-space conditions in the main store and temp areas. The space
> needs for WAL proper are generally much more predictable than the main
> store, so it's easier to keep the dedicated space from overflowing.
> (Stalled replication/archiving processes can be hazardous to your
> health in this scenario, though, if they prevent prompt recycling of
> WAL files.)
>

Yeah, most of our dbs here have wal on a separate volume but not this
system. This system is also unusual in that disk usage varies wildly (and
I am not 100% sure that this is the only case which causes it though I can
reproduce it consistently in the case of the wal writer running out of disk
space with symptoms exactly what I found).

So for now that leaves my fallback approach as a way to fix it when I see
it.

I have written a shell script which does as follows:
1. starts Postgres in single user mode with a data directory or dies
(won't run if Postgres seems to be already running)
2. gets the old of the current database
3. lists all files consisting of only digits in the base/[dboid] directory
4. asks Postgres (In single user mode again) for all relfilenodes and oids
of tables (In my testing both were required because there were some cases
where relfilenodes were not set in some system
5. Loops through the file nodes gathered, checks against the relfilenode
entries, and zaps $f, $f_*, and $f.*. Currently for testing "zaps" has
been to move to a lostnfound folder for inspection following the script.
The logic here is not perfect and is very slightly under inclusive, but
better that than the other way.

Then we can start Postgres again. I cannot find a better way to avoid race
conditions, I guess. At any rate it sounds like preventing the problem more
generally may be something beyond what I would feel comfortable trying to
do as a patch at my current level of familiarity with he source code.

The full script is included inline below my signature in case it is of
interest to anyone on the list.

> regards, tom lane
>

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com

Saarbrücker Straße 37a, 10405 Berlin

---

#!/bin/bash

datadir=$1
database=$2

pg_ctl -D $datadir stop

dboidfile="$PWD/cleanupdb.oid"
reloidfile="$PWD/refilenodes.list"

echo "COPY (select oid from pg_database where datname = current_database())
TO '$dboidfile'" | postgres --single -D $datadir $database > /dev/null

if (($?))
then
echo "FATAL: Could not start Postgres in single user mode"
exit 1
fi

dboid=`cat $dboidfile`
filenodes=`(cd test/base/$dboid; ls [0-9]*[0-9] | grep -v '\.' | sort -n)`
#echo $filenodes

echo "COPY (select relfilenode from pg_class union select oid as
relfilenode from pg_class) TO '$reloidfile'" | postgres --single -D
$datadir $database > /dev/null
relfilenodes=`cat $reloidfile`
#echo $relfilenodes
if [[ -z relfilenodes ]]
then
echo "FATAL: did not get any relfilenodes"
exit 2
fi

mkdir lostnfound;
for f in $filenodes
do
if [[ -z `echo $relfilenodes | grep -w $f` ]]
then
echo moving $f to lostnfound
mv $datadir/base/$dboid/$f lostnfound
mv $datadir/base/$dboid/${f}_* lostnfound 2> /dev/null
mv $datadir/base/$dboid/${f}.* lostnfound 2> /dev/null
fi
done
rm $dboidfile
rm $reloidfile

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-08-15 14:04:59 Re: [HACKERS] Replication to Postgres 10 on Windows is broken
Previous Message Peter Eisentraut 2017-08-15 14:01:44 Re: [HACKERS] Replication to Postgres 10 on Windows is broken