Re: ERROR: could not open relation

From: "Thomas F(dot) O'Connell" <tfo(at)alumni(dot)brown(dot)edu>
To: PgSQL General <pgsql-general(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 16:59:39
Message-ID: BEB4C1F9-FEDA-4ACF-9459-F7AB460FF60F@alumni.brown.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

One other detail: pg_autovacuum is running on this system.

I just noticed this from Tom's "Autovacuum loose ends" post from
earlier today:

"The code does not make a provision to ignore temporary tables.
Although vacuum.c and analyze.c will disregard the request to touch
such tables, it'd probably be better to recognize the situation further
upstream. In particular it seems that autovacuum will continually throw
ANALYZE requests for a temp table due to lack of stats."

Is this an issue with the current pg_autovacuum implementation? Is it
somehow involved?

Though I feel like we're a little closer to solving this mystery, I
want to target the correct vacuuming process with a fix. I have a
feeling that explicitly dropping the temp tables in the process that
also calls VACUUM prior to the VACUUM is a good short-term fix, but I
also want to know whether continuing to run pg_autovacuum with this
architecture is a bad idea. If so, we can revert to not using temp
tables at all.

Further, why have we only noticed it once when this version of code
(and PostgreSQL) has been running for weeks?

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jul 14, 2005, at 11:42 AM, Thomas F. O'Connell wrote:

> Sorry, I didn't have the evidence about the bgwriter before. It was
> based on conjecture on IRC last night and newly gathered evidence
> from this morning.
>
> Here's a list of current postgres processes on the box.
>
> postgres 1186 2.8 5.0 437812 417624 ? S Jul13 22:37
> postgres: writer process
> postgres 1187 0.3 0.0 5940 2688 ? S Jul13 2:54
> postgres: stats buffer process
> postgres 1188 3.1 0.1 13456 8856 ? S Jul13 25:16
> postgres: stats collector process
>
> My assumption is that it's typically the case that these three
> processes generally get allocated sequential pids when postgres
> starts.
>
> In the postgres log, we see these two types of errors, which were
> the only ones that did not report an IP address:
>
> 2005-07-12 01:53:31 CDT 13390 :LOG: statistics buffer is full
> 2005-07-13 17:44:51 CDT 13389 :ERROR: could not open relation
> 1663/32019395/94144936: No such file or directory
>
> So if we assume that pid 13390 referred to the stats collector from
> yesterday, then presumably 13389 was the bgwriter.
>
> Unfortunately, this is a system where the interloper is superuser
> (and, yes, changing this has been a TODO). But even so, I need help
> understanding how one backend could access the temp table of
> another. Which is what brings me to vacuum or some other system
> process as a culprit.
>
> Recognizing that the application code will execute as superuser in
> postgres, here is what is happening in a session:
>
> Several temporary tables (some of which inherit from actual tables)
> are constructed.
> Data is loaded in.
> If the data includes updates, in the same session, a VACUUM is
> performed, else an ANALYZE is performed.
>
> So we know these things:
>
> 1. This import process was running.
> 2. It had started the vacuum, which occurs in the same session as
> temp tables that inherit from the table being vacuumed.
> 3. bgwriter reported an error about a missing relation file (I
> guess this is a strong suspicion more than knowledge, but we
> strongly suspect).
>
> So could this be somehow related to the fact that VACUUM, as a
> result of the inheritance relationship in the temp tables, is
> explicitly attempting to access them?
>
> --
> Thomas F. O'Connell
> Co-Founder, Information Architect
> Sitening, LLC
>
> Strategic Open Source: Open Your i™
>
> http://www.sitening.com/
> 110 30th Avenue North, Suite 6
> Nashville, TN 37203-6320
> 615-260-0005
>
> On Jul 14, 2005, at 11:07 AM, Tom Lane wrote:
>
>
>> Do you have some evidence that the bgwriter was what was reporting
>> the
>> error? You didn't say that before.
>>
>> The bgwriter only works on dirty shared buffers, so the only way this
>> could be happening is if a page of a temp table had gotten loaded
>> into
>> a shared buffer, which isn't supposed to happen really. Is it
>> possible
>> that you had some backend deliberately trying to read a temp table
>> created by another backend? (You don't have to assume that the
>> interloper tried to modify the table; a mere SELECT could have
>> created
>> the dirty-buffer condition due to hint-bit update. You do have to
>> assume that the interloper was superuser, though, else permissions
>> would have stopped him from accessing someone else's temp table.)
>>
>> regards, tom lane
>>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2005-07-14 17:18:44 Re: strange error with temp table: pg_type_typname_nsp_index
Previous Message Tom Lane 2005-07-14 16:53:53 Re: getting the ranks out of items with SHARED