"could not open relation 1663/16384/16584: No such file or directory" in a specific combination of transactions with temp tables

From: "John Smith" <sodgodofall(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: "could not open relation 1663/16384/16584: No such file or directory" in a specific combination of transactions with temp tables
Date: 2008-02-29 01:54:38
Message-ID: b88f0d670802281754p5907008ahb8cac5cf5cf78dc9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Architecture: Intel Core 2 Duo
OS: linux-2.6.20-gentoo-r8
Filesystem: ext3
Postgres v8.2.3 compiled with gcc 4.1.1-r3
RAM - 2GB
Shared buffers - 24MB
[All other Postgres configuration parameters are default values]

Problem description:
COPY into temp table fails using a specific combination of
create/insert on temp tables, prepare/commit in subsequent
transactions. The "could not open relation" error occurs reliably.

Steps to reproduce:

Existing schema (scripts to create and populate these tables are
uploaded to http://upload2.net/page/download/rfsLfnuVlYjEcCJ/basetables.tgz.html
):
In the scenario, the following 4 tables exist already in the database:

postgres=# \d order_detail
Table "public.order_detail"
Column | Type | Modifiers
--------------+-----------------------------+-----------
order_id | integer | not null
item_id | integer | not null
order_date | timestamp without time zone |
emp_id | integer |
promotion_id | integer |
qty_sold | integer |
unit_price | bigint |
unit_cost | bigint |
discount | integer |
customer_id | integer |
Indexes:
"order_detail_pkey" PRIMARY KEY, btree (order_id, item_id)

postgres=# select count(*) from order_detail;
count
-------
34352
(1 row)

postgres=# \d lu_call_ctr
Table "public.lu_call_ctr"
Column | Type | Modifiers
-------------+---------------+-----------
call_ctr_id | integer | not null
center_name | character(50) |
region_id | integer |
manager_id | integer |
country_id | integer |
dist_ctr_id | bigint |
Indexes:
"lu_call_ctr_pkey" PRIMARY KEY, btree (call_ctr_id)

postgres=# select count(*) from lu_call_ctr;
count
-------
1
(1 row)

postgres=# \d lu_employee
Table "public.lu_employee"
Column | Type | Modifiers
----------------+-----------------------------+-----------
emp_id | integer | not null
emp_last_name | character(50) |
emp_first_name | character(50) |
emp_ssn | character(50) |
birth_date | timestamp without time zone |
hire_date | timestamp without time zone |
salary | integer |
country_id | integer |
dist_ctr_id | integer |
manager_id | integer |
call_ctr_id | integer |
fte_flag | character(50) |
Indexes:
"lu_employee_pkey" PRIMARY KEY, btree (emp_id)

postgres=# select count(*) from lu_employee;
count
-------
2
(1 row)

postgres=# \d city_ctr_sls
Table "public.city_ctr_sls"
Column | Type | Modifiers
------------------+---------+-----------
cust_city_id | integer | not null
call_ctr_id | integer | not null
tot_dollar_sales | integer |
tot_unit_sales | integer |
tot_cost | integer |
Indexes:
"city_ctr_sls_pkey" PRIMARY KEY, btree (cust_city_id, call_ctr_id)

postgres=# select count(*) from city_ctr_sls;
count
-------
548
(1 row)

Given the the data in these base tables, the following set of SQLs
always generates the "Could not open relation" error.
The insert*.log files that we try to COPY into Postgres in the SQLs
below are uploaded as:
http://upload2.net/page/download/gADZqQvOIntLRpI/insert.tgz.html
---------------------------------------------------------------------
-- Note: If the amount of data being inserted is decreased significantly,
-- the error disappears.

-- First transaction
begin transaction;

-- Temp table 1 and insert 1582 records
create temp table temp1
as
select
customer_id, emp_id
from
order_detail
limit 0;

copy temp1 from '/tmp/relationError/insert_1.log';

-- Create temp table 2 and populate with select.
-- Note: Even changing the order of these columns causes the error to
-- disappear.
create temp table temp2
as
select
temp1.customer_id, temp1.emp_id as temp1__emp_id, le.emp_id as le__emp_id,
le.emp_last_name, le.emp_first_name, le.emp_ssn, le.birth_date,
le.hire_date, le.salary, le.country_id, le.dist_ctr_id, le.manager_id,
le.call_ctr_id, le.fte_flag
from
temp1,
lu_employee le
where temp1.emp_id = le.emp_id;

-- Create temp table 3 and insert 13832 records.
create temp table temp3
as
select
temp2.call_ctr_id, temp2.temp1__emp_id, temp2.customer_id, temp2.le__emp_id
from
temp2
where temp2.temp1__emp_id = temp2.le__emp_id
limit 0 ;

copy temp3 from '/tmp/relationError/insert_2.log';

-- Create temp table 4 and insert 6160 records.
create temp table temp4
as
select
lcc.region_id, temp3.customer_id
from
lu_call_ctr lcc,
temp3
where temp3.call_ctr_id = lcc.call_ctr_id
group by lcc.region_id, temp3.customer_id
limit 0;

copy temp4 from '/tmp/relationError/insert_3.log';

-- Drop the temp tables
drop table temp1 ;
drop table temp2 ;
drop table temp3 ;
drop table temp4 ;

-- 2PC
-- Note: Replace the prepare/commit pair with just a simple commit; and
-- the error goes away.
prepare transaction '260';
commit prepared '260' ;

--Next transaction
begin transaction;

-- Create temp table 5 and try to insert 1582 rows into it.
create temp table temp5
as
select
customer_id, emp_id
from
order_detail
limit 0;

copy temp5 from '/tmp/relationError/insert_4_crash.log';

-- Should get an error of type:
-- psql:crash.sql:87: ERROR: could not open relation
1663/16384/16774: No such file or directory
-- CONTEXT: COPY temp5, line 926: "2929 33"
-- This context is always the same.

rollback;

Observations:
1. The size of the data seems to matters. If the amount of data being
inserted is dropped to just one or two records per table, the error
doesn't happen.
2. The order of columns for the select into temp2 matters. Changing
the order can cause the error to go away.
3. If the prepare/commit is replaced with a "commit;" the error goes away.
4. Removing "temp3" or "temp4" from the transaction causes one run of
the above statements to succeed, but if the sequence is issued in the
same PSQL session, the second one will fail.
5. Given the current dataset, the error always occurs on line 926 of
the COPY (even if the values at line 926 are changed).
6. <tablespace>/<database>/<oid> typically always corresponds to that
of temp2 on my system.

Thanks.
- John
[Resending since didn't see this posted on pgsql-hackers]

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message ITAGAKI Takahiro 2008-02-29 03:12:58 Logging conflicted queries on deadlocks
Previous Message Dawid Kuroczko 2008-02-29 00:46:22 Re: Proposal: wildcards in pg_service.conf