Disk space is consumed by UPDATE query

From: Patrick Clery <patrick(at)phpforhire(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Disk space is consumed by UPDATE query
Date: 2004-09-28 22:03:45
Message-ID: 200409281603.45267.patrick@phpforhire.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Your name : Patrick Clery
Your email address : patrick(at)phpforhire(dot)com

System Configuration
---------------------
Architecture (example: Intel Pentium) : Intel
Operating System (example: Linux 2.4.18) : FreeBSD 4.10-stable
PostgreSQL version (example: PostgreSQL-8.0): PostgreSQL-8.0
Compiler used (example: gcc 2.95.2) : gcc 2.95.4

Please enter a FULL description of your problem:
------------------------------------------------

This query appears to enter an infinite loop and fill up my /usr partition
(PGDATA=/usr/local/pgsql/data) at a rapid rate:

UPDATE people_locations
SET postalcode = foo.pcode, city_id = foo.cid
FROM (
SELECT
p.postalcode AS pcode,
p.city_id AS cid,
c.state_id AS sid
FROM postalcodes p
JOIN cities c USING (city_id)
) foo
WHERE foo.sid = state_id AND old_postalcode = foo.pcode

psql:/usr/local/www/beano/datingsite/sql/import_people.sql:363: ERROR: could
not write to hash-join temporary file: No space left on device

From when the query is first run (somehow the disk space goes up initially):

$ while : ; do df -h /usr/; sleep 3; done
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.8G 7.8G 43% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.2G 8.3G 39% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.1G 8.4G 38% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.2G 8.4G 38% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.2G 8.3G 39% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.3G 8.3G 39% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.3G 8.3G 39% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.4G 8.2G 40% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.4G 8.2G 40% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.5G 8.1G 40% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.5G 8.1G 41% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.6G 8.0G 41% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.6G 8.0G 41% /usr
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1g 15G 5.7G 7.9G 42% /usr
... and on and on until it reaches zero.

Here's the query plan:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Join (cost=18770.77..185690.90 rows=20626 width=140)
Hash Cond: ((("outer".postalcode)::text = ("inner".old_postalcode)::text)
AND ("outer".city_id = "inner".city_id))
-> Seq Scan on postalcodes p (cost=0.00..14742.12 rows=825012 width=18)
-> Hash (cost=9955.64..9955.64 rows=366625 width=126)
-> Merge Join (cost=69.83..9955.64 rows=366625 width=126)
Merge Cond: ("outer".state_id = "inner".state_id)
-> Index Scan using cities_state_id on cities c
(cost=0.00..4203.13 rows=73325 width=8)
-> Sort (cost=69.83..72.33 rows=1000 width=122)
Sort Key: people_locations.state_id
-> Seq Scan on people_locations (cost=0.00..20.00
rows=1000 width=122)
(10 rows)

Here's the inner query by itself:

datingsite=> EXPLAIN ANALYZE SELECT
datingsite-> p.postalcode AS pcode,
datingsite-> p.city_id AS cid,
datingsite-> c.state_id AS sid
datingsite-> FROM postalcodes p
datingsite-> JOIN cities c USING (city_id);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2091.56..47451.98 rows=825012 width=22) (actual
time=1132.994..16764.241 rows=825012 loops=1)
Hash Cond: ("outer".city_id = "inner".city_id)
-> Seq Scan on postalcodes p (cost=0.00..14742.12 rows=825012 width=18)
(actual time=0.077..4657.842 rows=825012 loops=1)
-> Hash (cost=1585.25..1585.25 rows=73325 width=8) (actual
time=1131.010..1131.010 rows=0 loops=1)
-> Seq Scan on cities c (cost=0.00..1585.25 rows=73325 width=8)
(actual time=0.031..738.582 rows=73325 loops=1)
Total runtime: 20475.610 ms
(6 rows)

Both tables are rather large:

datingsite=> select count(*) from people_locations ;
count
--------
131266
(1 row)

Time: 2566.282 ms
datingsite=> select count(*) from postalcodes;
count
--------
825012
(1 row)

Time: 4246.360 ms

Here is the schema:

datingsite=> \d postalcodes;
Table "public.postalcodes"
Column | Type | Modifiers
------------+-----------------------+-----------
postalcode | character varying(10) | not null
city_id | integer | not null
Indexes:
"postalcodes_pkey" PRIMARY KEY, btree (postalcode, city_id)
Foreign-key constraints:
"postalcodes_city_id_fkey" FOREIGN KEY (city_id) REFERENCES
cities(city_id) ON DELETE CASCADE

datingsite=> \d people_locations;
Table "public.people_locations"
Column | Type | Modifiers
-----------------+------------------------+-------------------------------------------
person_id | integer | not null
city_id | integer | not null default 0
postalcode | character varying(30) | not null default 'N/A'::character
varying
postalcode_city | integer | not null default 0
country_iso | integer | not null default 0
state_id | integer | not null default 0
areacode | integer | not null default 0
old_postalcode | character varying(10) | not null default ''::character
varying
old_cityname | character varying(128) | not null default ''::character
varying
Indexes:
"people_locations_pkey" PRIMARY KEY, btree (person_id)
"people_loc_postalcode" btree (postalcode)
Foreign-key constraints:
"people_locations_person_id_fkey" FOREIGN KEY (person_id) REFERENCES
people(person_id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
"people_locations_city_id_fkey" FOREIGN KEY (city_id) REFERENCES
cities(city_id) ON UPDATE CASCADE ON DELETE RESTRICT
"people_locations_country_iso_fkey" FOREIGN KEY (country_iso) REFERENCES
countries(country_iso)
"people_locations_state_id_fkey" FOREIGN KEY (state_id) REFERENCES
states(state_id)

Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
I can repeat the problem each time I run that query. If there is any further
debugging info needed, I'm willing to provide it.

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Matthew Hixson 2004-09-28 23:32:03 Postgres 8.0b3 doesn't log queries correctly
Previous Message Thomas LeBlanc 2004-09-28 16:48:19 ODBC Driver and ADO...