Re: using a lot of maintenance_work_mem

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Frederik Ramm <frederik(at)remote(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: using a lot of maintenance_work_mem
Date: 2011-04-09 01:54:06
Message-ID: 20110409015405.GH4548@tamriel.snowman.net
Lists: pgsql-hackers

Tom, all,

Having run into issues caused by small work_mem, again, I felt the need
to respond to this.

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> You would break countless things. It might be okay anyway in a trusted
> environment, ie, one without users trying to crash the system, but there
> are a lot of security-critical implications of that test.

I really don't see work_mem or maintenance_work_mem as security-related
parameters. Amusingly, the Postgres95 1.01 release apparently attempted
to make the cap 16GB (but failed and made it 256M instead). After a bit
of poking around, I found this commit:

commit 85c17dbff8ade0c5237e3ac1ece7cacacfdde399
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Tue Feb 6 01:53:53 2001 +0000

Out-of-bounds memory allocation request sizes should be treated as just
elog(ERROR) not an Assert trap, since we've downgraded out-of-memory to
elog(ERROR) not a fatal error. Also, change the hard boundary from 256Mb
to 1Gb, just so that anyone who's actually got that much memory to spare
can play with TOAST objects approaching a gigabyte.

If we want to implement a system to limit what users can request with
regard to work_mem then we can do that, but a smart user could probably
circumvent such a system by building huge queries. A system which
monitored actual usage and ERROR'd out would probably be better at
addressing that concern.

> If we were actually trying to support such large allocations,
> what I'd be inclined to do is introduce a separate call along the lines
> of MemoryContextAllocLarge() that lacks the safety check.

This sounds like the right approach to me. Basically, I'd like to have
MemoryContextAllocLarge(), on 64-bit platforms, and have it be used for
things like sorts and hash tables. We'd need to distinguish that usage
from things which allocate varlenas and the like.

> But before
> expending time on that, I'd want to see some evidence that it's actually
> helpful for production situations. I'm a bit dubious that you're going
> to gain much here.

I waited ~26hrs for a rather simple query:

explain select
<bunch-of-columns>,
<bunch-of-aggregates>
from really_big_table
where customer_code ~ '^CUST123'
group by
<bunch-of-columns>
;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=37658456.68..42800117.89 rows=10546998 width=146)
   ->  Sort  (cost=37658456.68..37922131.61 rows=105469973 width=146)
         Sort Key: <bunch-of-columns>
         ->  Seq Scan on really_big_table  (cost=0.00..15672543.00 rows=105469973 width=146)
               Filter: ((customer_code)::text ~ '^CUST123'::text)
(5 rows)

This query ran for ~26 hours, where ~20 hours was spent sorting the ~30G
which resulted from the Seq-Scan+filter (the raw table is ~101G). The
resulting table (after the GroupAgg) was only 30MB in size (~80k rows
instead of the estimated 10M above). Another query against the same
101G table, which used a HashAgg, completed just a bit faster than the
26 hours:

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=19627666.99..19631059.80 rows=90475 width=116) (actual time=1435604.737..1435618.293 rows=4869 loops=1)
   ->  Seq Scan on really_big_table  (cost=0.00..15672543.00 rows=105469973 width=116) (actual time=221029.805..804802.329 rows=104616597 loops=1)
         Filter: ((agency_hierarchy_code)::text ~ '^CUST123'::text)
 Total runtime: 1435625.388 ms
(4 rows)

Now, this query had fewer columns in the group by (required to convince
PG to use a HashAgg), but, seriously, it only took 23 minutes to scan
through the entire table. It could have taken 3 hours and I would have
been happy.

Admittedly, part of the problem here is the whole cross-column
correlation stats problem, but even if the stats were right and I ended
up with a 1.5G hash table and a 10M-row result, I'm pretty sure
generating that would be a lot faster using a HashAgg than a
sort+GroupAgg. Also, I feel like we're pretty far from having the
cross-column statistics fixed, and I'm not 100% convinced that it'd
actually come up with a decent result for this query anyway (there are
18 columns in the group by clause for the first query...).

Anyhow, I just wanted to show that there are definitely cases where the
current limit is making things difficult for real-world PG users on
production systems.

Thanks,

Stephen
