Re: Pet Peeves?

From: Chris Mayfield <cmayfiel(at)cs(dot)purdue(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Pet Peeves?
Date: 2009-02-03 20:03:43
Message-ID: gma7uv$2gel$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Here's a few more pet peeves. I'm not sure if any of these are known
bugs or just me being picky.

--Chris

--------------------------------------------------

1. Having to rewrite entire tables out to disk the first time I scan
them, for example:

CREATE TABLE t1 AS ...; -- writes 100 GB to disk
CREATE INDEX i1 ON t1 ...; -- rewrites 100 GB to disk

The main issue is setting the hint bits for each tuple, which IMO should
initially be set for "CREATE TABLE AS" statements. To work around this
for now, I modified heap_insert (in heapam.c) to mark tuples as
committed when inserting them into newly added pages without WAL:

/*
* Optimization for CREATE TABLE AS statements: mark tuples as committed
* to prevent rewriting them to disk upon first use. This is safe since
* the new relation is not visible until the transaction commits anyway.
*/
if (!use_wal && !use_fsm)
{
tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
}

--------------------------------------------------

2. Having to use a subquery and/or "OFFSET 0" to prevent multiple calls
to immutable functions returning composite types, for example:

CREATE TYPE three AS
(i integer, j integer, k integer);

CREATE FUNCTION hello()
RETURNS three AS $$
DECLARE ret three;
BEGIN
RAISE NOTICE 'hello';
ret := ROW(1,2,3);
RETURN ret;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- function called three times
SELECT (hello()).*;

-- function called only once
SELECT (h).* FROM (
SELECT hello() AS h
) AS sub;

-- function called three times
SELECT (h).* FROM (
SELECT hello() AS h
FROM generate_series(1,3)
) AS sub;

-- function called only once
SELECT (h).* FROM (
SELECT hello() AS h
FROM generate_series(1,3)
OFFSET 0
) AS sub;

--------------------------------------------------

3. Poor estimates for n_distinct can result in out of memory errors.

For example, I have a user-defined aggregate over a user-defined data
type (both written in C). The aggregate doesn't take much memory, but
the data type can be relatively large (i.e. "STORAGE = extended"). My
table has five million distinct rows, but n_distinct is around 50,000
(increasing the stats target helps a little, but it's still way off).
As a result the planner choses "SeqScan + HashAgg" instead of "IndexScan
+ GroupAgg", and the query aborts when the hash table eventually runs
out of memory.

I currently work around this issue using "SET enable_hashagg TO off;"
when necessary.

In response to

  • Pet Peeves? at 2009-01-29 13:16:17 from Gregory Stark

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter Geoghegan 2009-02-03 20:41:22 Re: Pet Peeves?
Previous Message Grzegorz Jaśkiewicz 2009-02-03 19:09:59 C function question