Re: [HACKERS] longer-term optimizer musings

From: Erik Riedel <riedel+(at)CMU(dot)EDU>
To: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] longer-term optimizer musings
Date: 1999-03-24 05:13:13
Message-ID: sqy7Fd600anI01i2E0@andrew.cmu.edu


> Interesting. The problem I see is that trying to do a char(20) column
> with min(A) and max(B) can have 256^19 possible unique values from A to
> B, so it kind of kills many general cases. Floats have the same
> problem.
>
Right, in most general cases, there isn't much you can do.

Although, if this seemed important enough, one could imagine extending
'vacuum analyze' and pg_statistic to track the number of unique values
while finding the min and max. Maybe track some fixed number (10?) of
unique attr values and stop searching once the count exceeds 10
different values (or maybe some tiny fraction of the tuples in the
relation, whatever gives a decent balance of memory and CPU at analyze
time). Basically, to find out whether the selectivity might be .01%
instead of the 10% default you suggest below. A rough sketch of what
that tracking could look like follows the example below.

This would work for a database that tracks all the CDs owned by "Bob"
and "Alice" even with char(20) first names. For floats, it wouldn't be
very good for prices at Tiffany's, but should work pretty well for the
Everything's $1 store.
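
Here is roughly what that tracking might look like during the analyze
scan, written as plain standalone C rather than actual backend code
(the DistinctTracker struct, track_distinct(), and the MAX_TRACKED cap
are all hypothetical names, just for illustration):

    #include <stdbool.h>
    #include <string.h>

    #define MAX_TRACKED 10  /* hypothetical cap; past this we just say "many" */

    typedef struct
    {
        int  ndistinct;                /* distinct values seen so far */
        bool gave_up;                  /* true once the cap is exceeded */
        char values[MAX_TRACKED][21];  /* e.g. a char(20) attribute */
    } DistinctTracker;

    /* Call once per non-null attribute value as the analyze scan runs. */
    void
    track_distinct(DistinctTracker *t, const char *value)
    {
        int i;

        if (t->gave_up)
            return;                    /* already know there are "many" values */

        for (i = 0; i < t->ndistinct; i++)
            if (strncmp(t->values[i], value, 20) == 0)
                return;                /* already tracked this value */

        if (t->ndistinct < MAX_TRACKED)
        {
            strncpy(t->values[t->ndistinct], value, 20);
            t->values[t->ndistinct][20] = '\0';
            t->ndistinct++;
        }
        else
            t->gave_up = true;         /* more than MAX_TRACKED distinct values */
    }

If gave_up is still false when the scan finishes, ndistinct could be
stored in pg_statistic next to the min and max, and the selectivity
code could then use it instead of a fixed default.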

> A nice general fix would be to assume GROUP BY/AGG returns only 10% of
> the existing rows. I don't even know if an Aggregate without a group by
> knows it only returns one row. Oops, I guess not:
>
> test=> explain select max(relpages) from pg_class;
> NOTICE: QUERY PLAN:
>
> Aggregate (cost=2.58 size=0 width=0)
> -> Seq Scan on pg_class (cost=2.58 size=48 width=4)
>
Yup, this would be easy to add (both the 10% default and the single row
for non-grouped aggregates). The current code just passes the cost along
and zeros the size and width in all Sort, Group, and Aggregate nodes
(this was the issue flagged as Problem 1 in my earlier message, where I
tried to give line numbers for where it would have to be fixed). Note
that cost_sort() seems to work reasonably well, but it takes a
non-obvious "sortkeys" argument that it does nothing with.
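
For what it's worth, a minimal sketch of the kind of estimate I mean;
estimate_agg_size() is a hypothetical helper, not anything in the
current tree, and the 10% factor is just the default you suggested:

    #include <stdbool.h>

    /* Hypothetical helper: guess the number of output rows for a Group or
     * Aggregate node instead of zeroing the size as the code does today. */
    int
    estimate_agg_size(int input_size, bool has_group_by)
    {
        int size;

        if (!has_group_by)
            return 1;            /* MAX(x), COUNT(*), ... yield exactly one row */

        /* Without per-column statistics, fall back to the 10% default. */
        size = input_size / 10;
        return (size > 0) ? size : 1;
    }

With something like that wired into the places that build the Sort,
Group, and Aggregate nodes, the Aggregate in the pg_class plan above
would report size=1 instead of size=0.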

> So I guess I am saying that your ideas are good, but we need to walk
> before we can run with this optimizer.
>
Understood. I am not discouraged and will continue throwing these
things out as I see them and think I have a reasonable explanation.

Erik
