Re: Maximum statistics target

From: Cédric Villemain <cedric(dot)villemain(at)dalibo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: Maximum statistics target
Date: 2008-03-10 10:58:37
Message-ID: 200803101158.42022.cedric.villemain@dalibo.com
Lists: pgsql-hackers

On Monday 10 March 2008, Peter Eisentraut wrote:
> On Friday, 7 March 2008, Tom Lane wrote:
> > I'm not wedded to the number 1000 in particular --- obviously that's
> > just a round number. But it would be good to see some performance tests
> > with larger settings before deciding that we don't need a limit.
>
> Well, I'm not saying we should raise the default statistics target. But
> setting an arbitrary limit on the grounds that larger values might slow the
> system is like limiting the size of tables because larger tables will cause
> slower queries. Users should have the option of finding out the best
> balance for themselves. If there are concerns with larger statistics
> targets, we should document them. I find nothing about this in the
> documentation at the moment.

I found two things:
«Increasing the target causes a proportional increase in the time and space
needed to do ANALYZE. »
in http://www.postgresql.org/docs/current/static/sql-analyze.html
and
« ... at the price of consuming more space in pg_statistic and slightly more
time to compute the estimates»
in http://www.postgresql.org/docs/current/static/planner-stats.html

But these are probably not clear enough about the impact on query planning time.
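
For reference, the setting under discussion can be changed per column or
session-wide (a minimal sketch; the table and column names are illustrative,
and ANALYZE must be re-run for a new target to take effect):

    -- per-column statistics target, then re-gather statistics
    ALTER TABLE test1 ALTER COLUMN a SET STATISTICS 1000;
    ANALYZE test1;

    -- or for all columns without an explicit per-column setting
    SET default_statistics_target = 1000;
    ANALYZE;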

>
> > IIRC, eqjoinsel is one of the weak spots, so tests involving planning of
> > joins between two tables with large MCV lists would be a good place to
> > start.
>
> I have run tests with joining two and three tables with 10 million rows
> each, and the planning times seem to be virtually unaffected by the
> statistics target, for values between 10 and 800000. They all look more or
> less like this:
>
> test=# explain select * from test1, test2 where test1.a = test2.b;
>                                 QUERY PLAN
> -----------------------------------------------------------------------------
>  Hash Join  (cost=308311.00..819748.00 rows=10000000 width=16)
>    Hash Cond: (test1.a = test2.b)
>    ->  Seq Scan on test1  (cost=0.00..144248.00 rows=10000000 width=8)
>    ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=8)
>          ->  Seq Scan on test2  (cost=0.00..144248.00 rows=10000000 width=8)
> (5 rows)
>
> Time: 132,350 ms
>
> and with indexes
>
> test=# explain select * from test1, test2 where test1.a = test2.b;
>                                     QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Merge Join  (cost=210416.65..714072.26 rows=10000000 width=16)
>    Merge Cond: (test1.a = test2.b)
>    ->  Index Scan using test1_index1 on test1  (cost=0.00..282036.13 rows=10000000 width=8)
>    ->  Index Scan using test2_index1 on test2  (cost=0.00..282036.13 rows=10000000 width=8)
> (4 rows)
>
> Time: 168,455 ms
>
> The time to analyze is also quite constant, just before you run out of
> memory. :) The MaxAllocSize is the limiting factor in all this. In my
> example, statistics targets larger than about 800000 created pg_statistic
> rows that would have been larger than 1GB, so they couldn't be stored.
>
> I suggest that we get rid of the limit of 1000, adequately document
> whatever issues might exist with large values (possibly not many, see
> above), and add an error message more user-friendly than "invalid memory
> alloc request size" for the cases where the value is too large to be
> storable.
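
For anyone wanting to repeat the measurement, a sketch of such a test
(assumptions: bigint columns to match the width=8 in the plans above, and a
patched build that accepts targets beyond the current limit of 1000):

    -- two 10-million-row tables, names matching the quoted plans
    CREATE TABLE test1 AS
      SELECT g::bigint AS a FROM generate_series(1, 10000000) g;
    CREATE TABLE test2 AS
      SELECT g::bigint AS b FROM generate_series(1, 10000000) g;

    -- raise the per-column targets and re-gather statistics
    ALTER TABLE test1 ALTER COLUMN a SET STATISTICS 800000;
    ALTER TABLE test2 ALTER COLUMN b SET STATISTICS 800000;
    ANALYZE test1;
    ANALYZE test2;

    -- with \timing on in psql, the planning overhead shows up here
    EXPLAIN SELECT * FROM test1, test2 WHERE test1.a = test2.b;

    -- rough check of the resulting pg_statistic row sizes
    -- (requires superuser access to pg_statistic)
    SELECT starelid::regclass AS tab, staattnum,
           pg_column_size(s.*) AS row_bytes
    FROM pg_statistic s
    WHERE starelid IN ('test1'::regclass, 'test2'::regclass)
    ORDER BY row_bytes DESC;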

--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
