Skip site navigation (1) Skip section navigation (2)

Re: "Big O" notation for postgres?

From: "H(dot) Hall" <hhall1001(at)reedyriver(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-performance(at)postgresql(dot)org
Cc: Richard Huxton <dev(at)archonet(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: "Big O" notation for postgres?
Date: 2008-05-26 15:00:18
Message-ID: 483AD082.8020104@reedyriver.com (view raw or flat)
Thread:
Lists: pgsql-performance
Gregory Stark wrote:
> "Richard Huxton" <dev(at)archonet(dot)com> writes:
>
>   
>> Jonah H. Harris wrote:
>>     
>>> On Wed, May 21, 2008 at 10:10 AM, H. Hall wrote:
>>>       
>>>> Does anyone know if there is a source that provides "Big O" notation for
>>>> postgres's aggregate functions and operations?  For example is count(*) =
>>>> O(1) or O(n)?
>>>>         
>>> I don't know of any document containing the complexity of each
>>> aggregate, but it's sometimes left as a comment in the souce code.
>>>       
>> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
>> presence of an index too, just to muddy the waters.
>>     
When I first read the above from Jonah I just assumed some Postgres 
magic was producing O(1). After seeing this again, I believe that 
Postgres must be doing something like the following for max and min:
Max:     ORDER BY colName  DESC LIMIT 1
Min:      ORDER BY coName ASC LIMIT 1

Thus Jonah's caveat about using an index. If postgres is using an index 
as in the above then the Max and Min functions would both be O(log N) , 
this is  log base2 of N, which is the time it takes to search a balanced 
binary tree, not O(1) which implies a constant time to perform, 
regardless of the size of the dataset N.

>
> Hm, true. But excluding those special cases all Postgres aggregate functions
> will be O(n) unless they're doing something very odd. None of the built-in
> functions (except min/max as noted) do anything odd like that.
>
> The reason way is because of the basic design of Postgres aggregate functions.
> They are fed every tuple one by one and have to keep their state in a single
> variable. Most of the aggregate functions like count(*) etc just keep a static
> non-growing state and the state transition function is a simple arithmetic
> operation which is O(1). So the resulting operation is O(n).
>
> Actually one exception would be something like
>
> CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}');
>
> Since the state variable has to keep accumulating more and more stuff the
> array_append becomes more and more expensive (it has to generate a new array
> so it has to copy the existing stuff). So actually it woul dbe O(n^2).
>
> The only builtin aggregate which looks like it falls in this category would be
> xmlagg()
>
>   
Functions with O(N^2) scale very badly of course.

It would be very handy if the Planner could kick out Big O with its 
estimates.  This would allow one to quickly tell how a query scales with 
a large number of rows.

Thanks,
HH

-- 
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com


In response to

pgsql-performance by date

Next:From: Tom LaneDate: 2008-05-26 15:08:51
Subject: Re: Symbolic Links to Tablespaces
Previous:From: Campbell, LanceDate: 2008-05-26 14:11:51
Subject: Symbolic Links to Tablespaces

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group