Re: "Big O" notation for postgres?

From: "H(dot) Hall" <hhall1001(at)reedyriver(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-performance(at)postgresql(dot)org
Cc: Richard Huxton <dev(at)archonet(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Subject: Re: "Big O" notation for postgres?
Date: 2008-05-26 15:00:18
Message-ID: 483AD082.8020104@reedyriver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Gregory Stark wrote:
> "Richard Huxton" <dev(at)archonet(dot)com> writes:
>
>
>> Jonah H. Harris wrote:
>>
>>> On Wed, May 21, 2008 at 10:10 AM, H. Hall wrote:
>>>
>>>> Does anyone know if there is a source that provides "Big O" notation for
>>>> postgres's aggregate functions and operations? For example is count(*) =
>>>> O(1) or O(n)?
>>>>
>>> I don't know of any document containing the complexity of each
>>> aggregate, but it's sometimes left as a comment in the souce code.
>>>
>> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
>> presence of an index too, just to muddy the waters.
>>
When I first read the above from Jonah I just assumed some Postgres
magic was producing O(1). After seeing this again, I believe that
Postgres must be doing something like the following for max and min:
Max: ORDER BY colName DESC LIMIT 1
Min: ORDER BY coName ASC LIMIT 1

Thus Jonah's caveat about using an index. If postgres is using an index
as in the above then the Max and Min functions would both be O(log N) ,
this is log base2 of N, which is the time it takes to search a balanced
binary tree, not O(1) which implies a constant time to perform,
regardless of the size of the dataset N.

>
> Hm, true. But excluding those special cases all Postgres aggregate functions
> will be O(n) unless they're doing something very odd. None of the built-in
> functions (except min/max as noted) do anything odd like that.
>
> The reason way is because of the basic design of Postgres aggregate functions.
> They are fed every tuple one by one and have to keep their state in a single
> variable. Most of the aggregate functions like count(*) etc just keep a static
> non-growing state and the state transition function is a simple arithmetic
> operation which is O(1). So the resulting operation is O(n).
>
> Actually one exception would be something like
>
> CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}');
>
> Since the state variable has to keep accumulating more and more stuff the
> array_append becomes more and more expensive (it has to generate a new array
> so it has to copy the existing stuff). So actually it woul dbe O(n^2).
>
> The only builtin aggregate which looks like it falls in this category would be
> xmlagg()
>
>
Functions with O(N^2) scale very badly of course.

It would be very handy if the Planner could kick out Big O with its
estimates. This would allow one to quickly tell how a query scales with
a large number of rows.

Thanks,
HH

--
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2008-05-26 15:08:51 Re: Symbolic Links to Tablespaces
Previous Message Campbell, Lance 2008-05-26 14:11:51 Symbolic Links to Tablespaces