Re: On Scalability

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Vincenzo Romano <vincenzo(dot)romano(at)notorand(dot)it>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Hackers and Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On Scalability
Date: 2010-07-30 20:39:13
Message-ID: AANLkTinL2Y7-HdfmQzKF-+XxwFV=c4OUVRhJY1KNuNQs@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Fri, Jul 30, 2010 at 3:50 PM, Vincenzo Romano
<vincenzo(dot)romano(at)notorand(dot)it> wrote:
> 2010/7/30 Josh Berkus <josh(at)agliodbs(dot)com>:
>>
>>> Is there anyone who knows whether those algorithms are linear or not?
>>
>> Read the code?  It's really very accessible, and there's lots and lots
>> of comments.  While the person who wrote the code is around, isn't it
>> better to see the real implementation?
>
> If the programmer(s) who wrote that part is around, a simple hint would suffice.
> Even an hint to where look into the code would be very appreciated: the query
> planner is not as simple as the "ls" command (which is not that simple any
> more, though).
>
> It looks like I need to go the hard way ...
> Starting from postgresql-8.4.4/src/backend/optimizer

I think you're approaching this in the wrong way. You've repeatedly
said you don't want to do all the work of setting up a test, but
trying to search the code for algorithms that might not be linear is
not going to be easier. I've been reading this thread and I'm fairly
familiar with this code, and I even understand the algorithms pretty
well, and I don't know whether they're going to be linear for what you
want to do or not. Certainly, the overall task of join planning is
exponential-time in the number of *tables*, but if you're just doing
SELECT statements on a single table, will that be linear? Tough to
say. Certainly, there are a lot of linked lists in there, so if we
have any place where we have two nested loops over the list of
indices, it won't be linear. I can't think of a place where we do
that, but that doesn't mean there isn't one. And even if they are
linear, or n log n or something, the coefficients could still be
lousy. Theoretical computer science is one of my favorite subjects,
but, it doesn't always tell you what you want to know about the real
world.

It doesn't seem like it should be very hard to figure this out
empirically. Just create a big database full of random data. Maybe
you could populate one of the columns with something like (random() *
1000)::int. Then you could create partial indices ON
(some_other_column) WHERE that_column = <blat> for <blat> in 0..999.
Then you could run some test queries and see how you make out.

Or, I mean, you can read the source code. That's fine, too. It's
just... I've already read the source code quite a few times, and I
still don't know the answer. Admittedly, I wasn't trying to answer
this specific question, but still - I don't think it's an easy
question to answer that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-07-30 21:58:03 reducing NUMERIC size for 9.1, take two
Previous Message Yeb Havinga 2010-07-30 20:38:30 Re: patch for check constraints using multiple inheritance

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2010-07-31 02:21:20 Re: Explains of queries to partitioned tables
Previous Message Greg Smith 2010-07-30 20:38:17 Re: On Scalability