Re: TB-sized databases

From: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: TB-sized databases
Date: 2007-11-30 06:41:53
Message-ID: 474FB0B1.6070900@pws.com.au
Lists: pgsql-performance

Simon Riggs wrote:
> On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote:
>
>> Simon Riggs wrote:
>>
>>> All of those responses have cooked up quite a few topics into one. Large
>>> databases might mean text warehouses, XML message stores, relational
>>> archives and fact-based business data warehouses.
>>>
>>> The main thing is that TB-sized databases are performance critical. So
>>> it all depends upon your workload really as to how well PostgreSQL, or
>>> any other RDBMS, can handle them.
>>>
>>>
>>> Anyway, my reason for replying to this thread is that I'm planning
>>> changes for PostgreSQL 8.4+ that will allow us to get bigger and
>>> faster databases. If anybody has specific concerns then I'd like to hear
>>> them so I can consider those things in the planning stages.
>>>
>> It would be nice to do something with selects so we can retrieve a rowset
>> from huge tables using indexed criteria without falling back to a full
>> scan.
>>
>> In my opinion, by definition, a huge database sooner or later will have
>> tables far bigger than RAM available (same for their indexes). I think
>> the queries need to be solved using indexes smart enough to be fast on disk.
>>
>
> OK, I agree with this one.
>
> I'd thought that index-only plans were only for OLTP, but now I see they
> can also make a big difference with DW queries. So I'm very interested
> in this area now.
>
>
If that's true, then you want to get behind the work Gokulakannan
Somasundaram has done on thick indexes
(http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php).
I would have thought that concept particularly useful in DW. Only having
to scan the indexes on a number of joined tables would be a huge win for
some of these types of queries.
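
To make the point concrete, here is a toy sketch in Python. It is only a
model of the idea, not PostgreSQL internals and not Gokulakannan's patch;
every name in it (HeapTuple, scan_plain_index, scan_thick_index) is
hypothetical. It assumes a "thick" index entry carries a visibility flag
next to the key, so a range scan can answer from the index alone, while
a plain index has to visit the heap once per matching entry.

```python
# Toy model only: not PostgreSQL internals. All names here are hypothetical.
from dataclasses import dataclass


@dataclass
class HeapTuple:
    key: int
    payload: str
    visible: bool  # whether the current snapshot can see this row version


def scan_plain_index(index, heap, lo, hi):
    """Index holds (key, heap position) only; visibility lives in the heap."""
    heap_fetches = 0
    keys = []
    for key, pos in index:
        if lo <= key <= hi:
            heap_fetches += 1  # must visit the heap to check visibility
            if heap[pos].visible:
                keys.append(key)
    return keys, heap_fetches


def scan_thick_index(index, lo, hi):
    """Index holds (key, visible) itself, so the heap is never touched."""
    keys = [key for key, visible in index if lo <= key <= hi and visible]
    return keys, 0


# 1000 rows; every tenth row version is invisible to our snapshot.
heap = [HeapTuple(k, f"row-{k}", visible=(k % 10 != 0)) for k in range(1000)]
plain_index = [(t.key, pos) for pos, t in enumerate(heap)]
thick_index = [(t.key, t.visible) for t in heap]

plain_keys, plain_fetches = scan_plain_index(plain_index, heap, 100, 199)
thick_keys, thick_fetches = scan_thick_index(thick_index, 100, 199)
```

Both scans return the same keys, but the plain scan pays one heap visit
per matching index entry. On a table far bigger than RAM those visits are
the random I/O that an index-only plan would avoid.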

My tiny point of view would say that is a much better investment than
setting up the proposed parameter, though I can see the use of the
parameter. Most of the complaints about indexes carrying visibility
information are about update/delete contention. I would expect that in
a DW those things aren't in the critical path like they are in many
other applications. Especially with partitioning, where previous
partitions don't get many updates, I would think there could be great
benefit. I would think that many of Pablo's requests up-thread would
get a significant performance benefit from this type of index. But as I
mentioned at the start, that's my tiny point of view and I certainly
don't have the resources to direct what gets looked at for PostgreSQL.

Regards

Russell Smith
