Re: splitting data into multiple tables

From: Viji V Nair <viji(at)fedoraproject(dot)org>
To: nair rajiv <nair331(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-performance(at)postgresql(dot)org
Subject: Re: splitting data into multiple tables
Date: 2010-01-26 07:58:02
Message-ID: 84c89ac11001252358k498053cdpa0c05f6e5ece3303@mail.gmail.com
Lists: pgsql-performance

On Tue, Jan 26, 2010 at 9:18 AM, nair rajiv <nair331(at)gmail(dot)com> wrote:

>
>
> On Tue, Jan 26, 2010 at 6:19 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> On Tuesday 26 January 2010 01:39:48 nair rajiv wrote:
>> > On Tue, Jan 26, 2010 at 1:01 AM, Craig James <craig_james(at)emolecules(dot)com> wrote:
>> > I am working on a project that will extract structured content
>> > from wikipedia and put it in our database. Before putting the data
>> > into the database I wrote a script to find out the number of rows
>> > every table would have after the data is in, and I found there is
>> > a table which will have approximately 50,000,000 rows after data
>> > harvesting.
>> > Is it advisable to keep so much data in one table?
>> Depends on your access patterns, i.e. how many rows you are accessing
>> at the same time, and whether those have some common locality.
>>
>
> I'll give a brief idea of what this table looks like. The important
> columns are subject, predicate and object. So given a predicate and an
> object one should be able to get all the subjects, and given a subject
> and a predicate one should be able to retrieve all the objects. I have
> created indexes on these three columns.
>
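As a concrete sketch, assuming hypothetical column types (the schema
itself is not shown in the thread): the two lookup patterns described
above are usually served better by two composite indexes than by three
single-column ones.

  CREATE TABLE datatable (
      subject   text NOT NULL,
      predicate text NOT NULL,
      object    text NOT NULL
  );

  -- given a subject and a predicate, fetch all objects
  CREATE INDEX datatable_sp_idx ON datatable (subject, predicate);
  -- given a predicate and an object, fetch all subjects
  CREATE INDEX datatable_po_idx ON datatable (predicate, object);
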
>>
>>
>> > I have read about 'partitioning' a table. Another idea I have
>> > is to break the table into different tables after the number of rows
>> > in a table has reached a certain limit, say 1,000,000.
>> > For example, dividing a table 'datatable' into 'datatable_a',
>> > 'datatable_b', each having 1,000,000 rows.
>> > I needed advice on whether I should go for partitioning or the
>> > approach I have thought of.
>> Your approach is pretty close to partitioning - except that
>> partitioning makes that mostly invisible to the outside, so it is
>> imho preferable.
>>
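For reference, built-in partitioning at the time meant inheritance plus
CHECK constraints. Continuing the sketch above (the partition boundaries
here are made up for illustration):

  CREATE TABLE datatable_a (
      CHECK (subject < 'm')
  ) INHERITS (datatable);

  CREATE TABLE datatable_b (
      CHECK (subject >= 'm')
  ) INHERITS (datatable);

  -- Rows must be routed to the right child table, either by inserting
  -- into it directly or via a trigger on the parent.
  -- With constraint exclusion on, the planner skips children whose
  -- CHECK constraint rules them out:
  SET constraint_exclusion = on;
  SELECT object FROM datatable WHERE subject = 'Foo' AND predicate = 'is';
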
>> > We have an HP server with 32GB RAM and 16 processors. The
>> > storage has 24TB of disk space (1TB/HD). We have put them on RAID-5.
>> > It will be great if we could know the parameters that can be changed
>> > in the postgres configuration file so that the database makes maximum
>> > utilization of the server we have.
>> > For example, parameters that would increase the speed of inserts
>> > and selects.
>> Not using RAID-5 possibly would be a good start - many people (me
>> included) have experienced bad write performance on it. It depends a
>> great deal on the controller/implementation though.
>> RAID-10 is normally considered more advantageous despite its lower
>> usable space ratio.
>> Did you create one big RAID-5 out of all the disks? That's not a good
>> idea, because it's pretty likely that another disk fails while you
>> restore a previously failed disk. Unfortunately, in that configuration
>> that means you have lost your complete data (in the most common
>> implementations at least).
>>
>
> No, I am using only 12TB, i.e. 12 HDs of the 24TB I have.
>
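On the postgresql.conf question above, a hedged starting point for a
32GB machine (8.4-era parameter names; every value here is an assumption
to be validated against the real workload):

  shared_buffers = 8GB            # commonly ~25% of RAM
  effective_cache_size = 24GB     # roughly what the OS can cache
  checkpoint_segments = 64        # fewer, larger checkpoints help bulk loads
  wal_buffers = 16MB
  maintenance_work_mem = 1GB      # speeds up index builds after loading
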

A 15k rpm SAS drive will give you a throughput of around 12MB/s and 120
IOPS. From those per-spindle numbers you can calculate the number of
disks, specifically spindles, needed to reach your desired throughput
and IOPS.
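
For example, with those per-spindle figures (the targets below are made
up for illustration): sustaining 1200 random IOPS needs 1200 / 120 = 10
spindles, and sustaining 120MB/s needs 120 / 12 = 10 spindles; size the
array for whichever requirement needs more disks, and remember that
RAID-5 further reduces effective write IOPS.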

>
>> Andres
>>
>> PS: Your lines are strangely wrapped...
>>
>
>
