Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views

From: Benjamin Arai <me(at)benjaminarai(dot)com>
To: Joshua D(dot)Drake <jd(at)commandprompt(dot)com>
Cc: Brandon Shalton <brandon(at)cydataservices(dot)com>, PostgreSQL <pgsql-general(at)postgresql(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views
Date: 2007-08-25 05:34:06
Message-ID: F6B156AD-BE6D-4300-871A-E3DE0491AA11@benjaminarai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This kind of disappointing, I was hoping there was more that could be
done.

There has to be another way to do incremental indexing without
loosing that much performance.

Benjamin

On Aug 24, 2007, at 6:29 PM, Joshua D. Drake wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Brandon Shalton wrote:
>> Benjamin,
>>
>>
>>>
>>> In order to avoid the re-indexing I was thinking of instead creating
>>> a new
>>> table each month (building its indexes and etc) and accessing
>>> them all
>>> through a view. This way I only have to index the new data each
>>> month.
>>>
>>
>> Take a look at bizgres.org (based on postgres).
>>
>> They have a parent-child structure.
>>
>> The way i use it, is I have about 30M records a day that are inserted
>> into the database.
>>
>> Each day is a "child" table to the "parent".
>>
>> so example:
>>
>> the parent table is called "logfile"
>>
>> each day, is a child, with the structure like "logfile_YYMMDD"
>>
>> the "child" inherits the table structure of the parent, such that you
>> could query the child table name directly, or you run the query
>> against
>> the parent (ie. logfile table) and get all the data.
>>
>> the indexes are done on a per table basis, so new data that comes
>> in, is
>> a lesser amount, and doesn't require re-indexing.
>
>
> PostgreSQL can do all of this too.
>
> Sincerely,
>
> Joshua D. Drake
>
>>
>>
>> example:
>>
>> select * from logfile_070825 where datafield = 'foo'
>>
>> if i knew i wanted to specifically go into that child, or:
>>
>> select * from logfile where datafield = 'foo'
>>
>> and all child tables are searched and results merged. You can
>> perform
>> any kind of sql query and field structures are you normally do.
>>
>> the downside is that the queries are run sequentially.
>>
>> so if you had 100 child tables, each table is queried via indexes,
>> then
>> results are merged.
>>
>> but, this approach does allow me to dump alot of data in, without
>> having
>> the re-indexing issues you are facing.
>>
>> at some point, you could roll up the days, in to weekly child tables,
>> then monthly tables, etc.
>>
>> I believe Bizgres has a new version of their system that does
>> parallel
>> queries which would certainly speed things up.
>>
>> For your documents, you can do it by the day it was checked in, or
>> maybe
>> you have some other way of logically grouping, but the parent/child
>> table structure really helped to solve my problem of adding in
>> millions
>> of records each day.
>>
>> The closest thing in mysql is using merge tables, which is not really
>> practical when it comes time to do the joins to the tables.
>>
>> -brandon
>>
>> http://www.t3report.com - marketing intelligence for online marketing
>> and affiliate programs
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 6: explain analyze is your friend
>>
>
>
> - --
>
> === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
> PostgreSQL solutions since 1997 http://www.commandprompt.com/
> UNIQUE NOT NULL
> Donate to the PostgreSQL Project: http://www.postgresql.org/about/
> donate
> PostgreSQL Replication: http://www.commandprompt.com/products/
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGz4XuATb/zqfZUUQRAukhAJ9b2x4PLPZsoPmtm3O/Ze4AobDXngCgq+rl
> X2j2ePDyjYxRajfGCVmjnYU=
> =pIjb
> -----END PGP SIGNATURE-----
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQIVAwUBRs+/UfyqRf6YpodNAQL6Xg//eEqR0UQ4I/snn7Dtmkru40jCuECGeG8g
XoxLWEa+bumVgwrEYbjKTBp3KP6OEKz9VV4xHQROTtqxh+rg0hdoc0kWxSyquCm8
VljL24ykvBmRmjhacwi8FKp092zwRcLrbkzTxIr90q8u008aVPWxQCBtmfL6QVTv
I9AyN0kb00ypx+B9I2ySugYzBerVCMUiKUeXplHWn1loSSm1w+5CzXY8gtvivFEV
YspS1Fk2rxjnjlPE/FTGUiwJrdWZTJrd3BuSVbH5DWBoCjz9gzq0NyNZAtESWX2H
oGwlWBEJNFTtoHnK4iTMS+CzKHQQQZ9ZuQcHy84SlXYUo9n0/NCIeabu2xaj44Fs
LFq8jBCH3ebAkD/hQOgk1H05ljbfX8A/u2zz75W1NbD0xTB/sAljWqhypz2x7pOo
sUJF9MQ7DwVG8JitUAAc5fuGpLLR4WxF68YdkgycaCNknP7IATeD2ecqJkC26Av+
GHHci2ct5ypVq9Qq8OuesYSox7XpO2+E+Y5DtgBo+/R7eOJRLA3Z0FDXFLGsdFxy
0OKoew1MN79jP+KMZFJwvddH/TrkZBdIKlkacXYwUHU3c1ATwne6WteKTnEmr2aP
99oQgfmNDyQgTeEL20jokF4YZOdm1UO3Cc7wTi2QlwyqUDbUmYtWzgbS9QbnaGGA
58XdVacGznw=
=Hst4
-----END PGP SIGNATURE-----

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jaime Casanova 2007-08-25 05:40:44 Re: Add Column BEFORE/AFTER another column
Previous Message Erik Jones 2007-08-25 04:38:26 Re: Add Column BEFORE/AFTER another column

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Kempter 2007-08-25 05:34:23 Re: significant vacuum issues - looking for suggestions
Previous Message Brandon Shalton 2007-08-25 03:54:24 Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views