Skip site navigation (1) Skip section navigation (2)

Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views

From: "Brandon Shalton" <brandon(at)cydataservices(dot)com>
To: <me(at)benjaminarai(dot)com>,"PostgreSQL" <pgsql-general(at)postgresql(dot)org>,<pgsql-performance(at)postgresql(dot)org>
Subject: Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views
Date: 2007-08-25 03:54:24
Message-ID: 0be901c7e6cb$a2b13320$c901a8c0@blaptop (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-performance
Benjamin,


>
> In order to avoid the re-indexing I was thinking of instead creating  a 
> new
> table each month (building its indexes and etc) and accessing them all
> through a view. This way I only have to index the new data each month.
>

Take a look at bizgres.org (based on postgres).

They have a parent-child structure.

The way i use it, is I have about 30M records a day that are inserted into 
the database.

Each day is a "child" table to the "parent".

so example:

the parent table is called  "logfile"

each day, is a child, with the structure like  "logfile_YYMMDD"

the "child" inherits the table structure of the parent, such that you could 
query the child table name directly, or you run the query against the parent 
(ie. logfile table) and get all the data.

the indexes are done on a per table basis, so new data that comes in, is a 
lesser amount, and doesn't require re-indexing.


example:

select * from logfile_070825 where datafield = 'foo'

if i knew i wanted to specifically go into that child, or:

select * from logfile where datafield = 'foo'

and all child tables are searched and results merged.  You can perform any 
kind of sql query and field structures are you normally do.

the downside is that the queries are run sequentially.

so if you had 100 child tables, each table is queried via indexes, then 
results are merged.

but, this approach does allow me to dump alot of data in, without having the 
re-indexing issues you are facing.

at some point, you could roll up the days, in to weekly child tables, then 
monthly tables, etc.

I believe Bizgres has a new version of their system that does parallel 
queries which would certainly speed things up.

For your documents, you can do it by the day it was checked in, or maybe you 
have some other way of logically grouping, but the parent/child table 
structure really helped to solve my problem of adding in millions of records 
each day.

The closest thing in mysql is using merge tables, which is not really 
practical when it comes time to do the joins to the tables.

-brandon

http://www.t3report.com - marketing intelligence for online marketing and 
affiliate programs










In response to

Responses

pgsql-performance by date

Next:From: Benjamin AraiDate: 2007-08-25 05:34:06
Subject: Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views
Previous:From: Joshua D. DrakeDate: 2007-08-25 01:29:18
Subject: Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views

pgsql-general by date

Next:From: Erik JonesDate: 2007-08-25 04:38:26
Subject: Re: Add Column BEFORE/AFTER another column
Previous:From: StuartDate: 2007-08-25 03:27:02
Subject: Re: connect by service name in psql

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group