Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views

From: Benjamin Arai <me(at)benjaminarai(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Brandon Shalton <brandon(at)cydataservices(dot)com>, PostgreSQL <pgsql-general(at)postgresql(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views
Date: 2007-08-25 19:49:13
Message-ID: 22B8D5DC-456F-4337-A1D9-B2C9125CA71E@benjaminarai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Since I am using tsearch2 on the table I think there is going to be a
significant performance hit - e.g., I partition by batch (batches are
not separated by date, they are essentially random subsets of a much
larger data-set). I am querying this database using tsearch2, so I
am assuming all tables are going to be queried each time since the
text is not partition by any specific constraint - e.g., >R goes to
table 1 and <=R goes to table 2.

Benjamin

On Aug 25, 2007, at 11:18 AM, Joshua D. Drake wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Benjamin Arai wrote:
>> As stated in the previous email if I use partitioning then queries
>> will
>> be executed sequentially - i.e., instead of log(n) it would be (#
>> partitions) * log(n). Right?
>
> The planner will consider every relevant partition during the
> execution.
> Which may be a performance hit, it may not be. It depends on many
> factors. In general however, partitioning when done correctly is a
> performance benefit and a maintenance benefit.
>
> Sincerely,
>
> Joshua D. Drake
>
>
>>
>> Benjamin
>>
>> On Aug 25, 2007, at 9:18 AM, Joshua D. Drake wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>
>>> Benjamin Arai wrote:
>>>> This kind of disappointing, I was hoping there was more that
>>>> could be
>>>> done.
>>>>
>>>> There has to be another way to do incremental indexing without
>>>> loosing
>>>> that much performance.
>>
>>> What makes you think you are loosing performance by using
>>> partitioning?
>>
>>> Joshua D. Drake
>>
>>>>
>>>> Benjamin
>>>>
>>>> On Aug 24, 2007, at 6:29 PM, Joshua D. Drake wrote:
>>>>
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>
>>>>> Brandon Shalton wrote:
>>>>>> Benjamin,
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> In order to avoid the re-indexing I was thinking of instead
>>>>>>> creating
>>>>>>> a new
>>>>>>> table each month (building its indexes and etc) and accessing
>>>>>>> them
>>>>>>> all
>>>>>>> through a view. This way I only have to index the new data each
>>>>>>> month.
>>>>>>>
>>>>>>
>>>>>> Take a look at bizgres.org (based on postgres).
>>>>>>
>>>>>> They have a parent-child structure.
>>>>>>
>>>>>> The way i use it, is I have about 30M records a day that are
>>>>>> inserted
>>>>>> into the database.
>>>>>>
>>>>>> Each day is a "child" table to the "parent".
>>>>>>
>>>>>> so example:
>>>>>>
>>>>>> the parent table is called "logfile"
>>>>>>
>>>>>> each day, is a child, with the structure like "logfile_YYMMDD"
>>>>>>
>>>>>> the "child" inherits the table structure of the parent, such
>>>>>> that you
>>>>>> could query the child table name directly, or you run the query
>>>>>> against
>>>>>> the parent (ie. logfile table) and get all the data.
>>>>>>
>>>>>> the indexes are done on a per table basis, so new data that comes
>>>>>> in, is
>>>>>> a lesser amount, and doesn't require re-indexing.
>>>>
>>>>
>>>>> PostgreSQL can do all of this too.
>>>>
>>>>> Sincerely,
>>>>
>>>>> Joshua D. Drake
>>>>
>>>>>>
>>>>>>
>>>>>> example:
>>>>>>
>>>>>> select * from logfile_070825 where datafield = 'foo'
>>>>>>
>>>>>> if i knew i wanted to specifically go into that child, or:
>>>>>>
>>>>>> select * from logfile where datafield = 'foo'
>>>>>>
>>>>>> and all child tables are searched and results merged. You can
>>>>>> perform
>>>>>> any kind of sql query and field structures are you normally do.
>>>>>>
>>>>>> the downside is that the queries are run sequentially.
>>>>>>
>>>>>> so if you had 100 child tables, each table is queried via
>>>>>> indexes,
>>>>>> then
>>>>>> results are merged.
>>>>>>
>>>>>> but, this approach does allow me to dump alot of data in, without
>>>>>> having
>>>>>> the re-indexing issues you are facing.
>>>>>>
>>>>>> at some point, you could roll up the days, in to weekly child
>>>>>> tables,
>>>>>> then monthly tables, etc.
>>>>>>
>>>>>> I believe Bizgres has a new version of their system that does
>>>>>> parallel
>>>>>> queries which would certainly speed things up.
>>>>>>
>>>>>> For your documents, you can do it by the day it was checked
>>>>>> in, or
>>>>>> maybe
>>>>>> you have some other way of logically grouping, but the parent/
>>>>>> child
>>>>>> table structure really helped to solve my problem of adding in
>>>>>> millions
>>>>>> of records each day.
>>>>>>
>>>>>> The closest thing in mysql is using merge tables, which is not
>>>>>> really
>>>>>> practical when it comes time to do the joins to the tables.
>>>>>>
>>>>>> -brandon
>>>>>>
>>>>>> http://www.t3report.com - marketing intelligence for online
>>>>>> marketing
>>>>>> and affiliate programs
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------(end of
>>>>>> broadcast)---------------------------
>>>>>> TIP 6: explain analyze is your friend
>>>>>>
>>>>
>>>>
>>>>> - --
>>>>
>>>>> === The PostgreSQL Company: Command Prompt, Inc. ===
>>>>> Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
>>>>> PostgreSQL solutions since 1997 http://www.commandprompt.com/
>>>>> UNIQUE NOT NULL
>>>>> Donate to the PostgreSQL Project:
>>>>> http://www.postgresql.org/about/donate
>>>>> PostgreSQL Replication: http://www.commandprompt.com/products/
>>>>
>>>>>
>>
>>
>>> - ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 6: explain analyze is your friend
>>
>>
>>
>>> - --
>>
>>> === The PostgreSQL Company: Command Prompt, Inc. ===
>>> Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
>>> PostgreSQL solutions since 1997 http://www.commandprompt.com/
>>> UNIQUE NOT NULL
>>> Donate to the PostgreSQL Project: http://www.postgresql.org/about/
>>> donate
>>> PostgreSQL Replication: http://www.commandprompt.com/products/
>>
>>>
>
>
> - --
>
> === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
> PostgreSQL solutions since 1997 http://www.commandprompt.com/
> UNIQUE NOT NULL
> Donate to the PostgreSQL Project: http://www.postgresql.org/about/
> donate
> PostgreSQL Replication: http://www.commandprompt.com/products/
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFG0HJ5ATb/zqfZUUQRAuEdAJwNwsr/XCsr85tElSVbRVMUHME+PACglbJK
> gj5cZgOtgEEjUPph0jpsOcw=
> =u7Ox
> -----END PGP SIGNATURE-----
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQIVAwUBRtCHvPyqRf6YpodNAQLDuBAAp+dg1MHZy+hjZ8Zk9OQTbeADgJqGPpi9
G7+y3iyaaqOF66TC52P7OaqO6nPhoNCNMCxwztnASyxpftGD5yJ4AZTSGcbAsWB9
cO5mE1mgbngZNPnRLypeJ81hyE6bniNNL7xXSq9LB8wRMczFZwGVZT66+lMIFjvv
0OrbAcSNUFqddky7EFm8gx6A2FNIzSdFB0dNbzpKwEOTnCHKvh+O99sAr/LB7mmL
Hj/wzeQKrWbDAB3+N9rczivZ03DvYAGbUY9qBfNj7Y9YL3iu/Q+Oy4bHtI6d/a7B
wepol2xe1sYEtQ+R3yMPXFte0483n8XIdXxa412ZSIEBfLxHzV6M7JTbPtgWwE+9
7xvyYbO7xQL9N/P8ZGg75eEqXtUrepGmJG0Y30qF5sNdMG0pWoz1bzDjSLNCnylq
JwsO8p1EHNPnPRqotwZZSfLUW16eREqLaOrSC84gIw5Q6zAMZe/k2ckzzHKPGB1c
sckaQROcgK4Lu9ywjRjBjNqclOMasf0MCrsDVMQE/wnh4GoDL/PAyEOqnlpvJ+cx
k4kmOrEz5GRZQehHUI7CdejFwZ32sAB+nV2r8zDW9FSxgoRoFvtE2hooJ9orv0IU
1F8TeBdifVP/Ef/lHAHs6IqEH45y72WqrWFZsIdU1PDe0MyfgMaOBwdwXNeZqky/
IF5SMKbl9yA=
=F9Oq
-----END PGP SIGNATURE-----

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Erik Jones 2007-08-25 19:58:06 Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views
Previous Message Joshua D. Drake 2007-08-25 18:18:33 Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views

Browse pgsql-performance by date

  From Date Subject
Next Message Erik Jones 2007-08-25 19:58:06 Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views
Previous Message Joshua D. Drake 2007-08-25 18:18:33 Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views