Re: how to handle a big table for data log

From: kuopo <spkuo(at)cs(dot)nctu(dot)edu(dot)tw>
To: Jorge Montero <jorge_montero(at)homedecorators(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: how to handle a big table for data log
Date: 2010-07-21 03:51:44
Message-ID: AANLkTilcCUP2ohhotmvoOoDlp2j1OiqwcZusEvYjjfzS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Let me make my problem clearer. Here is a requirement to log data from a set
of objects consistently. For example, the object maybe a mobile phone and it
will report its location every 30s. To record its historical trace, I create
a table like
*CREATE TABLE log_table
(
id integer NOT NULL,
data_type integer NOT NULL,
data_value double precision,
ts timestamp with time zone NOT NULL,
CONSTRAINT log_table_pkey PRIMARY KEY (id, data_type, ts)
)*;
In my location log example, the field data_type could be longitude or
latitude.

I create a primary key (id, data_type, ts) to make my queries more
efficient. The major type of queries would ask the latest data_value of a
data_type by given id and timestamp. For this kind of query, I make the
following SQL statement
*SELECT * FROM log_table WHERE id=[given id] and data_type='longitude' and
(ts = (SELECT max(ts) FROM log_table WHERE id=[given id]and
data_type='longitude' and ts<=[given timestamp]));*
According to my evaluation, its performance is acceptable.

However, I concern more about the performance of insert operation. As I have
mentioned, the log_table is growing so I decide to partition it. Currently,
I partition it by date and only keep it 60 days. This partition is helpful.
But when I partition it by data_type (in my case, the number of data_type is
limited, say 10), the performance of insert operation will be degraded. I
guess this is caused by multiple vacuum/analyze on these partitioned
data_type log tables. However, if I put all data_type logs together, I can
expect that the performance of insert operation will also have degradation
if I want to expand the system to support more mobile phones or more
data_type.

This is my current situation. Please give me some hints to improve the
performance (especially for the insert part).

kuopo.

On Mon, Jul 19, 2010 at 11:37 PM, Jorge Montero <
jorge_montero(at)homedecorators(dot)com> wrote:
> Large tables, by themselves, are not necessarily a problem. The problem is
> what you might be trying to do with them. Depending on the operations you
> are trying to do, partitioning the table might help performance or make it
> worse.
>
> What kind of queries are you running? How many days of history are you
> keeping? Could you post an explain analyze output of a query that is being
> problematic?
> Given the amount of data you hint about, your server configuration, and
> custom statistic targets for the big tables in question would be useful.
>
>>>> kuopo <spkuo(at)cs(dot)nctu(dot)edu(dot)tw> 7/19/2010 1:27 AM >>>
> Hi,
>
> I have a situation to handle a log table which would accumulate a
> large amount of logs. This table only involves insert and query
> operations. To limit the table size, I tried to split this table by
> date. However, the number of the logs is still large (46 million
> records per day). To further limit its size, I tried to split this log
> table by log type. However, this action does not improve the
> performance. It is much slower than the big table solution. I guess
> this is because I need to pay more cost on the auto-vacuum/analyze for
> all split tables.
>
> Can anyone comment on this situation? Thanks in advance.
>
>
> kuopo.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Elias Ghanem 2010-07-21 07:31:07 Using more tha one index per table
Previous Message Scott Carey 2010-07-20 22:58:07 Re: IDE x SAS RAID 0 on HP DL 380 G5 P400i controller performance problem