Re: Large number of tables slow insert

From: "Scott Carey" <scott(at)richrelevance(dot)com>
To: "Loic Petit" <tls(dot)wydd(at)free(dot)fr>
Cc: peter(dot)schuller(at)infidyne(dot)com, pgsql-performance(at)postgresql(dot)org, hhall1001(at)reedyriver(dot)com, dforums(at)vieonet(dot)com, max(at)nts(dot)biz(dot)ua
Subject: Re: Large number of tables slow insert
Date: 2008-08-25 01:08:08
Message-ID: a1ec7d000808241808r15beea9jeac0b7bbd9e767f7@mail.gmail.com
Lists: pgsql-performance

I don't know if the max_fsm_relations issue will solve your problem or not.
I do know that you definitely want to increase it to a number larger than
the sum of all your tables and indexes -- preferably with room to grow.
Additionally, the max_fsm_pages value will likely need to be increased as
your data size grows.
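
For reference, the relevant postgresql.conf lines might look like this (the
numbers are illustrative assumptions only -- size them from your own relation
count and from VACUUM VERBOSE output; on 8.x both settings require a server
restart):

    # postgresql.conf -- illustrative values, not recommendations
    max_fsm_relations = 30000     # must exceed total tables + indexes, with headroom
    max_fsm_pages     = 2000000   # scale with data size; VACUUM VERBOSE warns when low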

I work with about 9000 tables at the moment (growing each day) and do not
see your issue. I do not have indexes on most of my tables, and
max_fsm_relations is set to 30000.

Although this will increase the number of tables even more, you may want to
consider partitioning your tables by time: by day, week, or month.
That way, you may not even need an index on the date, as queries will only
scan the tables covering the date range specified (NOTE -- this is not true
if you use prepared statements: prepared statements + partitioned tables =
performance disaster).
In addition, this may allow you to add the indexes on each partition at a
later date. For example:

Partition by week -- the current week's table has no indexes and is thus
fast to insert into. But once it becomes last week's table and you are only
inserting into a new table, the old one can have indexes added to it -- it
is now a mostly read-only table. This way, full scans are only needed for
the current week's table, which will usually be smaller than the others and
more likely to be cached in memory as well. You may instead want to
partition by day or month.
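
A minimal sketch of that scheme using 8.x inheritance partitioning (the
readings table, its columns, and the week bounds are made up for
illustration, not your actual schema; inserts go directly into the current
child table):

    -- Hypothetical parent table; it holds no rows itself
    CREATE TABLE readings (
        ts       timestamptz NOT NULL,
        temp     real,
        humidity real,
        light    real
    );

    -- One child table per week; the CHECK constraint lets the planner
    -- exclude this partition for queries outside its range
    CREATE TABLE readings_2008w35 (
        CHECK (ts >= '2008-08-25' AND ts < '2008-09-01')
    ) INHERITS (readings);

    -- While it is the current week: no indexes, cheap inserts.
    -- Once the week has passed and the table is effectively read-only:
    CREATE INDEX readings_2008w35_ts_idx ON readings_2008w35 (ts);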
You may want to combine several sensors into one table so that you can
partition by day or even hour. It all depends on how you expect to access
the data later and how much you can afford to deal with managing all those
tables -- Postgres only does some of the partitioning work for you, and you
have to be very careful with your queries. There are some query-optimizer
oddities with partitioned tables one has to be aware of.
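
To make the prepared-statement caveat concrete, a sketch against the
hypothetical tables above -- constraint exclusion works from values known at
plan time, so literal bounds prune partitions while a generic prepared plan
cannot:

    SET constraint_exclusion = on;

    -- Literal bounds: the planner can prove which weekly child tables
    -- match and skips the rest.
    SELECT * FROM readings
    WHERE ts >= '2008-08-25' AND ts < '2008-08-26';

    -- A prepared statement is planned once with unknown parameters, so
    -- the plan has to cover every child table -- the "performance
    -- disaster" mentioned above.
    PREPARE recent (timestamptz, timestamptz) AS
        SELECT * FROM readings WHERE ts >= $1 AND ts < $2;
    EXECUTE recent ('2008-08-25', '2008-08-26');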

On Sun, Aug 24, 2008 at 3:30 PM, Loic Petit <tls(dot)wydd(at)free(dot)fr> wrote:

> Quite a lot of answers!
>
> > Are all INSERTs made by one application? Maybe PREPARED STATEMENTS
> > will help you? What about optimization at the application level?
> Yes, it's all one application (through JDBC). Optimizations (grouped
> transactions, prepared statements) will be done, but this was our very
> first test under hard conditions, which scared us all :p.
>
> > Can you tell us what kind of application this is? It sounds like a
> > control systems application where you will write the current values of
> > the sensors with each scan of a PLC. If so, is that a good idea? Also is
> > 3,000 sensors realistic? That would be a lot of sensors for one control
> > system.
> Our research project is trying to manage large-scale sensor network
> deployments. 3,000 is quite a huge deployment, but it can be realistic for
> a huge agricultural deployment, for example.
>
> > That would be an insert plus updates to each of your 6 indexes every
> > 0.33 ms. Is that a good idea? Is there a better strategy? What are you
> > measuring with the instruments, e.g. is this a process plant or
> > manufacturing facility? What will people do with this data?
> I try to suppress as many of the indexes as I can. Actually, I only really
> need the index on timestamp, to see for example the last readings and to
> search for historical data by period; the others (on values) are more for
> queries like "when was this sensor over 45ºC", but those can do without
> indexes (slow to read, but less heavy at insert time). I get the data from
> different TelosB motes that gather temperature, humidity, and light.
>
> > Have you checked what you are bottlenecking on - CPU or disk? Try
> > iostat/top/etc during the inserts. Also check actual disk utilization
> > (iostat -x on linux/freebsd; varies on others) to see what percentage of
> > time the disk/storage device is busy.
> I saw the results of iostat and top; the postgres process was at 70% CPU.
> Yes, I know that my test machine is not brand new, but I have to find a
> good solution with it.
>
> OK, I just ran some tests. It seems that I spammed too much right after
> the creation of the tables, so the vacuum analyze could not be run. I have
> better results now:
>
> Average time for writing 10 rows into each table:
>
> ON 1000 TABLES
> Without indexes at all: ~1.5s
> With only the index on timestamp: ~2.5s
> With all indexes: ~30s
>
> ON 3000 TABLES
> Without indexes at all: ~8s
> With only the index on timestamp: ~45s
> With all indexes: ~3min
>
> I don't know why, but the difference is quite huge with indexes! When I
> ran my vacuum, the system warned me about "max_fsm_relations" (1000). Do
> you think increasing it will change something (as Scott said)? I didn't
> have time to run tests with vacuum analyze on the system tables; see you
> tomorrow for more news...
>
