Re: Partitioning into thousands of tables?

From: "Davor J(dot)" <DavorJ(at)live(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Partitioning into thousands of tables?
Date: 2010-08-10 12:37:17
Message-ID: i3rh7t$d9b$1@news.hub.org
Lists: pgsql-general

"Data Growth Pty Ltd" <datagrowth(at)gmail(dot)com> wrote in message news:AANLkTimZaiO+7mTuR=sX0jCQwU7Uk+_XZnudL4qRMUsh(at)mail(dot)gmail(dot)com(dot)(dot)(dot)
> I have a table of around 200 million rows, occupying around 50G of disk. It is slow to write, so I would like to partition it better.

Have you actually tested this? Why do you expect an improvement? I am quite interested.

The manual states:

"The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of
thumb is that the size of the table should exceed the physical memory of the database server."

Unfortunately, this last sentence is not explained. What operations would benefit from partitioning and what operations would not?
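To make this concrete: at the moment "partitioning" means the inheritance setup the manual describes, i.e. a parent table, one child per range with a CHECK constraint, and a trigger that routes inserts. A minimal sketch (table, column and function names are made up here, and you would need one branch per child):

-- Parent table; it holds no rows itself, the children carry the data.
CREATE TABLE measurements (
    id        bigserial,
    logged_at timestamptz NOT NULL,
    value     double precision
);

-- One child per month, with non-overlapping CHECK constraints so that
-- constraint_exclusion can skip irrelevant partitions in queries.
CREATE TABLE measurements_2010_08 (
    CHECK (logged_at >= DATE '2010-08-01' AND logged_at < DATE '2010-09-01')
) INHERITS (measurements);

-- Indexes are not inherited; each child needs its own.
CREATE INDEX measurements_2010_08_logged_at_idx
    ON measurements_2010_08 (logged_at);

-- Route inserts on the parent into the right child.
CREATE OR REPLACE FUNCTION measurements_insert_router() RETURNS trigger AS $$
BEGIN
    IF NEW.logged_at >= DATE '2010-08-01'
       AND NEW.logged_at < DATE '2010-09-01' THEN
        INSERT INTO measurements_2010_08 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for %', NEW.logged_at;
    END IF;
    RETURN NULL;  -- the row is already stored in the child
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_insert
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert_router();

With thousands of such children, that IF-chain and the planner's work per partition are themselves overhead, which is part of why I wonder where the benefit is supposed to come from.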

Another problem is that there is no time complexity information in the manual, and it can hardly be found on the net either.

But here is an attempt, based on my limited understanding of the docs (and on which I would appreciate some feedback):

- INSERT into an unconstrained, unindexed (i.e. plain) table: O(1). So the table size "in itself" doesn't play a role. But you probably have indexes, and if they are B-trees, each insert would be in the range of O(log(n)). (See http://en.wikipedia.org/wiki/B-tree. Unfortunately it doesn't state the complexity outright, but it does say in the heading that a "B-tree is optimized for systems that read and write large blocks of data"; also check http://en.wikipedia.org/wiki/B-tree#Insertion.) Since log(n) grows so slowly, the step from 200M records down to 2M per table barely changes it, so I wouldn't expect much improvement.
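A back-of-envelope estimate of that (just comparing log(n), not real tree depths):

-- Splitting 200M rows into 100 partitions of 2M rows saves only
-- about 7 comparisons per index descent: log2 drops from ~27.6 to ~21.
SELECT log(2.0, 200000000) AS levels_200m,
       log(2.0, 2000000)   AS levels_2m;

And since a PostgreSQL B-tree page holds hundreds of keys, the real trees are only about 3 to 4 levels deep in both cases, so the practical difference is even smaller.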

- UPDATE: I read somewhere that indexes use pointers to the data, and I suppose the columns in your UPDATEs' WHERE clauses are indexed, so there is no need for sequential scans on the implicit SELECT. Partitioning will then not give you better performance; the system cache will do its job here.
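This is easy to verify with EXPLAIN; reusing the made-up table from above, something like:

-- If the WHERE column is indexed, the plan should start with
-- "Index Scan using ..." rather than "Seq Scan on ...".
EXPLAIN UPDATE measurements_2010_08
   SET value = 0
 WHERE logged_at = '2010-08-10 12:00:00';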

A possible problem would be if your indexes are larger than your available memory. What impact that would have, I honestly do not know, and I think it would be nice if someone could clear that up a bit. What impact would it have on SELECT? But in your case, with 200M records, the indexes probably fit well into memory anyway?
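For what it's worth, the on-disk size of an index is easy to check against your RAM (index name again made up):

SELECT pg_size_pretty(
    pg_relation_size('measurements_2010_08_logged_at_idx'));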

Regards,
Davor Josipovic
