Re: One database vs. hundreds?

From: "Kynn Jones" <kynnjo(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: One database vs. hundreds?
Date: 2007-08-28 14:36:22
Message-ID: c2350ba40708280736tde37dd2qb40c412c3a216a94@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Thank you very much for your replies. Given the differences in the
opinions expressed, I thought I would describe the database briefly.

The purpose of the database is basically translation of terms.
Imagine a collection of disjoint sets A, B, C, ...

Now imagine that for each element of a set multiple *names* exist,
although even within each set, the number of names varies from one
element to the next. To complicate matters further, imagine that
there are several *authorities* X, Y, Z... on the subject of naming
these elements, and they don't always agree.

So currently, the relevant portion of schema is something like

SET ... one record per available set
AUTHORITY ... one record per naming authority
ELEMENT ... which refers to SET and has one record per element in any
of the sets
NAME ... which refers to ELEMENT and to AUTHORITY and whose records
represent the various names assigned to each element by the various
authorities.

The largest tables, by far, are ELEMENT and NAME. It is clear from
this description that ELEMENT can be partitioned according to the set
each element belongs to, since these sets are disjoint (for the moment
I'm using the word "partitioned" in its general sense, not in the
sense of Pg's table partitioning that Kamil proposed). Similarly NAME
can be partitioned according to the authorities associated with each
naming (even when two authorities agree on a name for a given element,
NAME includes separate records to represent this fact). Furthermore,
each one of these authority-based partitions can be in turn
partitioned according to the set that the referred-to element belongs
to.

Therefore instead of having a single database we could have many
databases, one for each combination of set and authority, with the
simplified schema

ELEMENT
NAME ... refers to ELEMENT

From a semantic point of view, all the kinds of queries that the
system is designed to support would be satisfied by this design.

Given this it seems to me that Pg's table partitioning does indeed fit
the bill after all. I like the idea of being able to bulk-update
entire chunks of the database by creating temporary a partition,
testing it, deleting the old one, and renaming the temporary partition
to replace it.

Furthermore, it seems like this arrangement, contrary to my original
multiple-DB idea, does not preclude the occasional global query
spanning multiple authorities or sets (though these queries would be
useful mostly for administrative purposes).

I join btober in my amazement and admiration at the overall coolness
of PostgreSQL. I'm sending massive props to the developers right
now...

Many thanks!!!

kj

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Kamil Srot 2007-08-28 14:41:14 Re: Tables dissapearing
Previous Message Lincoln Yeoh 2007-08-28 14:33:27 Re: [HACKERS] Undetected corruption of table files

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2007-08-28 15:00:31 Re: Testing the other tsearch dictionaries
Previous Message Lincoln Yeoh 2007-08-28 14:33:27 Re: [HACKERS] Undetected corruption of table files