Re: [SQL] (Ab)Using schemas and inheritance

From: Jorge Godoy <jgodoy(at)gmail(dot)com>
To: Alban Hertroys <alban(at)magproductions(dot)nl>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: [SQL] (Ab)Using schemas and inheritance
Date: 2006-05-24 15:21:52
Message-ID: 200605241221.55060.jgodoy@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-sql

Em Quarta 24 Maio 2006 11:48, Alban Hertroys escreveu:
> Jorge Godoy wrote:
> > Some things are really important here:
> >
> > - performance for operations on an individual company --- it has to be
> > as fast as possible because this might be used by my client's clients.
>
> I suppose inherited tables would be faster. Both at read and write
> operations.
>
> However, I think that partitioning/clustering the table over an index on
> company_id would help a lot. Also, considering you need to keep a lot of
> old data around, you may be able to split your data in archived and
> recent data, keeping your indices small (partial indices maybe).

But this could be easily done with two commands (besides creating indices
again):

ALTER TABLE x RENAME TO x_year;
CREATE TABLE x () INHERITS FROM base.x;

No need to change the rest...

> > - safety: if it becomes available externally, then one client could
> > never see other client's data. This is a main concern. If used only
> > internally it is desirable that only authorized employees view each
> > client since all of them doesn't need to access all clients even if
> > working directly with accounting.
>
> Safety would be about equal to your inheritence solution. You can GRANT
> rights on the views, that can be different from the access rights on the
> base tables. The views must evidently have access to your base tables.

Of course. This way, to keep the view working, I'd have to use SELECT INTO
when archiving data, right? IIRC, views would still refer to the old table
if I rename them (this is a problem on my solution as well if I can't use a
common view and make it respect the search_path, as I described on the other
message).

> > - easy to code on application side: other systems will be plugged to
> > this database. The more we can do to avoid other people mistakes, the
> > better.
>
> Views might be a bit easier there, as you can add derived data to your
> results (like first name, infix and surname concatenated, dates in a
> format specific for a companys' locale, etc.).

I'll have the full data on each schema, so this is also possible.

> > - easy to maintain database: if it is too painful, hacks will come and
> > this is not a good plan before starting the project...
>
> This was my main reason to suggest views. For select you could probably
> get away with SELECT * FROM table WHERE company_id = x; (Though SELECT *
> is being frownde upon).

Instead of doing SELECT * FROM schema.table; (no filter here)...

> The update/insert/delete parts are also all rather similar. You may be
> able to generate REPLACE VIEW statements if the base table layout
> changes. You'd probably need to DROP and re-CREATE them, but being views
> your data is not at risc. Nothing physically changes in your tables.
> Hmm... Almost forgot about the RULEs that make them updatable... Those'd
> need updating too, of course...

Is it at risk with inherited tables?

> With inherited tables you would need a way to dump and restore their
> contents, or do a whole lot of ALTER TABLE statements. You'd be
> physically altering the tables that contain your data; there's more than
> zero risc involved. You can't rollback DDL statements...

Why? If I add a column to the base table then this new column appears on all
inherited tables. And as I can't rollback DDL statements, I won't be risking
breaking things when I change the view ;-)

> OTOH, with inherited tables you can have company-tables that differ from
> the base tables for a specific company; with views that'd be... difficult.

I'd have to have an exclusive view and use auxiliar tables if I wanted that
with views... For now, from what we've discussed with the client, this won't
be the case. But it is something that we should think about for future
expansions... If there aren't too many exceptions, then we can deal with
that somewhat cleanly on both sides with the approach of the auxiliar table
and exclusive view, but if there are too many exceptions adapting the table
might be better (both are "hard" if there are too many exceptions...). I
haven't gotten into this planning level yet.

> > We haven't benchmarked anything yet, but from what is in the docs, this
> > looked like a good approach.
>
> Neither did I benchmark updatable views; haven't had much chance to use
> them yet. No, I don't have a lot of experience with them, but I know
> some people on this list must have.

I hope they give us some advice :-)

> > Updateable views give me the part of second and also the third item but
> > it seems to be missing on the first and last items... On the other hand,
> > if it solves problems with views and functions that I said I was having
> > on the other thread then might become interesting... But performance
> > would still be a problem with millions of records (by law we're required
> > to keep at least 5 years of docs online for some docs, for other it is
> > required to have the full company history... so getting to dozens of
> > millions in 5 years is not all that hard...).
>
> In my experience, having the right indexes on your tables/views helps a
> LOT. Millions of records need not be a problem if you know what you will
> be querying for.

You're right about indexes.

> > Am I right or completely wrong? :-)
>
> Have you considered downtime if a schema change is necessary? What would
> cause the most; tables or views?

What do you mean by a schema change? If we go this route, then this will have
to be a core part of the database and application design. Changes here will
be critical, so there shouldn't be changes or they should be done
incrementally. Bigger changes will require a maintenance routine that will
probably take the system offline for some time (from minutes to hours).

We're studying what to do to be able to plan things like that.

> I'm pretty sure VIEWs will be faster if you need to DROP/CREATE the
> tables, as you'll have to move your data around not to lose it... Of
> course, you'd need to do the same for your base tables, that contain
> _all_ your data in the updatable view case...

As I said, I was thinking along the lines of renaming tables to archive them
when needed (weekly, monthly, quarterly, yearly) and inheriting from "base"
schema again. This looks like a very fast operation...

Do you believe it will be problematic?

> Well, only more things to consider choosing between, I'm afraid.

Don't be. Those are very helpful as well. One can't remember everything, so
it is nice to have someone criticizing your project and design. I'm very
thankful.

I just few that we're starting to walk on not-stable grounds here... This is
my main concern. Maybe a more traditional approach as you suggest should be
the best approach...

Then, we'd have a "company_id" to separate data per company, create updateable
views for each company and see how we can write our functions to work with
either the views or the tables (what will be better and easier to expand as
new companies are added)...

--
Jorge Godoy <jgodoy(at)gmail(dot)com>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nik 2006-05-24 15:24:18 Re: Insert into partition table hangs
Previous Message Sim Zacks 2006-05-24 15:17:03 Re: background triggers?

Browse pgsql-sql by date

  From Date Subject
Next Message Alban Hertroys 2006-05-24 16:06:43 Re: [SQL] (Ab)Using schemas and inheritance
Previous Message Richard Huxton 2006-05-24 15:03:26 Re: PLEASE help ME , HOW TO GENERATE PRIMARY Keys on the fly