Re: CREATE TABLE with REFERENCE

From: Jonathan Bartlett <johnnyb(at)eskimo(dot)com>
To: Dmitry Tkach <dmitry(at)openratings(dot)com>
Cc: "kay-uwe(dot)genz" <kug1977(at)web(dot)de>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: CREATE TABLE with REFERENCE
Date: 2003-07-29 17:13:06
Message-ID: Pine.GSU.4.44.0307290938370.29931-100000@eskimo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

NOTE - after writing all this, I did think of a possible solution, but I'm
not sure if PG can handle it. If I made a table called "object" with one
column, the object_id, and then had EVERY table inherit from this table.
Then, I could have my constraints set up against this master table. (I'm
not against constraints - actually for them - but when the database
doesn't support everything I want I'm not going to go heads-over-heals to
do database-based consistency that I can achieve another way).

Anyway, more discussion follows...

> If you can combine your queries with a union, your table layouts must be
> very similar if not identical.
> Why not put everything into the same table then, and just have an FK
> between that table and the notes?

No, there are _parts_ that are very similar. I don't know where this
whole "table-combining" kick came from, but I've usually found that it
ends in a disaster.

> What do you mean by soft-delete?
> Leaving orphaned notes behind? Bad idea...

I have a boolean flag that says "active". I don't ever actually purge
data. There are times when it is useful to come back in and look at
what's been "deleted". From the user-interrface standpoint it has been
deleted, but we can still go back in and retrieve records after they are
gone.

> How can you be sure? What if you get hit by a bus, and another admin,
> who is not as knowledgeable as you are goes to the db, and runs an
> insert with plain sql?

They should read the documentation. If they want to run an insert with
SQL, they should at least be smart about it :) If someone's too stupid to
read documentation, they are going to screw up the database no matter
what.

> What if, while one connection runs your 'note_html' code, another one
> deletes an object you are annotating?

As I said, I only do soft deletes. So this wouldn't affect anything.

> > Let's say, theoretically, somehow some data got in there
> >which wasn't attached to anything. First of all, the only way that could
> >happen is if there was some really broken code,
> >
> Not necessarily. This can easily happen with concurrent transactions
> (see above).
> Also, even if it was indeed only possible because of a broken code, you
> are not saying that your code is bug-free, are you?

If it's not bug-free, having a good database schema isn't going to save
me.

> Even, if it was, postgres is not, and your filesystem and OS are not
> either. If the database crashes in the middle of your insert, you'll end
> up having inconsistent data.

Transactions will handle that one.

> If I understand your statement ('what would the harm be') correctly, and
> you just don't care about your data consistency, then, I guess, you are
> rigfht - you don't need any constraints... but, in that case, I don't

You are missing the point. There are many things that must be balanced:

* Ease of programming / speed of development

* Correctness of code

* Consistency of data

I can get much further with my system on all three points than I can with
yours. With mine, I have a single, tested function that I can use
anywhere. This hits off the first two. While this prevents me from
having a database-checked #3, it still gives me consistent data because we
don't do hard-deletes and we have tested and verified #2. If you can show
me how to get te ease-of-programming and correctness with your approach,
I'd be happy to use it.

> think you need a transactional database to begin with. If all you want
> from the database is being able to run queries, you are better off
> running grep on a bunch of text files, or with some light-weight sql
> tool, like mysql or sqllight - either of those will perfrom a lot
> better, because they do not bother with the overhead of having to care
> about your data consistency, and concurrent access.

I do care about data consistency and concurrent access. And my data is
consistent. I use many of the RDBMS features of postgres such as views /
subselects, transactions, triggers, etc. The fact that I have instances
where they fall short and I have to do it in the application just shows
that our tools are incomplete, not that I don't care about data
consistency. In fact, I have trouble thinking how anyone could have
sensical data with everything joined together into one uber-table, which
you seem to be advocating.

> >We all use the same sequence.
> >
> Right. What if somebody forgets to use that sequence?

We ahve a standard creation process. If someone is an idiot, that can't
be helped. What if someone pulls out several drives from a running RAID?
What if someone inserts bad records (even if they are inconsistent)? What
if someone steals the server?

If you wind up with problems, you have to fix them. The fact that
problems are possible does not make them likely. There is much more
likelihood of a programmer screwing up writing their 14th copy of
note_html and the associated table structure than there is that someone
pulls a number out of their butt. Where would they get the number if not
from the sequence? "Oooh, I know, I'll insert 10 as the primary key, jsut
for kicks! I'll even hardcode that into the app!" Not a very likely
scenario.

> What if you load your database from a backup and forget to reinit the
> sequence?

If it's being loaded from backup, the schema reinits the sequence on
schema-load.

> If that was the case, you would not be able to combine them with a
> union, as you said you do...

No, I said that there were certain instances that this happens. Not that
it is the norm. In the cases where I do combine them with a union, I am
only combining the similar parts, not the whole shebang.

> No, it would not. Application logic has nothing to do with your database
> schema.
> You need to design the schema to ensure effectiveness and reliability.
>
> Then, you design your application on top of it, that handles the
> business logic.
> From the database perspective, there is no difference between payments
> and sponsors, as long as both have the same (or similar) sets of attributes.
> Iterpreting those attributes is not database's job.

Right, they don't. The only thing they have in common, is that notations
can be added to them. Which is why I have a common notes table, used by
both of them (and several others as well).

> >Splitting the notes table would be pointless. Why do it?
> >
> Because that would make it possible to use the constraints.

Maybe the constraint system should be extended to allow it to check
across multiple tables. The OID type/column is ideal for this. Sadly, it
is only 32 bits, and they are not storing an OID/table lookup like I think
they should. That would enable really powerful database applications that
are way too much work to make today. With that kind of thing, you could
even do record merges with automatic database support.

> Also, if one adopts your earlier point, it can also be argued, that it
> is equally 'nuts' to have notes about Payments stored together with
> notes about Sponsors.
> Those notes have just as much to do with each other as the objects they
> annotate. :-)

But they are all "notes" on "objects".

> If you insist that Payments must be separate from Sponsors, the same
> exact argument should be applied to their respective notes

Why? All of the note's attributes function exactly the same, while none
of Payments and Sponsors function the same. Again, you have yet to show
how you can make a mechanism that is as easily extensible as mine is. The
fact that database constraints aren't available to support it is a fault
of the database, not mine.

> >The way I have
> >it set up now, it takes _1 line of code_ to add note-taking capabilities
> >to my forms.
> >
> It would *still* be one line of code with either of the approaches I
> suggested. Your code doesn't really have to be affected at all
> (although, I think, it would really benefit from adding the object_type
> argument to your note_html() function, but even that is not necessary)

Hmmm... changing table names, having to go in and recreate an exact copy
of the schema every time that is added.

> You want that same one line of code, working against the properly
> designed and normalized sql schema, that lets you rely on the database
> top ensure your data consistency and access efficiency.

Yes, but to set it up requires a full database change.

> >Then, if I want to enhance the note_html interface, I have to
> >modify the schema in 5 places (that's 5 places so far - as time goes on
> >this will likely increase to 10 or 15), and possibly have separate copies
> >of the note_html code. That's craziness. I can't think of one good
> >reason to do that.
> >
> Not at all. If you give up your idea about splitting your payments from
> your sponsors for example, you won't need to modify your schema *at all*
> if you need to add another object type, or another kind of note, or
> whatever - all you'd need to do would be to implement the new
> application logic in your application, where it belongs, and be done
> with it. No need to even touch your schema at all.

I don't see why you want to make my entire database into one giant table.
Why bother with tables at all in that case?

> That's exactly what inheritance does (yes, with a single line).
> The particular implementations of inheritance in sql have their problems
> (as I mentioned earlier), that make me really reluctant from using it,
> but being able to add features to your objects, with a single line of
> code isn't one of them - to the contrary, it's a *huge* benefit.

First of all, inheritance wouldn't work in case of the notes, as it is a
one-to-many relationship.

No, it's not. At least with PostgreSQL, you can only have ONE line of
inheritance (at least as far as I am aware - please correct me if I'm
wrong). In that case, all of these would have to have a base class of
"notes". Of course, if I wanted to add other similar features, I would be
unable to. For example, if I wanted to add a "changelog" table, and be
able to attach last_modified, creator, etc to certain tables, I could not
do so without adding them to all of them.

The approach I have let's me pick and choose any additional feature I want
to add to any record.

> >Again, this would require modifying and testing that trigger every time I
> >want to add a new thing to take notes on.
> >
> Sure, if you implement in such way. But not if you give it some thought
> in advance, and come up with an implementation that would be generic
> enough not to care about your application-specific differences between
> sponsors and payments :-)

Actually, they are database-specific.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Paul Thomas 2003-07-29 18:24:56 Re: Basic questions before start
Previous Message gogulus 2003-07-29 17:04:04 Basic questions before start