Skip site navigation (1) Skip section navigation (2)

Re: Questions on Tags table schema

From: Richard Huxton <dev(at)archonet(dot)com>
To: Jay Kang <arrival123(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Questions on Tags table schema
Date: 2007-07-30 11:57:06
Message-ID: 46ADD212.3090406@archonet.com (view raw or flat)
Thread:
Lists: pgsql-performance
Jay Kang wrote:
>> One quick point. SQL is case-insensitive unless you double-quote
>> identifiers. This means CamelCase tend not to be used. So instead of
>> AddedBy you'd more commonly see added_by.
> 
> Yes, I am aware that postgre is case-insensitive, but I write all query with
> case so its easier for me to read later on.

It's SQL that's case insensitive. Pretty much any SQL-based database 
system you use will do case-folding in some way.

>> CREATE TABLE Tags
>>> (
>>>    TagID serial NOT NULL,
>>>    TagName varchar(64) NOT NULL,
>>>    AddedBy varchar(256) NOT NULL,
>> This is supposed to be a user? But it's not a foreign-key, and you've
>> decided that 255 characters will be a good length, but 257 is impossible.
> 
> 
> I'm developing in c# with asp.net 2.0 which as a membership provider. I'm
> using ASP.NET 2.0 Website Programming / Problem - Design - Solution" (Wrox
> Press) <http://www.amazon.com/gp/product/0764584642> as a reference, so not
> having AddedBy as a foreign key within each of the tables was taken directly
> from the text. I do not understand your comment about 255 character with 257
> being impossible? Could you elaborate, if you feel it warrants further
> elaboration.

What is AddedBy - a name, a user-id?
If it's an ID, then it seems very long.
If it's a name, then 256 characters sounds a bit arbitrary as a length. 
Why choose 256?

The advantage of *not* having AddedBy as a foreign-key is that you can 
delete users and not have to update tags with their user-id. The 
disadvantage is the same thing. You can end up with tags added by 
non-existent users.

>>    AddedDate timestamp NOT NULL,
>>
>> You probably want "timestamp with time zone" (which represents an
>> absolute time) rather than without time-zone (which means 1pm in London
>> is different from 1pm in New York).
> 
> OK, timestamp with time zone it is. To be honest, I've been using postgresql
> for a while now, but never tried using timestamp with time zone.

You can get away with it as long as the time-zone setting on your client 
  stays the same. Then it changes, and you're left wondering why all 
your comparisons are hours out.

> Also, if it's "AddedDate" why isn't it a date?
> 
> I had this first as a date, but asp.net 2.0 didn't like it, and changing it
> to a timestamp fixed the problem.

Surely asp.net has a date type? If not, I'd suggest AddedTimestamp as a 
name (or AddedTS if you don't enjoy lots of typing :-). It won't matter 
to you now, but 12 months from now it'll save you looking up data types.

>>    Status int NOT NULL,
>>>    ViewCount int NOT NULL CONSTRAINT DF_tm_Tags_ViewCount  DEFAULT
>> (('0'))
>>> );
>> You might not want to mix in details about number of views with details
>> of the tag. Particularly if you might record more details later (when
>> viewed, by whom etc).
> 
> Are you suggesting to separate the Tags table into Tags and TagDetails?
> Because ViewCount within Tags table would represent how many times that tag
> was clicked, I think others would call this field Popularity. I've been
> reading alot about tags and I am fascinated at all the information about
> user tags can provide. Where would I put information such as ViewCount,
> AddedBy, Status, etc if not within the Tags table? Sorry, if I'm totally
> missing your point.

Well, it all depends on your use analysis. You could make a good 
argument that there are two sets of fact data:
1. "Identity" data - id, name, added-by, added-ts, status
2. "Activity" data - number of views, last time clicked etc

Depending on how you intend to use the tags, it might make sense to 
separate these. Particularly if you find yourself with no sensible 
values in activity data until a tag is used.

 From a separate performance-related point of view, you would expect 
activity data to be updated much more often than identity data.

>> You have cars which have tags and planes which have tags. Tagging a
>> plane is not the same as tagging a car. Either you confuse that issue,
>> or you want separate tables to track each relationship.
> 
> Hmm, so if I have a tag called "Saab" and a user clicks on Saab, then
> information from both Cars and Planes table would appear.

Well, if you UNION them, yes. Of course you'll need to find columns that 
make sense across all types. If planes have e.g. "wingspan" then you'll 
need to add 'not applicable'::text in the car-related subquery.

 > If I'm inserting a
> new row for a tag, wouldn't I need to check if that tagname already appears
> within the Tags table or would I just create a new row with that tag name.
> Sorry, I'm not sure what " 'car'::text " this is doing, but I'm guessing its
> used to group the cars, planes, etc. so it knows which item_type it is.
> Brilliant!

Yes, if you're giving a list of all "Saab"s, I'm assuming your users 
will want to know if it's a plane or car. The ::text is just PostgreSQL 
shorthand for a cast - it's good practice to specify precise types for 
literal values.

-- 
   Richard Huxton
   Archonet Ltd

In response to

Responses

pgsql-performance by date

Next:From: Richard HuxtonDate: 2007-07-30 14:58:31
Subject: Re: Questions on Tags table schema
Previous:From: Jay KangDate: 2007-07-30 11:26:10
Subject: Re: Questions on Tags table schema

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group