Skip site navigation (1) Skip section navigation (2)

Re: Questions on Tags table schema

From: Richard Huxton <dev(at)archonet(dot)com>
To: Jay Kang <arrival123(at)gmail(dot)com>,Pgsql performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Questions on Tags table schema
Date: 2007-07-30 14:58:31
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
Jay Kang wrote:
> Hey Richard,
> Sorry for the late reply, I was just making my first test version of the DB
> closely resembling you suggested design. Just wanted to write you back
> answering your questions. So here we go:

No problem - it's email and what with different timezones it's common to 
have gaps.

>> What is AddedBy - a name, a user-id?
>> If it's an ID, then it seems very long.
>> If it's a name, then 256 characters sounds a bit arbitrary as a length.
>> Why choose 256?
> No, AddedBy is the username of the individual. Why is 256 characters
> arbitrary as a length? Would 255 be better or 32? I guess, your saying its
> too long, shorten it, I'm just going with what the books says, but I really
> welcome any comments you have^^

The book doesn't know what you're trying to do. You do. The important 
thing is not whether you choose 256 or 32 or 100, it's that you've 
thought about it first.

Obvious thoughts:
1. Is this going to be a real name "Richard Huxton" or an identifier 
"rhuxton123"? You'll want more characters for the real name than an 
2. Where will this be displayed and will it take up too much space? If I 
pick a username of WWW...250 repeats...WWW does that mess up any formatting?
3. Do we allow HTML-unsafe characters ('<', '&') and escape them when 
used, or just not allow them?

No wrong or right, the process of thinking about it is important.

> The advantage of *not* having AddedBy as a foreign-key is that you can
>> delete users and not have to update tags with their user-id. The
>> disadvantage is the same thing. You can end up with tags added by
>> non-existent users.
> Thanks, I would like anonymous users to be able to add tags, so I guess I'll
> leave it the way it is^^

You would normally use NULL to indicate "unknown", which in the case of 
an anonymous user would be true. A NULL foreign-key is allowed (unless 
you define the column not-null of course).

>> Well, it all depends on your use analysis. You could make a good
>> argument that there are two sets of fact data:
>> 1. "Identity" data - id, name, added-by, added-ts, status
>> 2. "Activity" data - number of views, last time clicked etc
> If I were to create Identity and Activity for the Tags table, would I be
> creating two separate tables called TagActivities and TagIdentities?

That's what I'm talking about, and you'd have a foreign-key constraint 
to make sure activity refers to a real tag identity. Again, I'm not 
saying you *do* want to do this, just that you'll need to think about it.

> Currently, I'm not sure how I'll analysis the data for Tags. I know that I
> want to do the bigger font if it is popular and smaller font if its not.
> Hmm, would like to see examples of other websites that utilized Tags tables
> to see how they implemented this function. I was thinking of adding tagcount
> (popularity) for each user within the user definition table.

For this particular case, you'll almost certainly want to cache the 
results anyway. The popularity isn't going to change that fast, and 
presumably you'll only want to categorise them as VERY BIG, Big, normal 
etc. I assume allows you to cache this sort of information somehow.

>>>> You have cars which have tags and planes which have tags. Tagging a
>>>> plane is not the same as tagging a car. Either you confuse that issue,
>>>> or you want separate tables to track each relationship.
>>> Hmm, so if I have a tag called "Saab" and a user clicks on Saab, then
>>> information from both Cars and Planes table would appear.
>> Well, if you UNION them, yes. Of course you'll need to find columns that
>> make sense across all types. If planes have e.g. "wingspan" then you'll
>> need to add 'not applicable'::text in the car-related subquery.
> Hmm, currently I can't visualize the query, again, it would help if I can
> see the data to see what you mean. If planes table had a tag called
> wingspan, wouldn't the query just not show any value for the field so it
> wouldn't need 'not applicable' in the car-related subquery? Not sure really.

Sorry - I'm trying to say that if you UNION together several queries 
they all need to have the same columns. So - if one subquery doesn't 
have that column you'll need to provide a "not applicable" value instead.

Best of luck with the application, and don't forget to cache query 
results when they don't change often. It'll boost performance quite a bit.

P.S. - try the "general" mailing list if you want to discuss this sort 
of thing some more. This one is really supposed to be 
performance-related questions only.

   Richard Huxton
   Archonet Ltd

In response to

pgsql-performance by date

Next:From: Steven FlattDate: 2007-07-30 16:04:08
Subject: Re: Vacuum looping?
Previous:From: Richard HuxtonDate: 2007-07-30 11:57:06
Subject: Re: Questions on Tags table schema

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group