Re: Optmal tags design?

From: "Bryan Murphy" <bryan(dot)murphy(at)gmail(dot)com>
To: "lists(at)on-track(dot)ca" <lists(at)on-track(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Optmal tags design?
Date: 2007-07-18 21:54:07
Message-ID: bd8531800707181454xb968a48ya3c0efc160d104a4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We store tags on our items like this like this:

Tag.ID INT NOT NULL PRIMARY KEY
Tag.Value TEXT LCASE NOT NULL UNIQUE

Item.ID INT NOT NULL PRIMARY KEY

ItemTagBinding.ItemID INT NOT NULL REFERENCES Item.ID
ItemTagBinding.TagID INT NOT NULL REFERENCES Tag.ID
ItemTagBinding.ItemID + ItemTagBinding.TagID UNIQUE

with appropriate indexes on the columns we need to frequently query.

We have about 3 million tag bindings right now, and have not run into any
performance issues related to tagging other than generating tag clouds
(which we pre-calculate anyway).

I'll have to get back to you when we get up to 10's, or even 100's of
millions and let you know how it scaled.

Bryan

On 7/18/07, lists(at)on-track(dot)ca <lists(at)on-track(dot)ca> wrote:
>
> I am planning to add a tags (as in the "web 2.0" thing) feature to my web
> based application. I would like some feedback from the experts here on
> what the best database design for that would be.
>
> The possibilities I have come up with are:
> * A tags table containing the tag and id number of what it links to.
> select pid from tags where tag='bla'
> select tag from tags where pid=xxx.
>
> * a tags table where each tag exists only once, and a table with the tag
> ID and picture ID to link them together.
>
> select pid from tags inner join picture_tags using(tag_id) where tag='bla'
> select tag from tags inner join picture_tags using(tag_id) where pid='xxx'
>
> * A full text index in the picture table containing the tags
>
> select pid from pictures where tags @@ to_tsquery('bla')
> (or the non-fti version)
> select pid from pictures where tags ~* '.*bla.*'
>
> select tags from pictures where pid=xxx;
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Kalle Hallivuori 2007-07-19 07:23:38 Re: improvement suggestions for performance design
Previous Message Mark Lewis 2007-07-18 21:51:52 Re: Optmal tags design?