From: | "Bryan Murphy" <bryan(dot)murphy(at)gmail(dot)com> |
---|---|
To: | "lists(at)on-track(dot)ca" <lists(at)on-track(dot)ca> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Optmal tags design? |
Date: | 2007-07-18 21:54:07 |
Message-ID: | bd8531800707181454xb968a48ya3c0efc160d104a4@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
We store tags on our items like this like this:
Tag.ID INT NOT NULL PRIMARY KEY
Tag.Value TEXT LCASE NOT NULL UNIQUE
Item.ID INT NOT NULL PRIMARY KEY
ItemTagBinding.ItemID INT NOT NULL REFERENCES Item.ID
ItemTagBinding.TagID INT NOT NULL REFERENCES Tag.ID
ItemTagBinding.ItemID + ItemTagBinding.TagID UNIQUE
with appropriate indexes on the columns we need to frequently query.
We have about 3 million tag bindings right now, and have not run into any
performance issues related to tagging other than generating tag clouds
(which we pre-calculate anyway).
I'll have to get back to you when we get up to 10's, or even 100's of
millions and let you know how it scaled.
Bryan
On 7/18/07, lists(at)on-track(dot)ca <lists(at)on-track(dot)ca> wrote:
>
> I am planning to add a tags (as in the "web 2.0" thing) feature to my web
> based application. I would like some feedback from the experts here on
> what the best database design for that would be.
>
> The possibilities I have come up with are:
> * A tags table containing the tag and id number of what it links to.
> select pid from tags where tag='bla'
> select tag from tags where pid=xxx.
>
> * a tags table where each tag exists only once, and a table with the tag
> ID and picture ID to link them together.
>
> select pid from tags inner join picture_tags using(tag_id) where tag='bla'
> select tag from tags inner join picture_tags using(tag_id) where pid='xxx'
>
> * A full text index in the picture table containing the tags
>
> select pid from pictures where tags @@ to_tsquery('bla')
> (or the non-fti version)
> select pid from pictures where tags ~* '.*bla.*'
>
> select tags from pictures where pid=xxx;
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
From | Date | Subject | |
---|---|---|---|
Next Message | Kalle Hallivuori | 2007-07-19 07:23:38 | Re: improvement suggestions for performance design |
Previous Message | Mark Lewis | 2007-07-18 21:51:52 | Re: Optmal tags design? |