Re: What's the best way in Postgres9 to store a bunch of arbitrary boolean values for a row?

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "Mike Christensen *EXTERN*" <mike(at)kitchenpc(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: What's the best way in Postgres9 to store a bunch of arbitrary boolean values for a row?
Date: 2011-12-05 10:54:32
Message-ID: CAF6yO=01PUoZ_qpVyQ263cNfoOvSEykSqJOr3bjCa7ume5hv7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Le 5 décembre 2011 10:04, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> a écrit :
> Mike Christensen wrote:
>> I have a database full of recipes, one recipe per row.  I need to
>> store a bunch of arbitrary "flags" for each recipe to mark various
>> properties such as Gluton-Free, No meat, No Red Meat, No Pork, No
>> Animals, Quick, Easy, Low Fat, Low Sugar, Low Calorie, Low Sodium and
>> Low Carb.  Users need to be able to search for recipes that contain
>> one or more of those flags by checking checkboxes in the UI.
>>
>> I'm searching for the best way to store these properties in the
>> Recipes table.  My ideas so far:
>>
>> 1. Have a separate column for each property and create an index on
>> each of those columns.  I may have upwards of about 20 of these
>> properties, so I'm wondering if there's any drawbacks with creating a
>> whole bunch of BOOL columns on a single table.
>> 2. Use a bitmask for all properties and store the whole thing in one
>> numeric column that contains the appropriate number of bits.  Create a
>> separate index on each bit so searches will be fast.
>> 3. Create an ENUM with a value for each tag, then create a column that
>> has an ARRAY of that ENUM type.  I believe an ANY clause on an array
>> column can use an INDEX, but have never done this.
>> 4. Create a separate table that has a one-to-many mapping of recipes
>> to tags.  Each tag would be a row in this table.  The table would
>> contain a link to the recipe, and an ENUM value for which tag is "on"
>> for that recipe.  When querying, I'd have to do a nested SELECT to
>> filter out recipes that didn't contain at least one of these tags.  I
>> think this is the more "normal" way of doing this, but it does make
>> certain queries more complicated - If I want to query for 100 recipes
>> and also display all their tags, I'd have to use an INNER JOIN and
>> consolidate the rows, or use a nested SELECT and aggregate on the fly.
>>
>> Write performance is not too big of an issue here since recipes are
>> added by a backend process, and search speed is critical (there might
>> be a few hundred thousand recipes eventually).  I doubt I will add new
>> tags all that often, but I want it to be at least possible to do
>> without major headaches.
>
> I would use a boolean column per property and a partial index on the
> ones
> where the property is selective, i.e. only a small percentage of all
> recipes
> match the property.

I would like to recommend to have a look at Bloom Filtering:
http://www.sai.msu.su/~megera/wiki/bloom

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Frank Lanitz 2011-12-05 11:16:29 pg_standby: How to check in which state the server is currently?
Previous Message Filip Rembiałkowski 2011-12-05 10:48:44 Re: What's the best way in Postgres9 to store a bunch of arbitrary boolean values for a row?