Re: making a pg store of 'multiple checkboxes' efficient

From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: Jonathan Vanasco <postgres(at)2xlp(dot)com>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: making a pg store of 'multiple checkboxes' efficient
Date: 2007-04-18 08:28:33
Message-ID: 4625D6B1.9000809@magproductions.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Jonathan Vanasco wrote:
> I have a large table (5M items current) that is projected to grow at the
> rate of 2.5M a month looking at the current usage trends.
>
> the table represents some core standardized user account attributes ,
> while text heavy / unstandardized info lies in other tables.
>
> my issue is this: i'm adding in a 'multiple checkboxes' style field, and
> trying to weigh the options for db representation against one another.
>
> my main concern is speed - this is read heavy , but I am worried to some
> degree about disk space (not sure where disk space fits in with pg, when
> I used to use mysql the simplest schema change could drastically effect
> the disk size though ).
>
> that said , these are my current choices:
>
> option a
> bitwise operations
> and/or operations to condense checkboxes into searchable field
> pro:
> super small
> fits in 1 table
> con:
> could not find any docs on the speed of bitwise searches
> in pg

I got some good results using bitwise operations on an integer column.
An index on such a column helps a great deal of course. What type of
integer you need depends on how many booleans you (expect to) have.

My operations were like "WHERE (value & 80) = 80" to match against the
7th and 5th bits, with value an int4 column. Query times are in the tens
of milliseconds range.

Admittedly I only got about that many records through joins with other
tables I needed (meaning that my result set may at some point in time
have been around as big as your data set), so the planner may have
reduced the number of bitwise operations significantly for me.

The actual number of bitwise values to compare was around 40,000
integers * 25 mask values, but that got joined with over 1M records from
another result set.

A bitwise operator on a single column should (theoretically) have less
overhead than integer/boolean operators on multiple columns. Computers
are good at bitwise operations, after all.

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sorin N. Ciolofan 2007-04-18 13:48:29 Re: [GENERAL] Increasing the shared memory
Previous Message Jonathan Vanasco 2007-04-18 07:39:29 making a pg store of 'multiple checkboxes' efficient