Re: Indexing queries with bit masks

From: Mike Christensen <mike(at)kitchenpc(dot)com>
To: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Indexing queries with bit masks
Date: 2010-05-02 11:59:25
Message-ID: r2o7aa638e01005020459w375fc567p9b7b3f80bc6cb595@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hey thanks.. I thought I'd share the method I came up with for updating
subscriptions. Basically, as far as my code is concerned the DB uses a
bitmask (at least for updates) but I abstract it through a function. First
off, I have a little helper function so I don't repeat the same code a bunch
of times:

CREATE OR REPLACE FUNCTION KPC_UpdateEmailPreferenceHelper(_enable boolean,
_userid uuid, _type smallint)
RETURNS void AS
$BODY$
BEGIN
IF _enable THEN
INSERT INTO EmailPreferences (UserId, NotificationType) SELECT
_userid, _type
WHERE NOT EXISTS (SELECT 1 FROM EmailPreferences WHERE UserId =
_userid AND NotificationType = _type);
ELSE
DELETE FROM EmailPreferences WHERE UserId = _userid AND
NotificationType = _type;
END IF;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;

Then through my code I call this one with a bitmask of which notifications
the user wants to subscribe to:

CREATE OR REPLACE FUNCTION KPC_UpdateEmailPreference(_userid uuid, _prefs
smallint)
RETURNS void AS
$BODY$
BEGIN
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 1 > 0, _userid,
1::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 2 > 0, _userid,
2::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 4 > 0, _userid,
3::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 8 > 0, _userid,
4::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 16 > 0, _userid,
5::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 32 > 0, _userid,
6::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 64 > 0, _userid,
7::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 128 > 0, _userid,
8::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 256 > 0, _userid,
9::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 512 > 0, _userid,
10::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 1024 > 0,
_userid, 11::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 2048 > 0,
_userid, 12::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 4096 > 0,
_userid, 13::smallint);
PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 8192 > 0,
_userid, 14::smallint);
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;

Seems to work pretty well, anyone have any feedback?

Mike

2010/5/1 Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>

> 2010/4/30 Mike Christensen <mike(at)kitchenpc(dot)com>:
> > Ok I've been blatantly lying, err, purposely simplifying the problem for
> the
> > sake of the original email :)
> >
> > I've read over the responses, and am actually now considering just not
> using
> > any index at all. Here's why:
> >
> > First, this actually isn't the only thing on the WHERE clause. It will
> only
> > query for users who are "friends" with you so it can notify them of your
> > activities. That's done via a weird JOIN on a table that holds all the
> > friend relationships. So in reality, it will only load maybe a hundred
> > rows, or maybe a thousand every once in a while if you're way popular.
> If
> > I'm not mistaken, it should use the index to narrow it down to the list
> of
> > friends, and then use a sequential scan to weed out the ones who
> subscribe
> > to that type of notification.
> >
> > Second, the only thing /ever/ that will do this query is the queue
> service
> > whose job it is to process notifications (which are files dropped on the
> > file system) and email people all day long. This service handles one job
> at
> > a time, and could potentially run on its own machine with its own
> read-only
> > copy of the database. Thus, even if it was a fairly slow query, it's not
> > gonna bring down the rest of the site.
> >
> > Regarding the idea of putting an index on each bit, I thought about this
> > earlier as well as just kinda cringed. The users table gets updated
> quite a
> > bit (last logon, session id, any time they change their profile info,
> > etc).. Too many indexes is bad. I could just put the data in another
> table
> > of course, which lead me to another idea. Have a table called
> Subscriptions
> > and have each row hold a user id and a notification type. I could index
> > both, and join on (Subscriptions.UserId = Users.UserId AND
> > Subscriptions.Type = 8). This would be pretty dang fast, however updates
> > are kinda a royal pain. When the user changes which types of
> subscriptions
> > they want (via a list of checkboxes), I'd have to figure out which rows
> to
> > delete and which new ones to insert. However, I think I have an idea in
> > mind for a PgSQL function you pass in the bitmask to and then it
> > "translates" it to conditional deletes and inserts.
> >
> > A third idea I'm tossing around is just not worry about it. Put the
> bitmask
> > in the DB, but not filter on it. Every "friend" would be loaded into the
> > dataset, but the queue processor would just "skip" rows if they didn't
> > subscribe to that event. In other words, move the problem down to the
> > business layer. The drawback is potentially large number of rows are
> > loaded, serialized, etc into memory that will just be ignored. But of
> > course the DB is probably a read-only copy and it's not even close to the
> > bottle neck of the email queue under heavy load, so it's probably a
> > non-issue. If mailing is slow, I just add more queue services..
> >
> > I'm exploring all these ideas. I predict using the bitwise AND on the
> where
> > clause isn't gonna be the worst design ever, and it's sure easier to
> > implement than a table of subscriptions. What do you guys think?
>
> I would say "normalize". Which means that I like your "separate table"
> idea best.
> It's clear, obvious, and 3NF - conforming solution.
> Changing the set of subscriptions with delete-update-insert combo is
> not so bad as you would think.
> Encapsulating it in some kind of functional API looks nice too.
>
> Filip
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message John Gage 2010-05-02 12:13:52 Order of execution in shell echo to psql
Previous Message AI Rumman 2010-05-02 10:33:25 int2vector