From: | Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com> |
---|---|
To: | Mike Christensen <mike(at)kitchenpc(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Indexing queries with bit masks |
Date: | 2010-05-01 20:25:35 |
Message-ID: | r2x92869e661005011325z99373ab8v11dc66182fa7a99@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
2010/4/30 Mike Christensen <mike(at)kitchenpc(dot)com>:
> Ok I've been blatantly lying, err, purposely simplifying the problem for the
> sake of the original email :)
>
> I've read over the responses, and am actually now considering just not using
> any index at all. Here's why:
>
> First, this actually isn't the only thing on the WHERE clause. It will only
> query for users who are "friends" with you so it can notify them of your
> activities. That's done via a weird JOIN on a table that holds all the
> friend relationships. So in reality, it will only load maybe a hundred
> rows, or maybe a thousand every once in a while if you're way popular. If
> I'm not mistaken, it should use the index to narrow it down to the list of
> friends, and then use a sequential scan to weed out the ones who subscribe
> to that type of notification.
>
> Second, the only thing /ever/ that will do this query is the queue service
> whose job it is to process notifications (which are files dropped on the
> file system) and email people all day long. This service handles one job at
> a time, and could potentially run on its own machine with its own read-only
> copy of the database. Thus, even if it was a fairly slow query, it's not
> gonna bring down the rest of the site.
>
> Regarding the idea of putting an index on each bit, I thought about this
> earlier as well as just kinda cringed. The users table gets updated quite a
> bit (last logon, session id, any time they change their profile info,
> etc).. Too many indexes is bad. I could just put the data in another table
> of course, which lead me to another idea. Have a table called Subscriptions
> and have each row hold a user id and a notification type. I could index
> both, and join on (Subscriptions.UserId = Users.UserId AND
> Subscriptions.Type = 8). This would be pretty dang fast, however updates
> are kinda a royal pain. When the user changes which types of subscriptions
> they want (via a list of checkboxes), I'd have to figure out which rows to
> delete and which new ones to insert. However, I think I have an idea in
> mind for a PgSQL function you pass in the bitmask to and then it
> "translates" it to conditional deletes and inserts.
>
> A third idea I'm tossing around is just not worry about it. Put the bitmask
> in the DB, but not filter on it. Every "friend" would be loaded into the
> dataset, but the queue processor would just "skip" rows if they didn't
> subscribe to that event. In other words, move the problem down to the
> business layer. The drawback is potentially large number of rows are
> loaded, serialized, etc into memory that will just be ignored. But of
> course the DB is probably a read-only copy and it's not even close to the
> bottle neck of the email queue under heavy load, so it's probably a
> non-issue. If mailing is slow, I just add more queue services..
>
> I'm exploring all these ideas. I predict using the bitwise AND on the where
> clause isn't gonna be the worst design ever, and it's sure easier to
> implement than a table of subscriptions. What do you guys think?
I would say "normalize". Which means that I like your "separate table"
idea best.
It's clear, obvious, and 3NF - conforming solution.
Changing the set of subscriptions with delete-update-insert combo is
not so bad as you would think.
Encapsulating it in some kind of functional API looks nice too.
Filip
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Smith | 2010-05-01 20:52:25 | Re: Native DB replication for PG |
Previous Message | John R Pierce | 2010-05-01 20:23:38 | Re: PostgreSQL vs. Microsoft SQL server |