From: | Brice André <brice(at)famille-andre(dot)be> |
---|---|
To: | Andreas Joseph Krogh <andreas(at)visena(dot)com> |
Cc: | pgsql-sql <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: Optimize query for listing un-read messages |
Date: | 2014-05-04 12:06:35 |
Message-ID: | CAOBG12n+iWtKh2HBggfoErWF8h3GRVetE3gev2CFZtOoGnDoAg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Dear Andreas,
For me, putting both "LEFT OUTER JOIN" and "NOT EXISTS" is a bad idea.
As the "LEFT OUTER JOIN" will put fields of non-existing right table to
null, I would simply rewrite it :
SELECT ... FROM message m
LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND
prop.person_id = 1
WHERE prop.is_read = TRUE
I would also ensure that an efficient index is used for the outer join. I
would probably try at least a multi-column index on (message_id, person_id)
for the property table. I would also maybe give a try to an index on
(message_id, person_id, is_read), just to see if it improves performances.
Regards,
Brice
2014-05-01 14:51 GMT+02:00 Andreas Joseph Krogh <andreas(at)visena(dot)com>:
> Hi all,
>
> I'm using PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu
>
> I have a schema where I have lots of messages and some users who might
> have read some of them. When a message is read by a user I create an entry
> i a table message_property holding the property (is_read) for that user.
>
> The schema is as follows:
>
> drop table if exists message_property;
> drop table if exists message;
> drop table if exists person;
>
> create table person(
> id serial primary key,
> username varchar not null unique
> );
>
> create table message(
> id serial primary key,
> subject varchar
> );
>
> create table message_property(
> message_id integer not null references message(id),
> person_id integer not null references person(id),
> is_read boolean not null default false,
> unique(message_id, person_id)
> );
>
> insert into person(username) values('user_' || generate_series(0, 999));
> insert into message(subject) values('Subject ' || random() ||
> generate_series(0, 999999));
> insert into message_property(message_id, person_id, is_read) select id, 1,
> true from message order by id limit 999990;
> insert into message_property(message_id, person_id, is_read) select id, 1,
> false from message order by id limit 5 offset 999990;
> analyze;
>
> So, for person 1 there are 10 unread messages, out of a total 1mill. 5 of
> those unread does not have an entry in message_property and 5 have an entry
> and is_read set to FALSE.
>
> I have the following query to list all un-read messages for person with
> id=1:
>
> SELECT
> m.id AS message_id,
> prop.person_id,
> coalesce(prop.is_read, FALSE) AS is_read,
> m.subject
> FROM message m
> LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND
> prop.person_id = 1
> WHERE 1 = 1
> AND NOT EXISTS(SELECT
> *
> FROM message_property pr
> WHERE pr.message_id = m.id AND pr.person_id =
> prop.person_id AND prop.is_read = TRUE)
> ;
>
> The problem is that it's not quite efficient and performs badly, explain
> analyze shows:
>
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Merge Anti Join (cost=1.27..148784.09 rows=5 width=40) (actual
> time=918.906..918.913 rows=10 loops=1)
> Merge Cond: (m.id = pr.message_id)
> Join Filter: (prop.is_read AND (pr.person_id = prop.person_id))
> Rows Removed by Join Filter: 5
> -> Merge Left Join (cost=0.85..90300.76 rows=1000000 width=40)
> (actual time=0.040..530.748 rows=1000000 loops=1)
> Merge Cond: (m.id = prop.message_id)
> -> Index Scan using message_pkey on message m
> (cost=0.42..34317.43 rows=1000000 width=35) (actual time=0.014..115.829
> rows=1000000 loops=1)
> -> Index Scan using message_property_message_id_person_id_key on
> message_property prop (cost=0.42..40983.40 rows=999995 width=9) (actual
> time=0.020..130.728 rows=999995 loops=1)
> Index Cond: (person_id = 1)
> -> Index Only Scan using message_property_message_id_person_id_key on
> message_property pr (cost=0.42..40983.40 rows=999995 width=8) (actual
> time=0.024..140.349 rows=999995 loops=1)
> Index Cond: (person_id = 1)
> Heap Fetches: 999995
> Total runtime: 918.975 ms
> (13 rows)
>
>
> Does anyone have suggestions on how to optimize the query or schema? It's
> important that any message not having an entry in message_property for a
> user is considered un-read.
>
> Thanks!
>
> --
> *Andreas Jospeh Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas(at)visena(dot)com
> www.visena.com
> <https://www.visena.com>
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Andreas Joseph Krogh | 2014-05-04 16:40:15 | Re: Optimize query for listing un-read messages |
Previous Message | Andreas Joseph Krogh | 2014-05-01 12:51:10 | Optimize query for listing un-read messages |