SQL schema and query optimisation for fast cross-table query execution

From: Brice André <brice(at)famille-andre(dot)be>
To: pgsql-sql(at)lists(dot)postgresql(dot)org
Subject: SQL schema and query optimisation for fast cross-table query execution
Date: 2020-01-25 08:49:28
Message-ID: CAOBG12=LHSmZGkOqzRDgomjXQ73u-OazaqbD-RpuDkB-TkyN8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Dear community,

We are developing a social web site for athletes, and I am facing a
technical question on how to implement a complex query on an efficient way.

The current DB schema can be adapted, if it may help...

The concept is the following : we have different users registered on the
web-site, and we have several challenges ongoing on the site. Each user can
be registered to one or several challenges. Each user performs sport
activities. Those activities are not linked to a challenge : they count to
all challenges to which the user is registered.

The problem is the following : we would want to add on the website a news
feed panel where the users connected on the website can see the most recent
activities of the other users. But the constraint is the following : a user
can only see the activities of another user if they are both registered to
at least one common challenge.

On a performance point of view, this query should be very fast : we plan to
refresh automatically the news feed panel of each registered user very
often..

The current DB schema is as follows:

user table : this table records all users registered on the web-site
- user_id
- ...

challenge table: this table records all ongoing challenges
- challenge_id
- ...

user_challenge_association table : this table records which user is
registered to which challenge
- user_id
- challenge_id

activity table : this table records all activities of the users
- user_id
- date
- ...

a naive implementation of my query could be something like this:

SELECT * FROM activity WHERE user_id IN (SELECT user_id FROM
user_challenge_association WHERE challenge_id IN (SELECT challenge_id FROM
user_challenge_association WHERE user_id = ||current_user||)) ORDER BY date

But I really fear whatever index I put on the tables, the two nested IN
operators will have a huge performance impact on the query...

As there will be fare more queries of this type than queries that add or
modify activities, I was guessing if there was not a possibility to
pre-compute something at activity insertion so that we can rewrite the
query on a more efficient way...

Or maybe a special index that would be cross the activity and the
user_challenge_association tables ?

Any help would be very welcome,

Many thanks in advance,

Brice

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Mike Sofen 2020-01-25 13:18:42 RE: SQL schema and query optimisation for fast cross-table query execution
Previous Message Erik Brandsberg 2020-01-06 13:53:26 Re: For each statement trigger and update table