RE: SQL schema and query optimisation for fast cross-table query execution

From: "Mike Sofen" <msofen(at)runbox(dot)com>
To: 'Brice André' <brice(at)famille-andre(dot)be>, <pgsql-sql(at)lists(dot)postgresql(dot)org>
Subject: RE: SQL schema and query optimisation for fast cross-table query execution
Date: 2020-01-25 13:18:42
Message-ID: 0f9f01d5d381$f8216e00$e8644a00$@runbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

From: Brice André <brice(at)famille-andre(dot)be> Sent: Saturday, January 25, 2020 12:49 AM
wrote:

The current DB schema is as follows:
user table : this table records all users registered on the web-site
- user_id
- ...
challenge table: this table records all ongoing challenges
- challenge_id
- ...
user_challenge_association table : this table records which user is registered to which challenge
- user_id
- challenge_id
activity table : this table records all activities of the users
- user_id
- date
- ...

a naive implementation of my query could be something like this:

SELECT * FROM activity WHERE user_id IN (SELECT user_id FROM user_challenge_association WHERE challenge_id IN (SELECT challenge_id FROM user_challenge_association WHERE user_id = ||current_user||)) ORDER BY date

Brice

---------------------------------------------------------------

Brice, you are correct, “IN” clauses are horrific performers because they turn every member of the IN into an OR, so the larger your IN list, the slower your query runs.

But every IN can be replaced with a join, and that leverages the power of the relational engine. With standard indexes on the primary and foreign keys, this will be extremely fast and scalable to many hundreds of millions of rows.

Taking your query and rewriting it with joins:

SELECT u.user_name, c.challenge_name, a.activity_name, a.activity_date

FROM activity a

Join user_challenge_association uca on (user_id)

Join challenge c on (challenge_id)

Join user u on (user_id)

Where u.user_id = ||current_user||)) ORDER BY date

Mike Sofen

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Viral Shah 2020-02-25 15:11:32 pg_dump fails when a table is in ACCESS SHARE MODE
Previous Message Brice André 2020-01-25 08:49:28 SQL schema and query optimisation for fast cross-table query execution