Whether to use "IN" clause

From: "Mark Williams" <markwillimas(at)gmail(dot)com>
To: <pgsql-sql(at)lists(dot)postgresql(dot)org>
Subject: Whether to use "IN" clause
Date: 2019-12-31 10:15:58
Message-ID: 002901d5bfc3$4c05e3c0$e411ab40$@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I have a table (DOCS_TABLE) which can vary greatly in size.

DOCS_TABLE has a timestamp field (LAST_UPDATED). As the name suggests this
records the date on which data in the record last changed.

When user's open an app if they haven't queried DOCS_TABLE previously it is
loaded via the server using a fairly complicated WHERE statement
(COMPLICATED_WHERE_STATEMENT) to establish which items from the table the
user is permitted to access.

When the user closes the app the data from DOCS_TABLE is stored locally
along with a timestamp to record the date and time the data was last
refreshed (STORED_TMESTAMP). Next time the app opens it loads the data from
the locally stored file.

It then needs to ensure the user is working with up-to-date data. At the
moment I am running a refresh query SELECT [fields] FROM DOCS_TABLE WHERE
LAST_UPDATED >[STORED_TIMESTAMP] AND [COMPLICATED_WHERE_STATEMENT].

I use the resulting data from the refresh query to update the data displayed
to the user.

This works, although one obvious problem is that it does not deal with any
rows from DOCS_TABLE which were available to the user when the data was
stored locally, but have since been deleted or access denied. As such,
within the app, I run a check to make sure the user still has access to that
row before trying to do anything with it, but that does not strike me as a
particularly elegant solution. It would be better if such items could be
removed from the data displayed to the user.

The method I am contemplating and on which I would appreciate feedback is as
follows:

1. Load the data from the local file.
2. Run a thread for the following.
3. Run a query (ID_QUERY) to ascertain which rows are now available to
the user: SELECT id FROM DOCS_TABLE WHERE [COMPLICATED_WHERE_STATEMENT].
4. Check the locally saved data against the result of this query to see
what rows are no longer available to the user and remove them.
5. Build a list of ids from the locally saved data (EXISTING_ID_ARRAY).
6. Check the locally saved data against the results from ID_QUERY to
see whether there are any new records to be added and build a list of the
ids (NEW_ID_ARRAY).
7. Run the refresh query using the arrays: SELECT [fields] FROM
DOCS_TABLE WHERE (id in ([NEW_ID_ARRAY])) OR (id in [EXISTING_ID_ARRAY] AND
LAST_UPDATED >[STORED_TIMESTAMP]).

Unless there is a completely different proposal as to how to approach this
refresh, I am happy that use of NEW_ID_ARRAY is the way to go.

But is it sensible to use EXISTING_ID_ARRAY? It will cut out the use of the
COMPLICATED_WHERE_STATEMENT and enable the query to focus explicitly on a
group of records clearly identified. However, the size of the array, could
become quite large. In this respect, I suppose my question is whether there
is a law of diminishing returns with such an IN clause. For example, if
there were 1M records in the table and 20 items in the array, using the
array must evidently be much quicker. But what if the array contained 800K
items? I assume that it will then probably be significantly less efficient
to use EXISTING_ID_ARRAY and more efficient to use
COMPLICATED_WHERE_STATEMENT.

I appreciate without providing full details of the structure of DOCS_TABLE,
the data being retrieved from it and the full nature of the
COMPLICATED_WHERE_STATEMENT, I am asking for a comparison between apples and
pears. What I am really interested in is whether my logic set out above is
sound or idiotic and any suggestions on how best to achieve what I am trying
to achieve.

Many thanks and Happy New Year!

Regards,

Mark

__

Browse pgsql-sql by date

  From Date Subject
Next Message Mike Martin 2020-01-03 23:09:23 For each statement trigger and update table
Previous Message ROS Didier 2019-12-27 08:56:55 RE: problem with read-only user