Thanks for your advice !
More stuff below...
>> Now, I've been facing a related problem with tracking user sessions
>> for a
>> web app. I want to use a table to store user sessions, both active
>> sessions and expired sessions for archiving. I also wanted it to look
>> two different tables. I could have created one table with two views
>> (online and archived), or two tables.
>> In the end I went with two tables because the online session table is
>> read and updated very often, so it better be small and fit in the cache,
>> while the archive table will probably be huge and not used often. So to
>> keep better locality of reference I used two tables, and I created
>> functions to create sessions, update a session to push its timeout
>> value a
>> bit in the future, and close a session. These functions detect timed-out
>> sessions in the "online" table and move them to the "archive" table. I
>> also have a cleanup function which moves expired sessions to the archive
>> table and which will be called by a cron.
> may be better to have it stored seperately. Note that you will still
> have locality of reference even if it's one big table, since recently
Yes, new items are appended at the end, so it should be okay. I could use
a partial index (unique index on online sessions only) to find data fast
in that table...
> (3) Explain your query like "EXPLAIN ANALYZE SELECT * FROM session WHERE
> (4) If it does a sequential scan, that means the table is small enough
> to get the entire table in few disk reads. If it does an index scan.....
OK, this would mean "huge" is pretty small (like a few hundreds records),
I intend to have a lot more of them (in the archive), so it'll be indexes
> Now, there are a couple other considerations that you might have to
> answer for yourself.
> (1) how often do you move expired sessions to the session_archive table?
- This is a web app so there is no "close session" event, it comes simply
from reaching a timeout. The timeout is stored as a session expiry
timestamp. Thus online sessions have this timestamp>now(). I set the
timeout to 30 minutes.
- When a page is requested, I have to prolong the timeout. This could
lead to a lot of updates.
Thus I have only one function which creates/updates a session data :
- it takes a user ID and session info (like IP address etc).
- it looks in the table to see if there's a session (SELECT on a unique
- if there's a non-expired session
- and it won't expire in the next 10 minutes, we do nothing.
- if it will expire in the next 10 minutes, we spend time UPDATEing it
- if there's an expired session
- insert the record into the archive, delete it from this table
- if there's an expired session or no session at all
- INSERT a new row
Thus most of the time this function does a SELECT and then exits. Moving
sessions between tables is a small oiverhead as it happens only on logout
vs on every page view.
> (2) how often do you vacuum the session table?
The archive, only when I'll DELETE very old records from it.
The online sessions table, probably often, but it'll be very small.
This is another advantage of using two tables.
> (3) do you want an all_sessions view like "CREATE VIEW all_sessions AS
> SELECT * FROM session UNION SELECT * FROM session_archive"?
> Keep in mind the extra cpu and disk activity from constantly moving the
> records to archive, and constantly vacuuming. How often you do those
> things probably requires some real-world testing.
As I said, the consantly vacuumed table will be quite small.
All in all, I'm extremely satisfied with Postgresql
and keep discovering very nice stuff in this program.
I was on MySQL before, my god, never again !
In response to
pgsql-general by date
|Next:||From: Tony Reina||Date: 2004-07-29 06:49:17|
|Subject: Re: Win32 binary|
|Previous:||From: Oleg Bartunov||Date: 2004-07-29 04:27:47|
|Subject: Re: Tsearch2 dump/reload problem|