Select for update, locks and transaction levels

From: "Nick Barr" <nick(dot)barr(at)webbased(dot)co(dot)uk>
To: "PostgreSQL General ML" <pgsql-general(at)postgresql(dot)org>
Subject: Select for update, locks and transaction levels
Date: 2004-02-16 17:51:38
Message-ID: 8F4A22E017460A458DB7BBAB65CA6AE502AA4F@openmanage
Lists: pgsql-general

Hi,

I am trying to gather stats about how many times a resource in our web
app is viewed, i.e. just a COUNT. There are potentially millions of
resources within the system.

I thought of two methods:

1. An extra column in the resource table which contains a count.
   a. Each time a resource is viewed an UPDATE statement is run.

      UPDATE res_table SET view_count = view_count + 1
      WHERE res_id=2177526::bigint;

   b. The count is just SELECTed from the resource table.
2. A separate table that contains a count using an algorithm similar
   to the method presented here:

   http://archives.postgresql.org/pgsql-performance/2004-01/msg00059.php

   a. Each time a resource is viewed a new row is inserted with a count
      of 1.
   b. Each time the view count is needed, rows from the table are SUMmed
      together.
   c. A compression script runs regularly to group and sum the rows
      together.

I personally did not like the look of 1, so I thought about using 2. The
main reason is that no locks would interfere with "updating" the view
count, because the update is in fact just an INSERT statement. Vacuuming
the new table is also preferable, as it is considerably thinner (i.e.
fewer columns) than the resource table. The second method allows me to
capture more data too, such as who viewed the resource and which
resource they viewed next, but I digress :-).

Q1. Have I missed any methods?

I thought I would have a further look at 2, and I have some questions
about that too.

The schema for this new table is shown below.

-- SCHEMA --------------------------------------------------------------
CREATE TABLE view_res (
    res_id int8,
    count int8
) WITHOUT OIDS;

CREATE INDEX view_res_res_id_idx ON view_res (res_id);
------------------------------------------------------------------------
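With this schema, steps 2a and 2b might look roughly like the statements below. This is only a sketch: the COALESCE is my own addition so that a resource with no rows yet reports 0 rather than NULL, and the res_id value is just the example one used elsewhere in this mail.

```sql
-- 2a: record a single view as a new row (no existing row is updated)
INSERT INTO view_res (res_id, count) VALUES (2177526::bigint, 1);

-- 2b: read the current view count by summing all rows for the resource;
-- COALESCE turns the NULL that sum() returns for "no rows" into 0
SELECT COALESCE(sum(count), 0) AS view_count
FROM view_res
WHERE res_id = 2177526::bigint;
```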

And the compression script should reduce the following rows:

-- QUERY ---------------------------------------------------------------
db_dev=# select * from view_res where res_id=2177526::bigint;
 res_id  | count
---------+-------
 2177526 |     1
 2177526 |     1
 2177526 |     1
 2177526 |     1
 2177526 |     1
 2177526 |     1
 2177526 |     1
 2177526 |     1
(8 rows)
------------------------------------------------------------------------

to the following

-- QUERY ---------------------------------------------------------------
db_dev=# select * from view_res where res_id=2177526::bigint;
 res_id  | count
---------+-------
 2177526 |     8
(1 row)
------------------------------------------------------------------------

Now I must admit I have never really played around with select for
update, locks or transaction levels, hence the questions. I have looked
in the docs and think I figured out what I need to do. The following is
pseudo-code for the compression script.

------------------------------------------------------------------------
BEGIN;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT res_id, sum(count) AS res_count FROM view_res GROUP BY res_id
FOR UPDATE;

For each row
{
    DELETE FROM view_res WHERE res_id=<res_id>::bigint;

    INSERT INTO view_res (res_id, count) VALUES (<res_id>, <res_count>);
}

COMMIT;

COMMIT;
------------------------------------------------------------------------
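The per-row loop could probably be collapsed into set-based statements. The following is a rough sketch rather than a tested script. It leaves out FOR UPDATE, since PostgreSQL does not accept FOR UPDATE together with GROUP BY, and it assumes that the SERIALIZABLE snapshot keeps the DELETE from touching rows inserted after the totals were taken. The temporary table name view_res_totals is made up for illustration, and the sketch assumes only one compression run is active at a time.

```sql
BEGIN;

-- Must be issued before the first query so that every statement
-- below works from the same snapshot
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Capture the per-resource totals as seen by this snapshot
-- (view_res_totals is a hypothetical scratch table)
CREATE TEMP TABLE view_res_totals AS
    SELECT res_id, sum(count)::int8 AS res_count
    FROM view_res
    GROUP BY res_id;

-- Remove the rows visible to this snapshot; rows committed by other
-- sessions after the snapshot was taken are not visible here
DELETE FROM view_res;

-- Put back one summed row per resource
INSERT INTO view_res (res_id, count)
    SELECT res_id, res_count FROM view_res_totals;

DROP TABLE view_res_totals;

COMMIT;
```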

Right, the questions for this method:

Q2. Will a "group by" used with a "select ... for update" lock all the
rows used for the sum?
Q3. Am I right in saying freshly inserted rows will not be affected by
the delete because of the SERIALIZABLE transaction level?
Q4. Are there any other concurrency issues that I have not thought of?


BTW, this is still at the planning phase so a complete redesign is
perfectly fine. Just seeing if anyone has greater experience than me at
this sort of thing.


TIA


Nick Barr

