Re: Suspending SELECTs

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: Alessandro Baretta <a(dot)baretta(at)barettadeit(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Suspending SELECTs
Date: 2006-01-18 03:43:54
Message-ID: 43CDB97A.8050702@modgraph-usa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Alessandro Baretta wrote:
>> I think you're trying to do something at the wrong layer of your
>> architecture. This task normally goes in your middleware layer, not
>> your database layer.
>
> I am developing my applications in Objective Caml, and I have written
> the middleware layer myself. I could easily implement a cursor-pooling
> strategy...

You're trying to solve a very hard problem, and you're rewriting a lot of stuff that's been worked on for years by teams of people. If there's any way you switch use something like JBOSS, it might save you a lot of grief and hard work.

I eliminated this problem a different way, using what we call a "hitlist". Basically, every query becomes a "select into", something like this:

insert into hitlist_xxxx (select id from ...)

where "xxxx" is your user's id. Once you do this, it's trivial to return each page to the user almost instantly using offset/limit, or by adding a "ROW_NUM" column of some sort. We manage very large hitlists -- millions of rows. Going from page 1 to page 100,000 takes a fraction of a second.

It also has the advantage that the user can come back in a week or a month and the results are still there.

The drawback are:

1. Before the user gets the first page, the entire query must complete.
2. You need a way to clean up old hitlists.
3. If you have tens of thousands of users, you'll have a large number of hitlists, and you have to use tablespaces to ensure that Linux filesystem directories don't get too large.
4. It takes space to store everyone's data. (But disk space is so cheap this isn't much of an issue.)

You can eliminate #3 by a single shared hitlist with a column of UserID's. But experience shows that a big shared hitlist doesn't work very well: Inserts get slower because the UserID column must be indexed, and you can truncate individual hitlists but you have to delete from a shared hitlist.

Craig

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-01-18 04:02:21 Re: Suspending SELECTs
Previous Message Hari Warrier 2006-01-18 03:29:44 Getting pg to use index on an inherited table (8.1.1)