Re: to pickle or not to pickle

From: Lincoln Yeoh <lylyeoh(at)mecomb(dot)com>
To: Marc Tardif <intmktg(at)CAM(dot)ORG>, pgsql-general(at)postgresql(dot)org
Subject: Re: to pickle or not to pickle
Date: 2000-06-05 08:28:41
Message-ID: 3.0.5.32.20000605162841.00864100@pop.mecomb.po.my
Lists: pgsql-general

At 11:56 AM 31-05-2000 -0400, Marc Tardif wrote:
>I'm writing a search engine using python and postgresql which requires
>storing a temporary list of results in an sql table for each request. This
>list will contain at least 50 records and could grow to about 300. My
>options are either to pickle the list and store a single entry, or to use
>the postgresql COPY command (as opposed to INSERT, which would be too
>slow) to store each of the temporary records.

Are you trying to do something like:

"showing 20 results", with next/previous links for the next/previous 20?

Whatever it is, I don't think you should use COPY.

The way I did it was to just run the query again on each page view and
display only the relevant slice of results, using offset and window values.
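In Python that comes down to two queries per page view. Here's a minimal
sketch, assuming a DB-API style module such as PyGreSQL's pgdb (the table,
columns, and search condition are made up for illustration):

    import pgdb  # PyGreSQL's DB-API module; any DB-API adapter should do

    PAGE_SIZE = 20

    def fetch_page(conn, terms, offset):
        cur = conn.cursor()
        # Count first, so the page can say "results X-Y of Z" and build
        # the next/previous links.
        cur.execute("SELECT count(*) FROM documents WHERE body LIKE %s",
                    ('%' + terms + '%',))
        total = cur.fetchone()[0]
        # Re-run the real query, fetching only the window being shown.
        # (LIMIT ... OFFSET is supported from PostgreSQL 6.5 on; parameter
        # style varies by driver, so check yours.)
        cur.execute("SELECT id, title FROM documents WHERE body LIKE %s"
                    " ORDER BY title LIMIT %s OFFSET %s",
                    ('%' + terms + '%', PAGE_SIZE, offset))
        rows = cur.fetchall()
        cur.close()
        return total, rows

Two queries per hit, and nothing left behind to clean up.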

Not as efficient, but:
1) I wanted to know how many rows there were, so even with SELECT ... LIMIT
I'd have to do a SELECT count first. AFAIK, PostgreSQL has no special
optimizations for SELECT count (and I'm not even sure other databases would
be faster for _my_ SELECT count).

2) I didn't want to deal with cleaning up the cache/pickles. My app was
web based, so I don't know when the users have left. Say I expire the
cache/pickles after 15 minutes: at 100 searches per minute, I'd end up
with 1500 pickles at a time 8*). Not really a big problem nowadays, but I
didn't think it was worth dealing with.

3) It wasn't really a search engine: different users got different
results, different sort orders, etc.

But if your search engine returns the same results for the same query no
matter who the user is, the cache approach could be good. It may mean a
redesign: have a cache table storing queries, results, and expiry times.
You will probably need regular vacuuming, since the cache table will be
changing quite often.

e.g. each row:
query string, result1, result2, sequence, total results, expiry time.

By storing the total number of results you can use PostgreSQL's LIMIT
feature more intelligently. You can probably afford to waste the 4 bytes
per row, and keep everything in one table for speed.
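A minimal sketch of what that could look like (table and column names are
made up, and I've put one result id per row instead of the result1/result2
pairing, to keep it short):

    import time

    def create_cache(cur):
        cur.execute("""CREATE TABLE search_cache (
            query_string  text,     -- the normalized query
            result_id     integer,  -- one result per row
            seq           integer,  -- position within the result list
            total_results integer,  -- duplicated on every row: 4 bytes each
            expiry        integer   -- unix time after which the row is stale
        )""")

    def cached_page(cur, terms, offset, page_size=20):
        # total_results comes back with every row, so one fetch yields both
        # the page and the figure needed for "result X of Y" links.
        cur.execute("SELECT result_id, total_results FROM search_cache"
                    " WHERE query_string = %s AND expiry > %s"
                    " ORDER BY seq LIMIT %s OFFSET %s",
                    (terms, int(time.time()), page_size, offset))
        return cur.fetchall()

    def expire_cache(cur):
        # Run periodically, then VACUUM the table: the steady insert/delete
        # traffic is exactly why it needs regular vacuuming.
        cur.execute("DELETE FROM search_cache WHERE expiry < %s",
                    (int(time.time()),))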

Cheerio,

Link.
