Re: VACUUM FULL

From: "Rajesh Kumar Mallah(dot)" <mallah(at)trade-india(dot)com>
To: Brian McCane <bmccane(at)mccons(dot)net>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: VACUUM FULL
Date: 2002-05-12 05:24:06
Message-ID: 200205121054.06218.mallah@trade-india.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi Brian,

Yes contrib/tsearch does not provide relevence ranking at the
moment,

But OpenFTS-perl/tcl (openfts.sf.net)
(from which tsearch is derieved) do provide
relevence ranking and those are also very fast.

and they work in similar fashion as you have metioned
in your reply,

also OpenFTS does stemming in you search phrases.

but i understand applicability of those solutions
are also subject to your current requirements.

regds
mallah.

On Saturday 11 May 2002 10:23 pm, Brian McCane wrote:
> I have tried contrib/tsearch, but I do fairly complex score computation
> based on word placement (headers,body,graphics,metatags, etc). And also
> do additional computations on the scores when the user does a search. It
> is very quick to use something like tsearch to figure out which pages
> match my criterion, but very slow to then compute a valid score for each
> page. Instead, in an attempt to speed things up, a score is computed for
> each distinct word on a page, and that score is stored in this table.
> Then when a search is requested, I use previously existing search
> information and data from this table to compute the new results.
>
> The upshot of all this computing is that I get fairly relevant results,
> but I have to do a lot of math to get there. I have been considering
> combining the two methods, doing something like a full text search to find
> pages that meet my criterion, and then using this table to actually
> compute a pages score.
>
> - brian
>
> On Sat, 11 May 2002, Rajesh Kumar Mallah. wrote:
> > Hi Brian ,
> >
> > are you performing full text search in any case?
> >
> > Apart from optimizing the TABLE/INDEXES (thru VACUUM measures)
> >
> > i feel using text indexes provided by contrib/tsearch can also
> > lead to significant improvement in search performance.
> >
> > regds
> > mallah.
> >
> > On Saturday 11 May 2002 05:48 am, Brian McCane wrote:
> > > Okay, I guess I misunderstood something about "VACUUM FULL". My
> > > understanding was that a VACUUM (without FULL), marked unused records
> > > for reuse. VACUUM FULL moved records from the end of a file to the
> > > holes where unused records were stored and truncated the file to free
> > > disk space. So far so good, but....
> > >
> > > I have had continued loss of performance on one of my largest tables
> > > (600,000,000 records). There are 4 fields in a record like such:
> > >
> > > CREATE TABLE foo (
> > > a int not null references bar(a) on delete cascade on update no
> > > action, b int not null references baz(b) on delete cascade on update no
> > > action, c int,
> > > d smallint,
> > > primary key(a, b)) ;
> > > CREATE INDEX foo_ac ON foo (a,c) ;
> > > CREATE INDEX foo_ad on foo (a,d) ;
> > > And there are 3 triggers which fire before insert/delete/update.
> > >
> > > I INSERT/UPDATE/DELETE approximately 300,000 records per day, but this
> > > number is increasing on a daily basis as I make changes which improve
> > > the performance of my data gathering tools (spiders ;). Two days ago,
> > > it had reached the point where a search for a 3-word term (ie. free
> > > news servers) took about 60 seconds. I have just spent 48 hours
> > > running a VACUUM FULL on my table, and now the same search takes < 10
> > > seconds. I assume that the increase in performance is due to the
> > > decrease in table/index size which added up to approximate 1GB of freed
> > > space on the machine, which was approximately 4% of the original size
> > > of the table and all its indices. But, a 4% decrease in size should not
> > > add up to a 84% increase in performance (is that right? I always get
> > > the ratio confused :).
> > >
> > > If all that VACUUM FULL did was move records from file 12345678.6 to
> > > file 12345678, the database would still being doing a large number of
> > > random accesses on the table. However, if VACUUM FULL clusters the
> > > data according to the primary key, it would still be doing a large
> > > number of random access on the table, because the primary key has
> > > almost nothing to do with how I actually access the data in real life.
> > > So, is VACUUM FULL looking somewhere in pg_statistics (or
> > > pg_stat_user_indexes), to determine which index I actually use most
> > > (foo_ad), and then clustering the data that way, or is there some other
> > > agent at work here.
> > >
> > > - brian
> > >
> > >
> > > Wm. Brian McCane | Life is full of doors that won't
> > > open Search http://recall.maxbaud.net/ | when you knock, equally
> > > spaced amid those Usenet http://freenews.maxbaud.net/ | that open when
> > > you don't want them to. Auction http://www.sellit-here.com/ | - Roger
> > > Zelazny "Blood of Amber"
> > >
> > >
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Gaetano Mendola 2002-05-12 15:10:56 NOTICE: RegisterSharedInvalid: SI buffer overflow
Previous Message Brian McCane 2002-05-11 22:39:40 Re: VACUUM FULL