Skip site navigation (1) Skip section navigation (2)

Re: Performance problems testing with Spamassassin 3.1.0

From: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>
To: josh(at)agliodbs(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-28 01:59:37
Message-ID: 42E83C09.2040508@aptalaska.net (view raw or flat)
Thread:
Lists: pgsql-performance
Josh Berkus wrote:
> Matt,
> 
> 
>>After playing with various indexes and what not I simply am unable to
>>make this procedure perform any better.  Perhaps someone on the list can
>>spot the bottleneck and reveal why this procedure isn't performing that
>>well or ways to make it better.
> 
> 
> Well, my first thought is that this is a pretty complicated procedure for 
> something you want to peform well.    Is all this logic really necessary?   
> How does it get done for MySQL?
> 

I'm not sure if it's all needed, in mysql they have this simple schema:

===============================================
CREATE TABLE bayes_expire (
  id int(11) NOT NULL default '0',
  runtime int(11) NOT NULL default '0',
  KEY bayes_expire_idx1 (id)
) TYPE=MyISAM;

CREATE TABLE bayes_global_vars (
  variable varchar(30) NOT NULL default '',
  value varchar(200) NOT NULL default '',
  PRIMARY KEY  (variable)
) TYPE=MyISAM;

INSERT INTO bayes_global_vars VALUES ('VERSION','3');

CREATE TABLE bayes_seen (
  id int(11) NOT NULL default '0',
  msgid varchar(200) binary NOT NULL default '',
  flag char(1) NOT NULL default '',
  PRIMARY KEY  (id,msgid)
) TYPE=MyISAM;

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY  (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;

CREATE TABLE bayes_vars (
  id int(11) NOT NULL AUTO_INCREMENT,
  username varchar(200) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  token_count int(11) NOT NULL default '0',
  last_expire int(11) NOT NULL default '0',
  last_atime_delta int(11) NOT NULL default '0',
  last_expire_reduce int(11) NOT NULL default '0',
  oldest_token_age int(11) NOT NULL default '2147483647',
  newest_token_age int(11) NOT NULL default '0',
  PRIMARY KEY  (id),
  UNIQUE bayes_vars_idx1 (username)
) TYPE=MyISAM;
===============================================

Then they do this to insert the token:

INSERT INTO bayes_token (
  id,
  token,
  spam_count,
  ham_count,
  atime
) VALUES (
  ?,
  ?,
  ?,
  ?,
  ?
) ON DUPLICATE KEY
  UPDATE
    spam_count = GREATEST(spam_count + ?, 0),
    ham_count = GREATEST(ham_count + ?, 0),
    atime = GREATEST(atime, ?)

Or update the token:

UPDATE bayes_vars SET
  $token_count_update
  newest_token_age = GREATEST(newest_token_age, ?),
  oldest_token_age = LEAST(oldest_token_age, ?)
  WHERE id = ?


I think the reason why the procedure was written for postgres was
because of the greatest and least statements performing poorly.

Honestly, I'm not real up on writing procs, I was hoping the problem
would be obvious to someone.

schu

In response to

Responses

pgsql-performance by date

Next:From: Josh BerkusDate: 2005-07-28 02:12:39
Subject: Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.
Previous:From: Tom LaneDate: 2005-07-28 00:41:05
Subject: Re: [Bizgres-general] Re: faster INSERT with possible

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group