Re: Performance problems testing with Spamassassin 3.1.0

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-30 18:28:53
Message-ID: 27897.1122748133@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net> writes:
> One thing that is still very puzzling to me is why this runs so much
> slower when I put the data.sql in a transaction. Obviously transactions
> are acting different when you call a proc a zillion times vs an insert
> query.

I looked into this a bit. It seems that the problem when you wrap the
entire insertion series into one transaction is associated with the fact
that the test does so many successive updates of the single row in
bayes_vars. (VACUUM VERBOSE at the end of the test shows it cleaning up
49383 dead versions of the one row.) This is bad enough when it's in
separate transactions, but when it's in one transaction, none of those
dead row versions can be marked "fully dead" yet --- so for every update
of the row, the unique-key check has to visit every dead version to make
sure it's dead in the context of the current transaction. This makes
the process O(N^2) in the number of updates per transaction. Which is
bad enough if you just want to do one transaction per message, but it's
intolerable if you try to wrap the whole bulk-load scenario into one
transaction.

I'm not sure that we can do anything to make this a lot smarter, but
in any case, the real problem is to not do quite so many updates of
bayes_vars.

How constrained are you as to the format of the SQL generated by
SpamAssassin? In particular, could you convert the commands generated
for a single message into a single statement? I experimented with
passing all the tokens for a given message as a single bytea array,
as in the attached, and got almost a factor of 4 runtime reduction
on your test case.

BTW, it's possible that this is all just a startup-transient problem:
once the database has been reasonably well populated, one would expect
new tokens to be added infrequently, and so the number of updates to
bayes_vars ought to drop off.

regards, tom lane

Revised insertion procedure:

CREATE or replace FUNCTION put_tokens (_id INTEGER,
_tokens BYTEA[],
_spam_count INTEGER,
_ham_count INTEGER,
_atime INTEGER)
RETURNS VOID AS
$$
declare _token bytea;
new_tokens integer := 0;
BEGIN
for i in array_lower(_tokens,1) .. array_upper(_tokens,1)
LOOP
_token := _tokens[i];
UPDATE bayes_token
SET spam_count = spam_count + _spam_count,
ham_count = ham_count + _ham_count,
atime = _atime
WHERE id = _id
AND token = _token;

IF not found THEN
INSERT INTO bayes_token VALUES (_id, _token, _spam_count,
_ham_count, _atime);
new_tokens := new_tokens + 1;
END IF;
END LOOP;
if new_tokens > 0 THEN
UPDATE bayes_vars SET token_count = token_count + new_tokens
WHERE id = _id;
IF NOT FOUND THEN
RAISE EXCEPTION 'unable to update token_count in bayes_vars';
END IF;
END IF;
RETURN;
END;
$$
LANGUAGE plpgsql;

Typical input:

select put_tokens(1,'{"\\125\\42\\80\\167\\166","\\38\\153\\220\\93\\190","\\68\\7\\112\\52\\224","\\51\\14\\78\\155\\49","\\73\\245\\15\\221\\43","\\96\\179\\108\\197\\121","\\123\\97\\220\\173\\247","\\55\\132\\243\\51\\65","\\238\\36\\129\\75\\181","\\145\\253\\196\\106\\90","\\119\\0\\51\\127\\236","\\229\\35\\181\\222\\3","\\163\\1\\191\\220\\79","\\232\\97\\152\\207\\26","\\111\\146\\81\\182\\250","\\47\\141\\12\\76\\45","\\252\\97\\168\\243\\222","\\24\\157\\202\\45\\24","\\230\\207\\30\\46\\115","\\106\\45\\182\\94\\136","\\45\\66\\245\\41\\103","\\108\\126\\171\\154\\210","\\64\\90\\1\\184\\145","\\242\\78\\150\\104\\213","\\214\\\\134\\7\\179\\150","\\249\\12\\247\\164\\74","\\234\\35\\93\\118\\102","\\5\\152\\152\\219\\188","\\99\\186\\172\\56\\241","\\99\\220\\62\\240\\148","\\106\\12\\199\\33\\177","\\34\\74\\190\\192\\186","\\219\\127\\145\\132\\203","\\240\\113\\128\\160\\46","\\83\\5\\239\\206\\221","\\245\\253\\219\\83\\250","\\1\\53\\126\\56\\129","\\206\\1!
30\\97\\246\\47","\\217\\57\\185\\37\\202","\\235\\10\\74\\224\\150","\\80\\151\\70\\52\\96","\\126\\49\\156\\162\\93","\\243\\120\\218\\226\\49","\\251\\132\\118\\47\\221","\\241\\160\\120\\146\\198","\\183\\32\\161\\223\\178","\\80\\205\\77\\57\\2","\\121\\231\\13\\71\\218","\\71\\143\\184\\88\\185","\\163\\96\\119\\211\\142","\\20\\143\\90\\91\\211","\\179\\228\\212\\15\\22","\\243\\35\\149\\9\\55","\\140\\149\\99\\233\\241","\\164\\246\\101\\147\\107","\\202\\70\\218\\40\\114","\\39\\36\\186\\46\\84","\\58\\116\\44\\237\\2","\\80\\204\\185\\47\\105","\\64\\227\\29\\108\\222","\\173\\115\\56\\91\\52","\\102\\39\\157\\252\\64","\\133\\9\\89\\207\\62","\\27\\2\\230\\227\\201","\\163\\45\\123\\160\\129","\\170\\131\\168\\107\\198","\\236\\253\\0\\43\\228","\\44\\255\\93\\197\\136","\\64\\122\\42\\230\\126","\\207\\222\\104\\27\\239","\\26\\240\\78\\73\\45","\\225\\107\\181\\246\\160","\\231\\72\\243\\36\\159","\\248\\60\\14\\67\\145","\\21\\161\\247\\43\\198","\\81\\243\\19!
1\\168\\18","\\237\\227\\23\\40\\140","\\60\\90\\96\\168\\201","\\211\
\107\\181\\46\\38","\\178\\129\\212\\16\\254","\\85\\177\\246\\29\\221","\\182\\123\\178\\157\\9","\\154\\159\\180\\116\\89","\\80\\136\\196\\242\\161","\\185\\110\\90\\163\\157","\\163\\191\\229\\13\\42","\\11\\119\\205\\160\\223","\\75\\216\\70\\223\\6","\\130\\48\\154\\145\\51","\\62\\104\\212\\72\\3","\\247\\105\\51\\64\\136","\\17\\96\\45\\40\\77","\\52\\1\\252\\53\\121","\\68\\195\\58\\103\\91","\\135\\131\\100\\4\\0","\\131\\129\\44\\193\\194","\\47\\234\\101\\143\\26","\\206\\\\134\\32\\154\\0","\\17\\41\\177\\34\\178","\\145\\127\\114\\231\\216","\\19\\172\\6\\39\\126","\\237\\233\\121\\43\\119","\\201\\167\\167\\67\\233","\\88\\159\\102\\50\\117","\\100\\133\\107\\190\\133","\\169\\146\\178\\120\\106"}',1,0,1088628232);
select put_tokens(1,'{"\\196\\75\\30\\153\\73","\\73\\245\\15\\221\\43","\\14\\7\\116\\254\\162","\\244\\161\\139\\59\\16","\\214\\226\\238\\196\\30","\\209\\14\\131\\231\\30","\\41\\\\134\\176\\195\\166","\\70\\206\\48\\38\\33","\\247\\131\\136\\80\\31","\\4\\85\\5\\167\\214","\\246\\106\\225\\106\\242","\\28\\0\\229\\160\\90","\\127\\209\\58\\120\\83","\\12\\52\\52\\147\\95","\\255\\115\\21\\5\\68","\\244\\152\\121\\76\\20","\\19\\128\\183\\248\\181","\\140\\91\\18\\127\\208","\\93\\9\\62\\196\\247","\\248\\200\\31\\207\\108","\\44\\216\\247\\15\\195","\\59\\189\\9\\237\\142","\\1\\14\\10\\221\\68","\\163\\155\\122\\223\\104","\\97\\5\\105\\55\\137","\\184\\211\\162\\23\\247","\\239\\249\\83\\68\\54","\\67\\207\\180\\186\\234","\\99\\78\\237\\211\\180","\\200\\11\\32\\179\\50","\\95\\105\\18\\60\\253","\\207\\102\\227\\94\\84","\\71\\143\\184\\88\\185","\\13\\181\\75\\24\\192","\\188\\241\\141\\99\\242","\\139\\124\\248\\130\\4","\\25\\110\\149\\63\\114","\\21\\162\\199\\1!
29\\199","\\164\\246\\101\\147\\107","\\198\\202\\223\\58\\197","\\181\\10\\41\\25\\130","\\71\\163\\116\\239\\170","\\46\\170\\238\\142\\89","\\176\\120\\106\\103\\228","\\39\\228\\25\\38\\170","\\114\\79\\121\\18\\222","\\178\\105\\98\\61\\39","\\90\\61\\12\\23\\135","\\176\\118\\81\\65\\66","\\55\\104\\57\\198\\150","\\206\\251\\224\\128\\41","\\29\\158\\68\\146\\164","\\248\\60\\14\\67\\145","\\210\\220\\161\\10\\254","\\72\\81\\151\\213\\68","\\25\\236\\210\\197\\128","\\72\\37\\208\\227\\54","\\242\\24\\6\\88\\26","\\128\\197\\20\\5\\211","\\98\\105\\71\\42\\180","\\91\\43\\72\\84\\104","\\205\\254\\174\\65\\141","\\222\\194\\126\\204\\164","\\233\\153\\37\\148\\226","\\32\\195\\22\\153\\87","\\194\\97\\220\\251\\18","\\151\\201\\148\\52\\147","\\205\\55\\0\\226\\58","\\172\\12\\50\\0\\140","\\56\\32\\43\\9\\45","\\18\\174\\50\\162\\126","\\138\\150\\12\\72\\189","\\49\\230\\150\\210\\48","\\2\\140\\64\\104\\32","\\14\\174\\41\\196\\121","\\100\\195\\116\\130\\101","\!
\222\\45\\94\\39\\64","\\178\\203\\221\\63\\94","\\26\\188\\157\\\\134
\\52","\\119\\0\\51\\127\\236","\\88\\32\\224\\142\\164","\\111\\146\\81\\182\\250","\\12\\177\\151\\83\\13","\\113\\27\\173\\162\\19","\\158\\216\\41\\236\\226","\\16\\88\\\\134\\180\\112","\\43\\32\\16\\77\\238","\\136\\93\\210\\172\\63","\\251\\214\\30\\40\\146","\\156\\27\\198\\60\\170","\\185\\29\\172\\30\\68","\\202\\83\\59\\228\\252","\\219\\127\\145\\132\\203","\\1\\223\\97\\229\\127","\\113\\83\\123\\167\\140","\\99\\1\\116\\56\\165","\\143\\224\\239\\1\\173","\\49\\186\\156\\51\\92","\\246\\224\\70\\245\\137","\\235\\10\\74\\224\\150","\\43\\88\\245\\14\\103","\\88\\128\\232\\142\\254","\\251\\132\\118\\47\\221","\\36\\7\\142\\234\\98","\\130\\126\\199\\170\\126","\\133\\23\\51\\253\\234","\\249\\89\\242\\87\\86","\\102\\243\\47\\193\\211","\\140\\18\\140\\164\\248","\\179\\228\\212\\15\\22","\\168\\155\\243\\169\\191","\\117\\37\\139\\241\\230","\\155\\11\\254\\171\\200","\\196\\159\\253\\223\\15","\\93\\207\\154\\106\\135","\\11\\255\\28\\123\\125","\\239\\9\\226!
\\59\\198","\\191\\204\\230\\61\\39","\\175\\204\\181\\113\\\\134","\\64\\227\\29\\108\\222","\\169\\173\\194\\83\\40","\\212\\93\\170\\169\\12","\\249\\55\\232\\182\\208","\\75\\175\\181\\248\\246","\\108\\95\\114\\215\\138","\\220\\37\\59\\207\\197","\\45\\146\\43\\76\\81","\\166\\231\\20\\9\\189","\\27\\126\\81\\92\\75","\\66\\168\\119\\100\\196","\\229\\9\\196\\165\\250","\\83\\186\\103\\184\\46","\\85\\177\\246\\29\\221","\\140\\159\\53\\211\\157","\\214\\193\\192\\217\\109","\\10\\5\\64\\97\\157","\\92\\137\\120\\70\\55","\\235\\45\\181\\44\\98","\\150\\56\\132\\207\\19","\\67\\95\\161\\39\\122","\\109\\65\\145\\170\\79","\\\\134\\28\\90\\39\\33","\\226\\177\\240\\202\\157","\\1\\57\\50\\6\\240","\\249\\240\\222\\56\\161","\\110\\136\\88\\85\\249","\\82\\27\\239\\51\\211","\\114\\223\\252\\83\\189","\\129\\216\\251\\218\\80","\\247\\36\\101\\90\\229","\\209\\73\\221\\46\\11","\\242\\12\\120\\117\\\\134","\\146\\198\\57\\177\\49","\\212\\57\\9\\240\\216","\\215\\151\\2!
16\\59\\75","\\47\\132\\161\\165\\54","\\113\\4\\77\\241\\150","\\217\
\184\\149\\53\\124","\\152\\111\\25\\231\\104","\\42\\185\\112\\250\\156","\\39\\131\\14\\140\\189","\\148\\169\\158\\251\\150","\\184\\142\\204\\122\\179","\\19\\189\\181\\105\\116","\\116\\77\\22\\135\\50","\\236\\231\\60\\132\\229","\\200\\63\\76\\232\\9","\\32\\20\\168\\87\\45","\\99\\129\\99\\165\\29","\\2\\208\\66\\228\\105","\\99\\194\\194\\229\\17","\\85\\250\\55\\51\\114","\\200\\165\\249\\77\\72","\\5\\91\\178\\157\\24","\\245\\253\\219\\83\\250","\\166\\103\\181\\196\\34","\\227\\149\\148\\105\\157","\\95\\44\\15\\251\\98","\\183\\32\\161\\223\\178","\\120\\236\\145\\158\\78","\\244\\4\\92\\233\\112","\\189\\231\\124\\92\\19","\\112\\132\\8\\49\\157","\\160\\243\\244\\94\\104","\\150\\176\\139\\251\\157","\\176\\193\\155\\175\\144","\\161\\208\\145\\92\\92","\\77\\122\\94\\69\\182","\\77\\13\\131\\29\\27","\\92\\9\\178\\204\\254","\\177\\4\\154\\211\\63","\\62\\4\\242\\1\\78","\\4\\129\\113\\205\\164","\\168\\95\\68\\89\\38","\\173\\115\\56\\91\\52","\\212\\161\\1!
59\\148\\179","\\133\\9\\89\\207\\62","\\242\\51\\168\\130\\86","\\154\\199\\208\\84\\2","\\160\\215\\250\\104\\22","\\45\\252\\143\\149\\204","\\48\\50\\91\\39\\243","\\94\\168\\48\\202\\122","\\238\\38\\180\\135\\142","\\234\\59\\24\\148\\2","\\237\\227\\23\\40\\140","\\7\\114\\176\\80\\123","\\204\\170\\0\\60\\65","\\217\\202\\249\\158\\182","\\82\\170\\45\\96\\86","\\118\\11\\123\\51\\216","\\192\\130\\153\\88\\59","\\219\\53\\146\\88\\198","\\203\\114\\182\\147\\145","\\158\\140\\239\\104\\247","\\179\\86\\111\\146\\65","\\192\\168\\51\\125\\183","\\8\\251\\77\\143\\231","\\237\\229\\173\\221\\29","\\69\\178\\247\\196\\175","\\114\\33\\237\\189\\119","\\220\\44\\144\\93\\98","\\241\\38\\138\\127\\252","\\66\\218\\237\\199\\157","\\132\\240\\212\\221\\48","\\180\\41\\157\\84\\37","\\203\\180\\58\\113\\136","\\156\\39\\111\\181\\34","\\16\\202\\216\\183\\55","\\154\\51\\122\\201\\45","\\218\\112\\47\\206\\142","\\189\\141\\110\\230\\132","\\80\\167\\61\\103\\247","\\186\!
\15\\121\\27\\167","\\103\\163\\217\\19\\220","\\173\\116\\86\\7\\249"
,"\\25\\37\\98\\35\\127","\\44\\92\\200\\89\\84","\\171\\129\\106\\249\\38","\\24\\147\\77\\\\134\\62","\\254\\184\\72\\159\\91","\\221\\13\\18\\153\\154","\\109\\232\\79\\169\\176","\\152\\103\\190\\50\\18","\\51\\71\\217\\22\\76","\\105\\109\\7\\77\\198","\\250\\121\\163\\49\\73","\\138\\204\\\\134\\247\\116","\\130\\38\\156\\36\\27","\\20\\83\\86\\113\\124","\\40\\63\\161\\157\\76","\\205\\99\\150\\109\\249","\\111\\174\\57\\169\\238","\\106\\169\\245\\170\\240","\\32\\10\\53\\160\\76","\\226\\0\\58\\9\\22","\\63\\83\\21\\3\\205","\\212\\141\\249\\177\\102","\\197\\226\\42\\202\\130","\\70\\40\\85\\176\\2","\\3\\16\\133\\118\\91","\\232\\48\\176\\209\\77","\\20\\149\\0\\2\\144","\\50\\87\\138\\108\\149","\\13\\78\\64\\211\\245","\\15\\158\\123\\62\\103","\\239\\68\\210\\175\\197","\\247\\216\\7\\211\\5","\\112\\100\\135\\210\\101","\\47\\26\\118\\254\\62","\\123\\7\\143\\206\\114","\\184\\43\\252\\56\\194","\\55\\16\\219\\\\134\\201","\\170\\128\\224\\160\\251","\\180\\10!
8\\182\\255\\118","\\164\\155\\151\\195\\67","\\116\\56\\163\\249\\92","\\250\\207\\75\\244\\104","\\122\\219\\25\\49\\17","\\16\\61\\66\\50\\32","\\15\\223\\166\\\\134\\188","\\16\\221\\48\\159\\124","\\163\\66\\245\\19\\190","\\52\\177\\137\\57\\104","\\137\\158\\143\\12\\73","\\175\\156\\252\\243\\165","\\18\\119\\\\134\\198\\209","\\179\\60\\37\\63\\136","\\68\\117\\75\\163\\27","\\234\\108\\150\\93\\15","\\209\\159\\154\\221\\138","\\70\\215\\50\\36\\255","\\237\\64\\91\\125\\54","\\17\\41\\177\\34\\178","\\19\\241\\29\\12\\34","\\\\134\\151\\112\\6\\214","\\63\\61\\146\\243\\60"}',1,0,1088649636);

For testing purposes, here is the Perl script I used to make the revised
input from the original test script:

#!/usr/bin/perl -w

$lastat = 0;

while (<>) {
if (m/^select put_token\((\d+),'(.*)',(\d+),(\d+),(\d+)\);$/) {
$d1 = $1;
$data = $2;
$d3 = $3;
$d4 = $4;
$d5 = $5;
$data =~ s/\\/\\\\/g;
if ($d5 == $lastat) {
$last2 = $last2 . ',"' . $data . '"';
} else {
if ($lastat) {
print "select put_tokens($last1,'{$last2}',$last3,$last4,$lastat);\n";
}
$last1 = $d1;
$last2 = '"' . $data . '"';
$last3 = $d3;
$last4 = $d4;
$lastat = $d5;
}
} else {
print ;
}
}

print "select put_tokens($last1,'{$last2}',$last3,$last4,$lastat);\n";

BTW, the data quoting is probably wrong here, but I suppose that can be
fixed easily.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Schumacher 2005-07-30 21:06:49 Re: Performance problems testing with Spamassassin 3.1.0
Previous Message Karim Nassar 2005-07-30 10:34:00 Re: Performance problems testing with Spamassassin