DBMirror.pl performance change

From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: ssinger(at)navtechinc(dot)com
Cc: pgsql-sql(at)postgresql(dot)org
Subject: DBMirror.pl performance change
Date: 2006-01-23 10:57:11
Message-ID: Pine.LNX.4.44.0601231236120.18744-200000@matrix.gatewaynet.com
Lists: pgsql-sql


Hi Steven,
I hope you are well.

I discovered a performance problem in DBMirror.pl.

pending.c stores data in a format
very similar to the PostgreSQL "\"-escaped input format.

When the field is of type bytea and the source data is binary,
this produces 2 additional backslashes for every unprintable
character.

The performance of the extractData function in DBMirror.pl really suffers
in this case, since it breaks the data into chunks delimited
by "\".

Informally speaking, performance tends to be O(n) where n is the size
of the data.

This can be remedied if we break the data into chunks delimited by "'"
rather than "\". "'" occurs much less frequently in common binary files
(bz2, tiff, jpg, pdf, etc.). If we also note that a chunk containing an odd
number of "\" signals an intermediate (escaped) "'", whereas an even number
of "\" signals the final "'", then we can make this routine run much faster.
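
The parity rule above can be sketched as follows. This is only an
illustration in Python, not the attached Perl extractData. The names
find_closing_quote and unescape are hypothetical, and the sketch assumes
that the value starts at an opening "'" and that every "\" and "'" inside
the value is escaped with one preceding "\". It counts only the backslashes
immediately before each "'", which is one way of applying the odd/even rule.

```python
import re


def find_closing_quote(data: str, start: int) -> int:
    """Return the index of the unescaped "'" closing the value opened at `start`."""
    if data[start] != "'":
        raise ValueError("expected an opening quote at position %d" % start)
    pos = start + 1
    while True:
        # Jump straight to the next quote instead of stopping at every backslash.
        quote = data.find("'", pos)
        if quote == -1:
            raise ValueError("unterminated quoted value")
        # Count the run of backslashes immediately before this quote.
        # The run cannot extend past the opening quote at `start`.
        run = 0
        while data[quote - 1 - run] == "\\":
            run += 1
        if run % 2 == 0:
            # Even run: every backslash is paired, so this quote is the final one.
            return quote
        # Odd run: the last backslash escapes this quote; keep scanning.
        pos = quote + 1


def unescape(raw: str) -> str:
    """Remove one level of backslash escaping (assumed format, see lead-in)."""
    return re.sub(r"\\(.)", lambda m: m.group(1), raw, flags=re.DOTALL)


def extract_value(data: str, start: int):
    """Return (unescaped value, index just past the closing quote)."""
    end = find_closing_quote(data, start)
    return unescape(data[start + 1:end]), end + 1
```

With few "'" characters in the data, the loop body runs only a handful of
times per value, whereas a split on "\" visits every escaped byte.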

I attach the new extractData function.

Please feel free to send any comments.

--
-Achilleus

Attachment Content-Type Size
extractData.pl text/plain 1.9 KB
