Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, lockhart(at)fourpalms(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function
Date: 2002-08-14 06:15:25
Message-ID: 3D59F57D.1070804@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-patches

Tatsuo Ishii wrote:
> Joe Conway wrote:
>>Any objection if I rework this function to meet SQL92 and fix the bug?
>
> I don't object.
>
>>Or is the SQL92 part not desirable because it breaks backward
>>compatability?
>
> I don't think so.
>
>>In any case, can the #ifdef MULTIBYTE's be removed now in favor of a
>>test for encoding max length?
>
> Sure.

<sorry so long-winded>

Attached is a patch that implements the above items wrt text_substr(). I
also modified textlen(), textoctetlen(), byteaoctetlen(), and
bytea_substr(). Here's a summary of the change to each:

- text_substr(): rewrite function to meet SQL92, fix MB related bug,
and remove #ifdef MULTIBYTE.

- bytea_substr(): same as text_substr() enc max len == 1.

- textoctetlen(), byteaoctetlen(): use
toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ)
to avoid detoasting.

- textlen(): same as textoctetlen() for enc max len == 1, and remove
#ifdef MULTIBYTE.

I did some benchmarking to ensure no performance degradation, and to
help me understand MB and related performance issues. The results were
very enlightening:

===================================================================
First test - textlen() (already reported, repeated here for completeness):
-------------------------------------------------------------------
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select length(f1) from strtest;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, new code 2 seconds
SQL_ASCII database, old code 66 seconds
EUC_JP database, new & old code 469 seconds
===================================================================
Second test - short string test:
-------------------------------------------------------------------
create table parts(partnum text);
<fill with ~220000 rows, 8 to 12 characters each>

do 300 times
select substr(partnum, 3, 3) from parts;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, old code 352 seconds
SQL_ASCII database, new code 350 seconds
EUC_JP database, old code 461 seconds
EUC_JP database, new code 422 seconds
===================================================================
Third test - long string, EXTENDED storage (EXTERNAL+COMPRESSED):
-------------------------------------------------------------------
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select substr(f1, 89000, 10000) from strtest;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, old code 59 seconds
SQL_ASCII database, new code 58 seconds
EUC_JP database, old code 915 seconds
EUC_JP database, new code 912 seconds
===================================================================
Forth test - long string, EXTERNAL storage (not COMPRESSED)
-------------------------------------------------------------------
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select substr(f1, 89000, 10000) from strtest;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, old code 17 seconds
SQL_ASCII database, new code 17 seconds
EUC_JP database, old code 918 seconds
EUC_JP database, new code 911 seconds

The only remaining problem is that this causes opr_sanity to fail based
on this query:

-- Considering only built-in procs (prolang = 12), look for multiple
-- uses of the same internal function (ie, matching prosrc fields).
-- It's OK to have several entries with different pronames for the same
-- internal function, but conflicts in the number of arguments and other
-- critical items should be complained of.
SELECT p1.oid, p1.proname, p2.oid, p2.proname
FROM pg_proc AS p1, pg_proc AS p2
WHERE p1.oid != p2.oid AND
p1.prosrc = p2.prosrc AND
p1.prolang = 12 AND p2.prolang = 12 AND
(p1.prolang != p2.prolang OR
p1.proisagg != p2.proisagg OR
p1.prosecdef != p2.prosecdef OR
p1.proisstrict != p2.proisstrict OR
p1.proretset != p2.proretset OR
p1.provolatile != p2.provolatile OR
p1.pronargs != p2.pronargs);

This fails because I implemented text_substr() and bytea_substr() to
take either 2 or 3 args. This was necessary for SQL92 spec compliance.

SQL92 requires L < 0 to throw an error, and L IS NULL to return NULL. It
also requires that if L is not provided, the length to the end of the
string is assumed. Current code handles L IS NULL correctly but not L <
0 -- it assumes L < 0 is the same as L is not provided. By allowing the
function to determine if it was passed 2 or 3 args, this can be handled
properly.

So the question is, can/should I change opr_sanity to allow this case?

I also still owe some additions to the strings regression test to make
it cover toasted values.

Other than those two issues, I think the patch is ready to go. I'm
planning to take on the replace function next.

Thanks,

Joe

Attachment Content-Type Size
substr.2002.08.13.3.patch text/plain 15.7 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2002-08-14 06:23:42 Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function
Previous Message Patrick Nelson 2002-08-14 06:03:44 Re: Blob stuff

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2002-08-14 06:23:42 Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function
Previous Message Hannu Krosing 2002-08-14 06:11:59 Re: Open 7.3 items

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2002-08-14 06:23:42 Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function
Previous Message Bruce Momjian 2002-08-14 06:05:08 Re: [HACKERS] CREATE TEMP TABLE .... ON COMMIT