Re: [GENERAL] workaround for lack of REPLACE() function

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, lockhart(at)fourpalms(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: [GENERAL] workaround for lack of REPLACE() function
Date: 2002-08-12 21:55:54
Message-ID: 3D582EEA.1000903@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-patches

Tatsuo Ishii wrote:
>>Now a new function similar to toast_raw_datum_size(), maybe
>>toast_raw_datum_strlen() could be used to get the original string
>>length, whether MB or not, without needing to retrieve and decompress
>>the entire datum.
>>
>>I understand we would either: have to steal another bit from the VARHDR
>>which would reduce the effective size of a valena from 1GB down to .5GB;
>>or we would need to add a byte or two to the VARHDR which is extra
>>per-datum overhead. I'm not sure we would want to do either. But I
>>wanted to toss out the idea while it was fresh on my mind.
>
>
> Interesting idea. I also was thinking about adding some extra
> infomation to text data types such as character set, collation
> etc. for 7.4 or later.

I ran some tests to confirm the theory above regarding overhead;

create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select length(f1) from strtest;
loop

Results:

SQL_ASCII database, new code:
=============================
PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
==> 2 seconds

SQL_ASCII database, old code:
=============================
text
*t = PG_GETARG_TEXT_P(0);
PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
==> 66 seconds

EUC_JP database, new & old code:
================================
text
*t = PG_GETARG_TEXT_P(0);
PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t),
VARSIZE(t) - VARHDRSZ));
==> 469 seconds

So it appears that, while detoasting is moderately expensive (adds 64
seconds to the test), the call to pg_mbstrlen_with_len() is very
expensive (adds 403 seconds to the test).

Joe

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Peter Darley 2002-08-12 22:01:25 Vacuum problem
Previous Message Bolden, Thomas 2002-08-12 20:29:26 trigger column update

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2002-08-12 21:57:03 VACUUM's "No one parent tuple was found", redux
Previous Message Rod Taylor 2002-08-12 20:58:57 Re: CLUSTER all tables at once?

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2002-08-12 22:07:12 Re: Antw: Re: Patch for NetWare support
Previous Message Neil Conway 2002-08-12 20:52:14 additional ONLY docs