Re: [HACKERS] lztext.c

From: wieck(at)debis(dot)com (Jan Wieck)
To: t-ishii(at)sra(dot)co(dot)jp
Cc: wieck(at)debis(dot)com, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] lztext.c
Date: 1999-11-25 02:23:52
Message-ID: m11qoZc-0003kGC@orion.SAPserv.Hamburg.dsh.de
Lists: pgsql-hackers

Tatsuo Ishii wrote:

> > Don't spend much effort on comparison and the SUBSTRING()
> > things right now. I already have an additional, generalized
> > decompressor in mind that can be used in the comparison, for
> > example, to decompress two values on the fly and stop
> > comparing at the first difference, which usually happens
> > early in two random datums.
>
> Ok.
>
> > Tell me when you have the multi-byte (and maybe cyrillic?)
> > stuff committed and I'll take my hands back on the code.
>
> I have committed the changes just now, though cyrillic support is not
> included. I vaguely recall the discussion about the usefulness of
> the cyrillic support.

I added the comparison functions, operators and the default
nbtree operator class for indexing.

For SUBSTR() and STRPOS(), I just checked the current setup,
and it automatically casts an lztext argument to text in
these functions. I assume lztext can now be used everywhere
text is allowed. Is it really worth bloating the catalogs
with rarely used functions whose only gain is skipping the
decompression of part of the value?

Remember, the algorithm is optimized for decompression speed.
Partial decompression might save some time in a comparison
function used inside index scans or btree operations, where
it's likely to hit a difference early. But for something like
STRPOS(), using the default cast and changing the STRPOS()
match search itself into a KMP algorithm (instead of walking
through the text and comparing each position against the
pattern using strncmp) would outperform it in any case. With
the position by position strncmp() method, we definitely
implemented the slowest and most readable possibility.
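
To make the contrast concrete, here is a minimal sketch of both
search strategies on plain NUL-terminated C strings. It is not the
backend's actual STRPOS() code; the function names naive_strpos()
and kmp_strpos() are made up for illustration, multi-byte encodings
are ignored, and returning 1 for an empty pattern is an assumption
of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Naive search: try the pattern at every start position with strncmp().
 * Returns a 1-based position, or 0 if the pattern does not occur.
 * Worst case cost is O(n * m) character comparisons.
 */
int
naive_strpos(const char *text, const char *pattern)
{
    size_t      n, m, i;

    assert(text != NULL && pattern != NULL);
    n = strlen(text);
    m = strlen(pattern);
    if (m == 0)
        return 1;               /* assumption: empty pattern matches at 1 */
    if (m > n)
        return 0;
    for (i = 0; i + m <= n; i++)
        if (strncmp(text + i, pattern, m) == 0)
            return (int) i + 1;
    return 0;
}

/*
 * Knuth-Morris-Pratt search: never moves backwards in the text.
 * Returns a 1-based position, 0 if not found, -1 if out of memory.
 * Cost is O(n + m) character comparisons.
 */
int
kmp_strpos(const char *text, const char *pattern)
{
    size_t      n, m, i, k;
    size_t     *fail;
    int         result = 0;

    assert(text != NULL && pattern != NULL);
    n = strlen(text);
    m = strlen(pattern);
    if (m == 0)
        return 1;               /* assumption: empty pattern matches at 1 */
    if (m > n)
        return 0;

    fail = malloc(m * sizeof(size_t));
    if (fail == NULL)
        return -1;

    /*
     * fail[i] is the length of the longest proper prefix of
     * pattern[0..i] that is also a suffix of pattern[0..i].
     */
    fail[0] = 0;
    k = 0;
    for (i = 1; i < m; i++)
    {
        while (k > 0 && pattern[i] != pattern[k])
            k = fail[k - 1];
        if (pattern[i] == pattern[k])
            k++;
        fail[i] = k;
    }

    /* Scan the text once; k counts the pattern characters matched so far. */
    k = 0;
    for (i = 0; i < n; i++)
    {
        while (k > 0 && text[i] != pattern[k])
            k = fail[k - 1];
        if (text[i] == pattern[k])
            k++;
        if (k == m)
        {
            result = (int) (i + 1 - m) + 1;
            break;
        }
    }

    free(fail);
    return result;
}
```

Both functions return the same position for any input; for
instance, searching "hello world" for "world" yields 7 with
either one. The difference only shows up in the amount of work on
texts with many partial matches, at the price of one small
allocation per call for the failure table.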

I think our time is better spent adding an lzbpchar type, or
working on compressed tables and tuple splitting to remove
the size limits altogether.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck(at)debis(dot)com (Jan Wieck) #
