Re: [HACKERS] LZTEXT for rule plan strings

From: Don Baccus <dhogaza(at)pacifier(dot)com>
To: Jan Wieck <wieck(at)debis(dot)com>
Cc: Hannu Krosing <hannu(at)tm(dot)ee>, Jan Wieck <wieck(at)debis(dot)com>, PostgreSQL HACKERS <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] LZTEXT for rule plan strings
Date: 2000-02-26 05:15:13
Message-ID: 3.0.1.32.20000225211513.016e43b0@mail.pacifier.com
Lists: pgsql-hackers

At 04:06 AM 2/26/00 +0100, Jan Wieck wrote:

> False.

> With TOAST, the maximum tuple length is limited by available
> disk space (minus some overhead) and/or the number of bits we
> use to represent the value's original size and/or the size
> addressable by the TOASTer's table at all. Available space
> always limits the amount of data in a DB, and you always
> have to take some overhead into account, but calling this
> _undefined_ isn't correct IMHO - better call it hard to
> figure out.

The same is true for non-TOAST lzText.

Or, for that matter, non-lzText text.

Of course, the size of text IS UNDEFINED TODAY.

create table foo (
t1 text,
t2 text);

Pray tell, what is the maximum size of t1?

Is it independent of t2? Or...correct me if I'm mistaken...if t2
contains 8,000+ characters won't "insert" bomb me if I try to
insert 8,000+ characters into t1? Or even a few characters? Exactly
where is this vaunted and well defined limit?

(oops - you can't answer that question because it depends on the
size of BLCKSZ, which of course one can change at will)

The maximum size of "text" is already undefined, as it depends on:

BLCKSZ (which the user may not have set herself, and may be unaware of
if she's a user at the mercy of some sysadmin)

and

the actual bytes occupied by other variable-length columns.

"bytea" for instance. "text" for instance. "varchar(n)" for instance,
which actually is a variable-length string which has a maximum value.

PG lets me do this:

create table foo (
t1 varchar(8000),
t2 varchar(8000),
tn varchar(8000) -- n fairly large
);

Can I insert 8K chars into t1?

Into t2?

Into t3?

Trick PG question - into all three at once?

Sorry, but this is a crap argument. There is no way to know how
many characters you can insert into a "text" column unless you have
detailed knowledge of the table, not only the types in the table
but the data stored in the pertinent row of the table.

I should know, I've been fighting this when porting code over from
Oracle, where the blocksize truly limits the size of ONE COLUMN,
not a row (tuple) at large.

If I can really have a tuple with 1000 varchar(BLCKSZ-overhead) columns,
fully filled with data, could you please tell me how to do this? My
life will be much simpler.
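The trick question above reduces to simple arithmetic; a sketch (same caveat: the overhead constant is invented, and PostgreSQL's real accounting differs):

```python
BLCKSZ = 8192
OVERHEAD = 48  # invented per-row overhead, purely illustrative

def insert_fits(*column_bytes):
    """Does a row with these column byte counts fit in one block?"""
    return sum(column_bytes) + OVERHEAD <= BLCKSZ

print(insert_fits(8000))              # True: each column alone is fine
print(insert_fits(8000, 8000, 8000))  # False: all three at once bombs
```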

>> If text were implemented as lzText for a quick 7.1, which apparently
>> was Jan's spin on the idea, then for 7.1 we'd say:
>
> On the first look, it was a tempting solution. But there are
> tons of places in the backend that assume text is binary
> compatible to something, or that the bytes after the VARSIZE are
> plain value bytes, not some compressed garbage to be passed
> through a function first. Replacing TEXT by LZTEXT therefore
> wouldn't be such an easy job, but would be working for the
> wastebasket from the very beginning anyway, because TOAST
> needs to revert it all again.

> I don't like that kind of work.

Nor do I, which is why I didn't suggest it when lzText first came up
and drifted into a TOAST discussion. Clearly, TOAST is a better
solution.

In particular, it solves Hannu's objection regarding the fact that
a compressed text type would have no fixed upper limit.

Better yet, it would solve Hannu's misunderstanding that today's
text type has such a limit.

Because (I love flogging dead horses):

create table foo (
i integer,
t text);

and

create table bar (
t text);

create two text columns named t with different maximum limits, because
the limit is based on total tuple size.

A compressed text type is only a bad idea because it's a dead end.
Not because it turns a "defined" max text limit into an undefined
max text limit. The maximum number of chars you can stuff into
a text var is always undefined unless you dissect exactly how
other columns eat storage.

> Maybe I found some kind of compromise:
>
> - We make LZTEXT a released type, without warning and anyone
> can use it as needed.
>
> - When featuring TOAST, we remove it and create a type
> alias. This way, the "backend" will convert the table
> schemas (WRT lztext->text) at reload time of the
> conversion.

I have no strong feelings here. Personally, I can live with just
compiling PG with a 16KB blocksize, for the work I'm doing today.

But I don't think the upgrade problem's a big deal. If the type's
not popularized, only those of us "inside" will know of it, and as
far as I'm concerned, hand-editing a pg_dump would be fine with me if
I choose to use it.

But I'm only speaking for myself.

TOAST is clearly the way to go.

On the other hand, I don't see people flinging bricks at Interbase
for compressing their text type. After all, they have outer joins...

> Actually I have some problems with the type coercion stuff.
> There are functions lztext(text) and vice versa, but the
> system is unable to find an "=" operator for lztext and text
> when issuing
>
> SELECT * FROM t1, t2 WHERE t1.lztext_att = t2.text_att;
>
> This worked in the past releases (IIRC), so I wonder if the
> failure above is a wanted "feature". I'll commit the stuff I
> have tomorrow and hope someone can help me to get the
> coercion working. All we have to do then is to tell in the
> release notes and docs "Never use LZTEXT type name explicitly
> in an application query (like for type casting) - use TEXT
> instead".

Despite the above, I have no really strong feelings. I only raised
the compressed text issue because my (belated) reading of the Interbase
docs made it clear that they do this, and Tom resurrected lztext in
regard to views (and my problems there probably made it a red herring
in this case, too!) It's an interesting idea, and if TOAST is indeed
implemented, probably a moot one. Though...where is the crossover between
in-place compression and moving an item out to the TOAST table? And all
of the problems with the backend making assumptions about text etc.
will have to be addressed by the TOASTer, too.

For instance...varchar(4000) might still benefit from being compressed,
even if it is not TOASTed, due to PG's love of dragging full tuples
around. Saves disk space. Bigger slices of tables can be sorted in
memory vs. disk for any given backend sort/hash buffer size parameter.
Today's x86 CPUs, at least, favor shrinking the memory footprint of
data, since CPUs tend to be data-starved when working on large amounts
of data in RAM. And so on. So such a compressed implementation may
actually be a win even if Hannu is made happy by affixing fixed
varchar(n) limits to column lengths.
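As a rough illustration of that win, here's what a generic LZ-family compressor (zlib here, standing in for whatever lztext actually uses) does to a repetitive 4,000-character value, and what that means for a fixed-size sort buffer:

```python
import zlib

# A repetitive value of the kind a varchar(4000) column often holds.
value = ("status=OK;user=jan;action=update;" * 130)[:4000]
packed = zlib.compress(value.encode())

print(len(packed) < len(value))   # True: the compressed copy is far smaller

# More tuples fit in the same sort buffer when each one shrinks:
SORT_BUF = 512 * 1024
print(SORT_BUF // len(packed) > SORT_BUF // len(value))  # True
```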

- Don Baccus, Portland OR <dhogaza(at)pacifier(dot)com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
