Re: [INTERFACES] (libpq question) Holy cow, what's all this fluff?!

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Hagerty <matthew(at)venux(dot)net>
Cc: pgsql-interfaces(at)postgreSQL(dot)org
Subject: Re: [INTERFACES] (libpq question) Holy cow, what's all this fluff?!
Date: 1999-02-14 20:38:26
Message-ID: 10723.919024706@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-interfaces

Matthew Hagerty <matthew(at)venux(dot)net> writes:
> At 01:45 PM 2/14/99 -0500, Tom Lane wrote:
>> Whatever docs you are looking at are obsolete. char(n), varchar(n), and
>> text have essentially interchangeable performance and representation.

> The docs that mention the performance hit were really the FAQ, item 3.10.
> Check it out.

The FAQ is mostly right if you read it very carefully, but I think it's
misleading.

: CHAR char 1 character
: CHAR(#) bpchar blank padded to the specified fixed length
: VARCHAR(#) varchar size specifies maximum length, no padding
: TEXT text length limited only by maximum tuple length
: BYTEA bytea variable-length array of bytes
:
: The last four types above are "varlena" types (i.e. the first four
: bytes are the length, followed by the data). char(#) allocates the
: maximum number of bytes no matter how much data is stored in the
: field. text, varchar(#), and bytea all have variable length on the
: disk, and because of this, there is a small performance penalty for
: using them. Specifically, the penalty is for access to all columns
: after the first column of this type.

The FAQ is wrong to claim that char is not a varlena type --- it is in
fact the same as char(1). (There is a true single-byte type named char
defined internally, but an appearance of "char" in a CREATE TABLE query
gets translated to char(1) instead. The inconsistency of internal and
external type names has doubtless contributed to confusion in this area.)

As the FAQ correctly states, char(n) is a varlena type; that means that
you pay a four-byte overhead for the length word, plus an alignment
penalty of zero to three bytes to ensure the length word is
long-aligned. (This is why I remarked that char(1) is effectively an
8-byte type: if you put several of them in a row, they take up 8 bytes
apiece.) This space overhead, plus the time overhead from dealing with
non-fixed-length columns, is the performance penalty paid for any
varlena type.

One particular component of the time cost for varlena types is that
the within-tuple offset of later columns is not fixed, meaning that
fastgetattr() can't cache their offsets, but has to scan the tuple data
to find them. Although char(n) is represented as a varlena type, there
is a special hack in fastgetattr() to make it treat char(n) as
fixed-size for the specific purpose of deciding whether later columns'
offsets are cacheable.

The FAQ is correct to describe this effect as "small", but perhaps it
should say "very small" --- for one thing the offset cache is irrelevant
if you have any NULL fields in the current tuple. I wouldn't worry
about it myself, and I certainly wouldn't choose an inappropriate data
type for my application in order to avoid it. The FAQ is not doing you
any service to suggest that you should prefer char(n) when you don't
really want a fixed-length string. The space penalty of the unwanted
pad data is likely to cost you far more than you can hope to save by
making fastgetattr() a little quicker.

> Also, item 3.8 mentions psql's \do command, however when I
> type \do nothing happens and I have to ctrl-c to get the prompt back.

\do works for me, but it's pretty slow (about 20 seconds on my hardware)
... how long did you give it?

regards, tom lane

Browse pgsql-interfaces by date

  From Date Subject
Next Message Bryan Mattern 1999-02-14 21:32:53 Re: [INTERFACES] Pg from CGI in Apache
Previous Message Ken J. Wright 1999-02-14 19:56:32 ODBC authentication failure