Re: Case Insensitive Queries

From: Mark <mark(at)zserve(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Case Insensitive Queries
Date: 2001-05-30 18:06:00
Message-ID: 200105301857.f4UIvoW17610@mail.ldssingles.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


On 30 May 2001 12:53:22 -0400, Tom Lane wrote:

>
> You are operating under misinformation about what's efficient or not.
> There are no performance penalties that I know of for varchar ... if
> anything, bpchar is the less efficient choice, at least in Postgres.
> The extra I/O costs for those padding blanks add up, and there's no
> compensatory savings anywhere.

with varchars, as I understand it (and postgresql may be different),
each varchar field has a header that stores the length of the particular
entry's length. Further, if the varchar field precedes another field,
the system loses the ability to use fixed-length addressing to access
the field after the varchar, since the system must determine on a
case-by-case basis how to access the field after the varchar. It has to
calculate the size of the varchar, add that to the start of the varchar
(plus header length), and then it has the address of the next field.
With non-variant char it is fixed length, so selects and updates operate
much more quickly. Even the postgresql documentation asserts something
similar to this:

'Both TEXT and VARCHAR() store only the number of characters in the
string. CHAR(length) is similar to VARCHAR(), except it always stores
exactly length characters. This type pads the value with trailing spaces
to achieve the specified length, and provides slightly faster access
than TEXT or VARCHAR().'

Perhaps I am misinformed.

>
> In any case, if your data is really variable-length strings, forcing
> it into a datatype that doesn't match its semantics because of dubious
> micro-efficiency considerations is just plain bad database design.
> Rather than having blanks that you want to pretend aren't there, you
> should not have the blanks in the first place. IMHO anyway.
>

Point well taken. If the gain from using bpchar is not much more than
using varchar and the data used is actualy variable length up to a max
length, the argument is unfounded.

So, what to make of all of this? It depends on the performance
gain/loss of using varchar. We originally used fixed-length chars
because of the performance gain. We try to avoid varchars for that
reason. Now, if postgresql is different, then we'll use varchars, as
that precisely models our data.

Thanks,
Mark

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Stephan Szabo 2001-05-30 18:16:35 Re: Case Insensitive Queries
Previous Message Mark 2001-05-30 17:47:41 Re: Case Insensitive Queries