Re: Reducing the overhead of NUMERIC data

From: mark(at)mark(dot)mielke(dot)cc
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Maxwell <gmaxwell(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing the overhead of NUMERIC data
Date: 2005-11-04 18:10:03
Message-ID: 20051104181003.GA16141@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

On Fri, Nov 04, 2005 at 04:13:29PM +0100, Martijn van Oosterhout wrote:
> On Fri, Nov 04, 2005 at 08:38:38AM -0500, mark(at)mark(dot)mielke(dot)cc wrote:
> > On Thu, Nov 03, 2005 at 09:17:43PM -0500, Tom Lane wrote:
> > > Actually, the real reason we use UTF-8 and not any of the
> > > sorta-fixed-size representations of Unicode is that the backend is by
> > > and large an ASCII, null-terminated-string engine. *All* of the
> > > supported backend encodings are ASCII-superset codes. Making
> > > everything null-safe in order to allow use of UCS2 or UCS4 would be
> > > a huge amount of work, and the benefit is at best questionable.
> > Perhaps as a side note: my intuition (which sometimes lies) tells
> > me that, if the above is true, the backend is making unnecessary copies
> > of read-only data, if only to insert a '\0' at the end of the strings.
> > Is this true?
> It's not quite that bad. Obviously, zeros are allowed in all on-disk
> datatypes. Bit strings, arrays, timestamps and numerics can all have
> embedded nulls, and they have a length header.

Are you and Tom conflicting in opinion? :-)

I read "the backend is by and large an ASCII, null-terminated-string
engine" together with "we use UTF-8 [for varlena strings?]" as meaning
that a lot of the code assumes varlena strings are '\0' terminated. I
also assumed that varlena strings are not stored in the backend with a
'\0' terminator, and that they therefore have to be copied out and
terminated with a '\0' before they can be used?

Or perhaps I'm just confused. :-)

> > I'm thinking along the lines of the other threads that speak of PostgreSQL
> > being CPU or I/O bound, not disk bound, for many sorts of operations. Is
> > PostgreSQL unnecessarily copying string data around (and other data too, I
> > would assume)?
> Well, there is a bit of copying around while creating tuples and such,
> but it's not to add null terminators.

How much effort (past discussions that I've missed from a decade ago?
hehe) has been put into determining whether a zero-copy architecture,
or, more realistically, a minimum-copy architecture, would address some
of these bottlenecks? Am I dreaming? :-)

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com
Neighbourhood Coder, Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/
