Re: Converting MySQL tinyint to PostgreSQL

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Converting MySQL tinyint to PostgreSQL
Date: 2005-07-14 16:30:36
Message-ID: 20050714163036.GO92165@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Jul 13, 2005 at 07:52:04PM -0400, Bruce Momjian wrote:
> Ron Mayer wrote:
> > Martijn van Oosterhout wrote:
> > >
> > > Well, you get another issue, alignment. If you squeeze your string
> > > down, the next field, if it is an int or string, will get padded to a
> > > multiple of 4 negating most of the gains. Like in C structures, there
> > > is padding to optimise access.
> >
> > Anecdotally I hear at least as many people say that their database
> > is more I/O bound than CPU bound; and it seems that adding bytes
> > for alignment is a way of reducing CPU for more disk I/O.
> >
> > I guess unaligned access so expensive that it makes up for the extra i/o?
>
> This is a good point. We have always stored data on disk that exactly
> matches its layout in memory. We could change that, but no one has
> shown it would be a win.

Out of curiosity, what would be involved in hacking the backend enough
to be able to test this theory out? I'm guessing you'd want to convert
between on-disk and in-memory formats as you read pages in, so either
on-disk pages would become variable size (and smaller than memory pages)
or in-memory pages would become variable size (and larger than on-disk
pages).

Or maybe as an alternative, would it be possible to determine how much
space in a given relation was being wasted due to padding? That could be
used to figure out how much IO could be saved on different transactions.
While there would be a slight CPU penalty every time you read or write a
page, I find it hard to believe it could come close to equaling IO cost.

On a side note, I think it might be useful to have a seperate TODO
catagory for ideas that need to be tested to see if they're worth
implementing for real. This is a case where it's probably substantially
easier to estimate (or maybe even measure) how much there is to gain
from this than to do the actual work and then see if it helps. It's also
likely that a less experienced hacker could test the theory out. Some
likely items for this list:

Reduce WAL traffic so only modified values are written rather than
entire rows?
Find a way to reduce rotational delay when repeatedly writing last WAL
page
Precompile SQL functions to avoid overheadDo async I/O for faster random
read-ahead of data

Not on todo:
Estimate gains from not using the in-memory format of data for on-disk
storage
Estimate gains from reducing the amount of space used by visibility
information in each tuple

BTW, what ever happened to the idea of having a list of projects for
beginners? (Or maybe it'd be better to assign a numeric difficulty to
each TODO item?)
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2005-07-14 16:33:58 Re: ERROR: could not open relation
Previous Message Alvaro Herrera 2005-07-14 16:25:50 Re: Standalone Parser for PL/pgSQL