Re: logical column order and physical column order

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical column order and physical column order
Date: 2013-11-03 08:40:18
Message-ID: 52760BF2.60200@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/11/13 20:37, David Rowley wrote:
> I've just been looking at how alignment of columns in tuples can make
> the tuple larger than needed.
>
> I created 2 tables... None of which are very real world, but I was
> hunting for the extremes here...
>
> The first table contained an int primary key and then a total of 10
> int columns and 10 boolean columns, I placed one boolean column after
> an int column so that it was int,bool, int bool, etc
> With the 2nd table I had all the ints first then all the booleans at
> the end of the table. I then inserted 1 million records per table and
> checked the sizes of each table.
>
> The first table was 112 MB and the 2nd table was 81MB, so naturally
> there is a difference when it comes to running queries on these tables.
>
> postgres=# select sum(Value1) from test1;
> sum
> --------------
> 500000500000
> (1 row)
>
>
> Time: 239.306 ms
>
> postgres=# select sum(Value1) from test2;
> sum
> --------------
> 500000500000
> (1 row)
>
>
> Time: 186.926 ms
>
> So in this example a full scan and aggregate of a single column is 28%
> faster.
>
> I'm sure in the real world there are many cases where a better choice
> in column ordering would save space and save processing times, but is
> this something that we want to leave up to our users?
>
> I've not yet looked at the code to see how hard implementing
> separation of column physical order and logical order would be. I
> really just want to get an idea of what the thoughts would be on such
> a change.
>
> I would imagine it should be possible to have a function which
> optimises column orders which is run when a table is created or
> rewritten. New columns would still go onto the end of the tuple unless
> the table had to be rewritten and in this case the column order would
> be optimised again. All plays where column names were displayed
> without explicit ordering, e.g select * and in psql the catalog could
> be queried to see which order these columns should be displayed in.
>
> For reference I've attached the script I used for testing this.
>
> I'd like to implement this as a project, but before I start any work
> on it I'd just like to find out other people's thoughts on it.
>
> Regards
>
> David Rowley
>
>
>
I think the system should PHYSICALLY store the columns in the most space
efficient order, and have a facility for mapping to & from the LOGICAL
order - so that users & application developers only have worry about the
logical order. Even system programers would normally not have to be
concerned with the physical order. I am a little surprised that this is
not already done, to be honest.

Cheers,
Gavin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2013-11-03 10:35:05 Re: logical column order and physical column order
Previous Message David Rowley 2013-11-03 07:37:31 logical column order and physical column order