Re: Update table performance

From: Decibel! <decibel(at)decibel(dot)org>
To: Erik Jones <erik(at)myemma(dot)com>
Cc: Mark Makarowsky <bedrockconstruction(at)yahoo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Update table performance
Date: 2007-08-08 17:28:14
Message-ID: 20070808172814.GJ20424@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Aug 07, 2007 at 08:46:20PM -0500, Erik Jones wrote:
> Vertical partitioning is where you split up your table on disk by
> columns, i.e on the vertical lines. He quoted it because Postgres
> doesn't actually support it transparently but you can always fake it
> by splitting up your table. For example, given the following table
> wherein column bar gets updated a lot but the others don't:
>
> create table foo (
> id int not null,
> bar int,
> baz int,
>
> primary key (id)
> );
>
> You could split it up like so:
>
> create table foo_a (
> id int,
> baz int,
>
> primary key (id)
> );
>
> create table foo_b (
> foo_id int,
> bar int,
>
> foreign key foo_a_id (foo_id) references foo_a (id)
> );

FWIW, the cases where I've actually used this have been on much wider
tables, and a number of the attributes are in-frequently accessed. An
example would be if you keep snail-mail address info for users; you
probably don't use those fields very often, so they would be good
candidates for going into a second table.

When does it actually make sense to use this? When you do a *lot* with a
small number of fields in the table. In this example, perhaps you very
frequently need to look up either user_name or user_id, probably via
joins. Having a table with just name, id, perhaps password and a few
other fields might add up to 50 bytes per row (with overhead), while
address information by itself could easily be 50 bytes. So by pushing
that out to another table, you cut the size of the main table in half.
That means more efficient use of cache, faster seqscans, etc.

The case Erik is describing is more unique to PostgreSQL and how it
handles MVCC. In some cases, splitting a frequently updated row out to a
separate table might not gain as much once we get HOT, but it's still a
good tool to consider. Depending on what you're doing another useful
technique is to not update the field as often by logging updates to be
performed into a separate table and periodically processing that
information into the main table.
--
Decibel!, aka Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Vivek Khera 2007-08-08 17:42:34 Re: When/if to Reindex
Previous Message Erik Jones 2007-08-08 15:15:41 Re: Update table performance