Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jameison Martin <jameisonb(at)yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
Date: 2012-04-26 07:27:56
Message-ID: CA+U5nM+SDF78+SbPMAadUB5UWWjW4p2XJwwjj-2FwrT6KHCTnQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 26, 2012 at 1:35 AM, Jameison Martin <jameisonb(at)yahoo(dot)com> wrote:
> Tom, I whipped up some INSERT/SELECT tests where I selected into a
> temporary table as you suggested. The target temporary table and the source
> table were in cache and I basically disabled things that would cause noise.
> The source table had 5 integer columns, and was populated with 10 million
> rows.
>
> I tried 3 variations:
>   1) target has all nullable columns, all set to non null values: the
> results were the same
>   2) target has all nullable columns, only the first column is set: the
> patch was slightly faster
>   3) target has all non-null columns: the patch maybe was slightly faster,
> probably not statistically relevant
>
> By slightly faster I'm talking on order of 10 nanoseconds per row.
>
> I think #2 is explained by the reduction in loop iterations in
> heap_fill_tuple().

I see this as a useful use case, one I have come across a few times,
most typically with very large databases.

It will be a win in those cases, but I think your maths is unrealistic
for the common case. In your case, you're saying that you have 750
trailing null columns that will be all-NULL in 90% of cases. Given a
randomly distributed set of column values, I'd expect the last non-NULL
value to be on average around the 400th column, perhaps more. So the
savings are still high, but not as high in the general case as they are
for you.
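
To put rough numbers on that, here is a minimal Python sketch of how the
tuple header grows with the number of attributes stored in the row. It
assumes a 23-byte fixed header, one null-bitmap bit per stored attribute
whenever the row contains a NULL, and a header rounded up to an 8-byte
MAXALIGN boundary (typical for 64-bit builds). The function names are
illustrative only, not PostgreSQL APIs, and the attribute counts are
hypothetical.

```python
# Sketch of heap tuple header sizing under the assumptions stated above.
FIXED_HEADER_BYTES = 23   # fixed part of the tuple header before the bitmap
MAXALIGN = 8              # assumed alignment; platform dependent


def bitmap_len(natts: int) -> int:
    """Bytes needed for a null bitmap with one bit per stored attribute."""
    return (natts + 7) // 8


def header_size(natts: int, has_nulls: bool) -> int:
    """Header bytes for a row storing natts attributes, rounded to MAXALIGN."""
    size = FIXED_HEADER_BYTES
    if has_nulls:
        # The bitmap is only present when the row has at least one NULL.
        size += bitmap_len(natts)
    return (size + MAXALIGN - 1) // MAXALIGN * MAXALIGN


if __name__ == "__main__":
    for natts in (400, 750):
        print(natts, header_size(natts, has_nulls=True))
```

Under these assumptions a row that stores 750 attributes and contains a
NULL carries a 120-byte header, while one truncated to 400 attributes
carries 80 bytes, and a row with no NULLs at all carries 24.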

The performance tests Tom asks for are essential; otherwise we cannot
proceed. Thanks for starting those.

Please post your test code, any environment notes and your exact test
results. The important point is that we need objectively confirmable
tests, not just your word that it was faster. Everybody is held to the
same level of proof here, so it's not a personal doubt.
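
As one possible way to report results, the Python sketch below converts
the total runtime of each repeated run into a per-row cost and summarises
the runs by mean and sample standard deviation, so that a difference on
the order of 10 nanoseconds per row can be compared against run-to-run
noise. The function names are hypothetical, and the timings themselves
have to come from the actual test runs.

```python
import statistics


def per_row_ns(total_seconds: float, rows: int) -> float:
    """Convert the total runtime of one run into nanoseconds per row."""
    return total_seconds * 1e9 / rows


def summarize(run_seconds: list[float], rows: int) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of per-row cost in ns.

    run_seconds holds the measured wall-clock time of each repeated run;
    at least two runs are needed for a standard deviation.
    """
    costs = [per_row_ns(s, rows) for s in run_seconds]
    return statistics.mean(costs), statistics.stdev(costs)
```

For scale, 10 nanoseconds per row over 10 million rows is 0.1 seconds of
total runtime, so the spread across repeated runs needs to be well below
that before the difference means anything.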

It would also be useful to post the database sizes, to confirm that
the patch really does reduce database size.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
