Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

From: Jameison Martin <jameisonb(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
Date: 2012-04-26 00:35:37
Message-ID: 1335400537.64843.YahooMailNeo@web39402.mail.mud.yahoo.com
Lists: pgsql-hackers

Tom, I whipped up some INSERT/SELECT tests where I selected into a temporary table, as you suggested. Both the target temporary table and the source table were in cache, and I disabled anything that would add noise. The source table had 5 integer columns and was populated with 10 million rows.

I tried 3 variations:
  1) target has all nullable columns, all set to non-null values: the results were the same
  2) target has all nullable columns, only the first column is set: the patch was slightly faster
  3) target has all NOT NULL columns: the patch may have been slightly faster, but the difference is probably not statistically significant

By "slightly faster" I mean on the order of 10 nanoseconds per row.

I think #2 is explained by the reduction in loop iterations in heap_fill_tuple(). 
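
To illustrate the loop-iteration argument, here is a minimal standalone sketch. It is not the actual heap_fill_tuple() code; the helper names natts_without_trailing_nulls() and fill_null_bitmap() are hypothetical, and the bitmap convention (a set bit means the value is present) is stated as an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of attributes that still have to be
 * stored once trailing NULLs are dropped from the row. */
static size_t natts_without_trailing_nulls(const bool *isnull, size_t natts)
{
    while (natts > 0 && isnull[natts - 1])
        natts--;
    return natts;
}

/* Hypothetical stand-in for the per-attribute loop: builds a null
 * bitmap for the first natts attributes (assumed convention: bit set
 * means "value present") and returns the number of loop iterations. */
static size_t fill_null_bitmap(const bool *isnull, size_t natts, uint8_t *bits)
{
    size_t i;

    memset(bits, 0, (natts + 7) / 8);
    for (i = 0; i < natts; i++)
    {
        if (!isnull[i])
            bits[i >> 3] |= (uint8_t) (1u << (i & 7));
    }
    return i;
}
```

For a 5-column row where only the first column is set (variation 2), the untruncated loop runs 5 times, while the truncated one runs once and produces the same bitmap byte.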

________________________________
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jameison Martin <jameisonb(at)yahoo(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Sent: Tuesday, April 17, 2012 9:57 PM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

Jameison Martin <jameisonb(at)yahoo(dot)com> writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 columns, most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If only the first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8 bytes, which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows would fall into the category of needing only 8 bytes for the null bitmap.
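
The bitmap arithmetic in the quoted paragraph can be checked with a short sketch. It follows the simplified model used in the message (one bit per column, with the bitmap alone rounded up to a multiple of 8 bytes). In PostgreSQL itself the alignment is applied to the tuple header and bitmap together, so treat this as an approximation. The helper names are hypothetical.

```c
#include <stddef.h>

/* Bytes needed for a null bitmap covering natts columns, one bit each. */
static size_t bitmap_bytes(size_t natts)
{
    return (natts + 7) / 8;
}

/* Round len up to the next multiple of 8 (the 8-byte alignment
 * assumed in the message). */
static size_t align8(size_t len)
{
    return (len + 7) & ~(size_t) 7;
}

/* Hypothetical helper: aligned bitmap size when only the first
 * natts_stored columns are represented in the bitmap. */
static size_t aligned_bitmap_bytes(size_t natts_stored)
{
    return align8(bitmap_bytes(natts_stored));
}
```

Under this model aligned_bitmap_bytes(800) is 104 and aligned_bitmap_bytes(50) is 8, which gives the 96-byte saving per row described above.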

I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones.  I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?

> What kind of test results would prove that this is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also, how would you define a typical case (e.g. what kind of data shape)?

Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero.  Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most.  Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.
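
A tight-loop measurement of the kind suggested above could be structured roughly as follows. This is only a sketch under stated assumptions: it times a stand-in workload (form_and_free_stub(), a hypothetical function that allocates and frees a buffer) rather than heap_form_tuple and heap_freetuple, which would have to be called from code built against the PostgreSQL server headers. The ns_per_call() helper is likewise hypothetical and uses the standard C clock() function, which measures CPU time.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef void (*bench_fn) (void *arg);

/* Hypothetical harness: run fn(arg) in a tight loop and return the
 * average CPU time per call in nanoseconds. */
static double ns_per_call(bench_fn fn, void *arg, uint64_t iterations)
{
    clock_t start, end;
    uint64_t i;

    if (iterations == 0)
        return 0.0;

    start = clock();
    for (i = 0; i < iterations; i++)
        fn(arg);
    end = clock();

    return ((double) (end - start) / CLOCKS_PER_SEC) * 1e9
        / (double) iterations;
}

/* Stand-in workload (an assumption, not PostgreSQL code): allocate a
 * buffer of *arg bytes, touch it, and free it again. */
static void form_and_free_stub(void *arg)
{
    size_t len = *(size_t *) arg;
    volatile unsigned char *buf = malloc(len);

    if (buf != NULL)
    {
        buf[0] = 0;
        free((void *) buf);
    }
}
```

Running the same harness against patched and unpatched builds, with a large iteration count, would give per-call figures that can be compared directly. A difference on the order of a few nanoseconds needs many iterations and repeated runs to separate it from noise.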

            regards, tom lane
