Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
Date: 2012-04-27 00:51:17
Message-ID: 4F99ED85.7010305@agliodbs.com
Lists: pgsql-hackers

Tom,

> I can't help thinking that (a) this is an incredibly narrow use-case,
> and (b) you'd be well advised to rethink your schema design anyway.

It's more common than you'd think. Both EAV and Hstore have their own
(severe) drawbacks.

For example, I'm working on an application that collects telemetry data
from hardware. This can involve up to 700 columns of data, most of
which are booleans, and an awful lot of which are NULL.

Also, adding lots of columns *is* following "proper" relational design
like we urge users to do, so it would be nice to make it performant.
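
To make that concrete, here's a hypothetical sketch of the shape I'm
describing (table and column names invented):

CREATE TABLE telemetry (
    device_id   bigint      NOT NULL,
    recorded_at timestamptz NOT NULL,
    -- hundreds of sensor readings, nearly all nullable, and NULL
    -- in the large majority of rows:
    sensor_001  boolean,
    sensor_002  boolean,
    -- ...
    sensor_700  boolean
);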

Now, the other issue I'd be worried about with this optimization is
what happens when the nulls become non-trailing. For example, this
pattern:

1. Out of 700 columns, columns 301+ are all NULL, so the bitmap gets
truncated.
2. User updates column 688 to non-NULL.
3. Suddenly we have a MUCH larger row which no longer fits on the page.

If an application has a lot of that kind of update pattern, I'd be
concerned that this would be a deoptimization.
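
To put rough numbers on that (a back-of-the-envelope sketch against the
hypothetical telemetry table above, ignoring the two key columns and
assuming the patch truncates the bitmap at the last non-null column):

-- Row where columns 301+ are NULL: the truncated null bitmap covers
-- 300 columns, i.e. ceil(300/8) = 38 bytes instead of
-- ceil(700/8) = 88 bytes for the full bitmap.
UPDATE telemetry SET sensor_688 = true WHERE device_id = 42;
-- The last non-null column is now 688, so the new row version needs
-- a ceil(688/8) = 86-byte bitmap: 48 bytes bigger, which can be
-- enough to push the updated row onto a new page.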

> If there is an exact standard as to how this group does performance
> analysis (e.g. removing outliers beyond a certain standard deviation,
> number of repetitions, machine isolation requirements and so forth),
> please let me know.

Oh, don't I wish! We're a lot more "cowboy" than that. Greg Smith and
Mark Wong have been trying to build a performance testing
infrastructure, but right now our test software and methodology are
*very* primitive. You're welcome to help out and make suggestions.

> I can submit my results as is but in the interest
> of avoiding a lot of duplicate postings perhaps someone can point me
> to an example of what kinds of numbers are desired so I can make sure
> my posting conforms to that. For what it is worth I ran the 3 tests
> 10 times each and removed the outliers, but I can run 100 times or do
> something different if need be (e.g. post a csv for easy consumption
> in a spreadsheet).

Actually, I think just doing a battery of pgbench tests with the patch
installed, for both the larger-than-memory and smaller-than-memory
cases, would give us some results for the non-NULL case. Something more
sophisticated like DVD Store or DBT-2 would be even better, since the
tables there have more columns.
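
It would also be worth sanity-checking the on-disk effect directly.
Against the hypothetical telemetry table sketched above, something like
this (pg_column_size() on a whole-row reference approximates the stored
tuple size), run with and without the patch, should show the bitmap
shrinking:

SELECT pg_column_size(t.*) AS row_bytes
FROM telemetry AS t
LIMIT 5;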

> I tried 3 variations:
> 1) target has all nullable columns, all set to non null values: the
> results were the same
> 2) target has all nullable columns, only the first column is set:
> the patch was slightly faster
> 3) target has all non-null columns: the patch maybe was slightly
> faster, probably not statistically relevant

This seems pretty on-target; can you share the numbers, the nature of
the test, and the setup with us so that we can evaluate it?

> Also, Simon, you mentioned posting "environment
> notes", can you let me know what kind of environment notes are
> desired? For example, are you thinking about changes to the vanilla
> postgresql.conf, hardware information, OS config, etc?

Yes, exactly.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
