Skip site navigation (1) Skip section navigation (2)

Re: number of rows estimation for bit-AND operation

From: Slava Moudry <smoudry(at)4info(dot)net>
To: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>,<pgsql-performance(at)postgresql(dot)org>
Subject: Re: number of rows estimation for bit-AND operation
Date: 2009-08-21 07:04:47
Message-ID: C2D062D968554E13B47F6214EC0040AF@DUKE (view raw or flat)
Thread:
Lists: pgsql-performance
Hi,
Sorry I don't understand how the numbers came so low.
If you assume that 8 boolean fields take 1 byte each, so for 100M table it 
will be 800M bytes.
How did your table fit in 3560616 bytes ?

Using postgres 8.4.0 on Linux x64:
create table staging.tmp_t1(a1 boolean, a2 boolean, a3 boolean, a4 boolean, 
a5 boolean, a6 boolean, a7 boolean, a8 boolean) tablespace stage3;
insert into staging.tmp_t1 select 
1:boolean,1:boolean,1:boolean,1:boolean,1:boolean,1:boolean,1:boolean,1:boolean 
from generate_series(1,100000000);
select pg_total_relation_size('staging.tmp_t1');
 pg_total_relation_size
------------------------
             3,625,689,088
(1 row)
The table with 16 booleans took just 766MB more, so the growth appears to be 
non-linear.
Most likely Postgres does some compression.. I can't tell for sure without 
looking into source code.

Anyway, given that int8 can accomodate 64 flags - the space saving can be 
substantial.
Thanks,
-Slava.



----- Original Message ----- 
From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Slava Moudry" <smoudry(at)4info(dot)net>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>; 
<pgsql-performance(at)postgresql(dot)org>
Sent: Thursday, August 20, 2009 6:58 PM
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation


On Thu, Aug 20, 2009 at 7:32 PM, Scott Marlowe<scott(dot)marlowe(at)gmail(dot)com> 
wrote:
> 2009/8/20 Slava Moudry <smoudry(at)4info(dot)net>:
>> Hi,
>> Yes, I thought about putting the bit-flags in separate fields.
>> Unfortunately - I expect to have quite a lot of these and space is an 
>> issue when you are dealing with billions of records in fact table, so I 
>> prefer to pack them into one int8.
>
> For giggles I created two test tables, one with a single int, one with
> 8 bools, and put 100M entries in each. The table with 8 bools took up
> aprrox. 3560616 bytes, while the one with a single int took up approx.
> 3544212
>
> I.e they're about the same. You should really test to see if having a
> lot of bools costs more than mangling ints around. I'm guessing I
> could fit a lot more bools in the test table due to alignment issues
> than just 8.

So, I made a table with 26 bool fields, and added 100M rows to it, and
that table took up about 5906028 bytes.  So yea, the storage is
greater for boolean fields, but only if they aren't null.  making them
null would save a lot of space, so if null bits fit your model, then
it might be worth looking into.  Certainly they're not so much bigger
as to be unmanageable.


In response to

pgsql-performance by date

Next:From: Gaƫl Le MignotDate: 2009-08-21 13:37:35
Subject: Performance regression between 8.3 and 8.4 on heavy text indexing
Previous:From: Jeff JanesDate: 2009-08-21 04:38:43
Subject: [PERFORMANCE] how to set wal_buffers

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group