Quick Links

Re: Large databases, performance

From:	Manfred Koizar <mkoi-pg(at)aon(dot)at>
To:	shridhar_daithankar(at)persistent(dot)co(dot)in
Cc:	pgsql-hackers(at)postgresql(dot)org, pgsql-general <pgsql-general(at)postgresql(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: Large databases, performance
Date:	2002-10-07 14:10:26
Message-ID:	69v2qu4n8fdt5do8dids2a9m98p0q8bu9r@4ax.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-hackers pgsql-performance pgsql-sql

On Mon, 07 Oct 2002 15:07:29 +0530, "Shridhar Daithankar"
<shridhar_daithankar(at)persistent(dot)co(dot)in> wrote:
>Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All
>numbers include indexes. This is really going to be a problem when things are
>deployed. Any idea how can it be taken down?

Shridhar,

if i'm not mistaken, a char(n)/varchar(n) column is stored as a 32-bit
integer specifying the length followed by as many characters as the
length tells. On 32-bit Intel hardware this structure is aligned on a
4-byte boundary.

For your row layout this gives the following sizes (look at the "phys
size" column):

| Field Field Null Indexed phys mini
| Name Type size
|--------------------------------------------
| type int no no 4 4
| esn char (10) no yes 16 11
| min char (10) no yes 16 11
| datetime timestamp no yes 8 8
| opc0 char (3) no no 8 4
| opc1 char (3) no no 8 4
| opc2 char (3) no no 8 4
| dpc0 char (3) no no 8 4
| dpc1 char (3) no no 8 4
| dpc2 char (3) no no 8 4
| npa char (3) no no 8 4
| nxx char (3) no no 8 4
| rest char (4) no no 8 5
| field0 int yes no 4 4
| field1 char (4) yes no 8 5
| field2 int yes no 4 4
| field3 char (4) yes no 8 5
| field4 int yes no 4 4
| field5 char (4) yes no 8 5
| field6 int yes no 4 4
| field7 char (4) yes no 8 5
| field8 int yes no 4 4
| field9 char (4) yes no 8 5
| ----- -----
| 176 116

Ignoring nulls for now, you have to add 32 bytes for a v7.2 heap tuple
header and 4 bytes for ItemIdData per tuple, ending up with 212 bytes
per tuple or ca. 85 GB heap space for 432000000 tuples. Depending on
fill factor similar calculations give some 30 GB for your index.

Now if we had a datatype with only one byte for the string length,
char columns could be byte aligned and we'd have column sizes given
under "mini" in the table above. The columns would have to be
rearranged according to alignment requirements.

Thus 60 bytes per heap tuple and 8 bytes per index tuple could be
saved, resulting in a database size of ~ 85 GB (index included). And
I bet this would be significantly faster, too.

Hackers, do you think it's possible to hack together a quick and dirty
patch, so that string length is represented by one byte? IOW can a
database be built that doesn't contain any char/varchar/text value
longer than 255 characters in the catalog?

If I'm not told that this is impossibly, I'd give it a try. Shridhar,
if such a patch can be made available, would you be willing to test
it?

What can you do right now? Try using v7.3 beta and creating your
table WITHOUT OIDS. This saves 8 bytes per tuple; not much, but
better save 4% than nothing.

Servus
Manfred

In response to

Re: Large databases, performance at 2002-10-07 09:37:29 from Shridhar Daithankar

Responses

Re: Large databases, performance at 2002-10-07 14:18:31 from Shridhar Daithankar

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2002-10-07 14:17:35	Re: Server locale?
Previous Message	Bruce Momjian	2002-10-07 14:09:32	Re: Stored Procedures

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Shridhar Daithankar	2002-10-07 14:18:31	Re: Large databases, performance
Previous Message	Hans-Jürgen Schönig	2002-10-07 13:52:54	Re: Table spaces again [was Re: Threaded Sorting]

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Shridhar Daithankar	2002-10-07 14:18:31	Re: Large databases, performance
Previous Message	Larry Rosenman	2002-10-07 12:04:33	Re: [HACKERS] cross-posts (was Re: Large databases,

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Shridhar Daithankar	2002-10-07 14:18:31	Re: Large databases, performance
Previous Message	Dan Langille	2002-10-07 12:44:32	Re: timestamp