Re: Compression and on-disk sorting

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Zeugswetter Andreas DCP SD <ZeugswetterA(at)spardat(dot)at>, Greg Stark <gsstark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rod Taylor <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-19 18:53:05
Message-ID: 3134.1148064785@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
>> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
>> unbeleiveable. What's in the table? It would seem to imply that our
>> tuple format is far more compressable than we expected.

> It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> If the tape routines were actually storing visibility information, I'd
> expect that to be pretty compressible in this case since all the tuples
> were presumably created in a single transaction by pgbench.

It's worse than that: IIRC what passes through a heaptuple sort are
tuples manufactured by heap_form_tuple, which will have consistently
zeroed header fields. However, the above isn't very helpful since the
rest of us have no idea what that "accounts" table contains. How wide
is the tuple data, and what's in it?

(This suggests that we might try harder to strip unnecessary header info
from tuples being written to tape inside tuplesort.c. I think most of
the required fields could be reconstructed given the TupleDesc.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2006-05-19 18:57:29 Re: PL/pgSQL 'i = i + 1' Syntax
Previous Message max.poletto 2006-05-19 18:51:00 problem with PQsendQuery/PQgetResult and COPY FROM statement