Re: TOAST usage setting

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: TOAST usage setting
Date: 2007-05-29 10:23:23
Message-ID: 87wsyshqsk.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

> I tested TOAST using a method similar to the above method against CVS
> HEAD, with default shared_buffers = 32MB and no assert()s. I created
> backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default),
> 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default),
> 1k, 512, 256, and 128, roughly.
>
> The results are here:
>
> http://momjian.us/expire/TOAST/
>
> Strangely, 128 bytes seems to be the break-even point for TOAST and
> non-TOAST, even for sequential scans of the entire heap touching all
> long row values. I am somewhat confused why TOAST has faster access
> than inline heap data.

Did your test also imply setting the MAX_TOAST_CHUNK_SIZE (or however that's
spelled)? And what size long values were you actually storing? How did you
generate them?

I wonder if what's happening is that you have large chunks which when stored
inline are leaving lots of dead space in the table. Ie, if you're generating
values with size near 2k and the default chunk size you would expect to find
an average of 1k dead space per page, or a 12.5% drain on performance. As you
lower the chunk size you decrease that margin.

However I agree that it's hard to believe that the costs of random access
wouldn't swamp that 12.5% overhead pretty quickly.

One query I used when measuring the impact of the variable varlena stuff was
this which gives the distribution of tuples/page over a table:

SELECT count(*),n
FROM (SELECT count(*) AS n
FROM foo
GROUP BY (point_in(tidout(ctid)))[0]
) as x
GROUP BY n;

Which might help you peek at what's going on. You could also combine
pg_column_size(foo.*) to measure the size of the tuple. I think that will
measure the size of the tuple as is before the columns are detoasted.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sahoo, Ranjan Rashmi 2007-05-29 11:09:07 Users specific to a Database
Previous Message Hannu Krosing 2007-05-29 08:15:45 Re: status of PlPython