Re: TOAST usage setting

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: TOAST usage setting
Date: 2007-05-30 02:20:27
Message-ID: 200705300220.l4U2KR225227@momjian.us
Lists: pgsql-hackers

Bruce Momjian wrote:
> Gregory Stark wrote:
> > "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
> >
> > >> No, we did substring() too :)
> > >
> > > Uh, I looked at text_substring(), and while there is an optimization to
> > > do character counting for encoding length == 1, it is still accessing
> > > the data.
> >
> > Sure but it'll only access the first chunk. There are two chunks in your test.
> > It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
> > 2 chunks (hashtext()).
> >
> > Or if you're concerned with the cpu cost of hashtext you could calculate the
> > precise two bytes you need to access with substr to force it to load both
> > chunks. But I think the real cost of unnecessary toasting is the random disk
> > i/o so the cpu cost is of secondary interest.
>
> OK, will run a test with hashtext(). What I am seeing now is a 10-20x
> slowdown to access the TOAST data, and a 0-1x speedup for accessing the
> non-TOAST data when the rows are long:

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:

http://momjian.us/expire/TOAST/
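Greg's point above is that substr() on an out-of-line value only has to fetch the chunks covering the requested bytes. Picking a 2-byte range that straddles a chunk boundary therefore forces both chunks to be read without paying hashtext()'s CPU cost. Here is a minimal Python sketch of that arithmetic. It assumes the value is stored uncompressed (e.g. with SET STORAGE EXTERNAL) in a single-byte encoding. `CHUNK_SIZE`, `chunks_for_substr()` and `straddling_substr_args()` are hypothetical names, and the 2000-byte figure is only a stand-in for the build-dependent TOAST_MAX_CHUNK_SIZE.

```python
from __future__ import annotations

# Hypothetical sketch: which TOAST chunks would substr() need to fetch?
# Assumes an uncompressed, out-of-line value in a single-byte encoding,
# split into fixed-size chunks. 2000 is an illustrative chunk size only;
# the real TOAST_MAX_CHUNK_SIZE depends on the build.
CHUNK_SIZE = 2000


def chunks_for_substr(start: int, count: int, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return the 0-based chunk indexes covering substr(value, start, count).

    `start` is 1-based, as in SQL's substr().
    """
    if start < 1 or count < 1:
        raise ValueError("start and count must be >= 1")
    first_byte = start - 1              # convert to a 0-based byte offset
    last_byte = first_byte + count - 1  # last byte the call needs
    return list(range(first_byte // chunk_size, last_byte // chunk_size + 1))


def straddling_substr_args(chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Return (start, count) for a 2-byte substr() that spans chunks 0 and 1."""
    # The last byte of chunk 0 is 0-based offset chunk_size - 1, i.e. 1-based
    # position chunk_size; the byte after it is the first byte of chunk 1.
    return chunk_size, 2


print(chunks_for_substr(1, 10))                      # [0]
print(chunks_for_substr(*straddling_substr_args()))  # [0, 1]
```

In SQL terms, under these assumptions, that corresponds to something like substr(col, 2000, 2), which touches two chunks while reading only two bytes.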

What you will see is that pushing TEXT values out to TOAST storage allows
quick access to non-TOAST values and to single-row TOAST values, but
accessing all TOAST values is slower than accessing them in the heap, by
a factor of 3-18.

Looking at the chart, 512 bytes seems to be the proper breakpoint for
TOAST: it gives us a 2x improvement in accessing non-TOAST values and
single-row TOAST values, while accessing all TOAST values is only 2x
slower than it is now.

Of course, these tests have all the data in cache; if the cache is
limited, pushing more to TOAST is going to be a bigger win. In general,
I would guess that the number of times all >512-byte rows are accessed
is much smaller than the number of times that pushing those >512-byte
values to TOAST will give a speedup.
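The trade-off in the last two paragraphs can be put as a back-of-the-envelope model. This is a hypothetical sketch, not derived from the benchmark beyond the two 2x figures for the 512-byte case. It assumes every query costs the same today, that queries not reading the TOAST values get `speedup` times faster, and that queries reading all of them get `slowdown` times slower. It ignores the cache effects that would favor TOAST further. `relative_cost()` and `break_even_fraction()` are illustrative names.

```python
# Hypothetical cost model: 1.0 = today's per-query cost. A lower TOAST
# threshold makes queries that skip the TOASTed values `speedup` times
# faster and queries that read every TOASTed value `slowdown` times slower.
# Cache effects (which would favor TOAST further) are ignored.


def relative_cost(all_toast_fraction: float, speedup: float = 2.0, slowdown: float = 2.0) -> float:
    """Average per-query cost relative to today for a given query mix."""
    f = all_toast_fraction
    return f * slowdown + (1.0 - f) / speedup


def break_even_fraction(speedup: float = 2.0, slowdown: float = 2.0) -> float:
    """Fraction of all-TOAST queries at which the lower threshold stops paying off."""
    # Solve f * slowdown + (1 - f) / speedup == 1 for f.
    return (1.0 - 1.0 / speedup) / (slowdown - 1.0 / speedup)


print(f"{break_even_fraction():.2f}")  # 0.33
print(f"{relative_cost(0.10):.2f}")    # 0.65
```

With both factors at 2, the lower threshold comes out ahead as long as fewer than about a third of accesses read every TOASTed value. This matches the guess above that such full accesses are comparatively rare.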

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
