TOAST usage setting

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: TOAST usage setting
Date: 2007-05-29 00:18:55
Message-ID: 200705290018.l4T0ItH18228@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

bruce wrote:
> > > * Re: [HACKERS] Modifying TOAST thresholds /Tom Lane/
> > >
> > > At this point it seems nothing will be done about this issue for 8.3.
> >
> > I'm not sure anyone has an idea how to test it. TPCC isn't really useful
> > because it has a fixed size (500 byte) string buffer. Perhaps if we modified
> > it to have a random string length uniformly distributed between 0-2k ? But
> > even then it never does any scans based on that buffer. But the problem with
> > going with something more natural is that it'll be harder to tell exactly what
> > it's testing.
>
> My idea on this was to create two backends, one with the default TOAST
> value, and a second with a value of 50 bytes. Create a table with one
> TEXT field, and several other columns, each column < 50 bytes.
>
> Then, fill the table with random data (script attached that might help),
> and the try 2000, 1500, 1000, etc, bytes in the TEXT column for each row
> (use random data so the compression code doesn't shrink it). Then run a
> test with both backends acessing the TEXT column and non-TEXT column and
> measure the difference between the two backends, i.e. the backend with a
> TOAST value of 50 should show faster access on the non-TEXT field, but
> slower access on the TEXT field.
>
> Then, figure out where the gains on the non-TEXT field seem to diminish
> in usefulness. Basically, with a lower TOAST value, we are going to
> spend more time accessing the TEXT field, but the speedup for the
> non-TEXT field should be large enough win that we don't care. As the
> TEXT column becomes shorter, it has less affect on the non-TEXT access.

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s. I created
backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default),
8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default),
1k, 512, 256, and 128, roughly.

The results are here:

http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values. I am somewhat confused why TOAST has faster access
than inline heap data.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2007-05-29 00:19:18 Re: What is the maximum encoding-conversion growth rate, anyway?
Previous Message Bruce Momjian 2007-05-28 23:56:09 Re: Do we need a TODO? (was Re: Concurrently updating an updatable view)