Skip site navigation (1) Skip section navigation (2)

Giant TOAST tables due to many almost empty pages

From: Rumko <rumcic(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Giant TOAST tables due to many almost empty pages
Date: 2010-05-12 19:23:35
Message-ID: hsev7n$egn$1@dough.gmane.org (view raw or flat)
Thread:
Lists: pgsql-bugs
Hi!

I'm running 8.4.3 (the exact same problem was also present on 8.4.2) installed
from rpm packages at http://yum.pgsqlrpms.org/ on CentOS 5.4 (x86_64).

I have experienced a bit of a problem with my DB's storage and upon further
investigation, noticed that only some (2 for each day of data) of my tables
(hundreds of table per day) are many times bigger than they should be.

From what I can tell (have only checked a few tables), the main difference
between these giant tables and others seems to be, that there are 10+ array
type columns, while others have less (also all tables contain quite a few
non-toastable int8 and/or float8 columns + 2x timestamp + 2 x varchar).
The affected tables get many inserts (~5 concurrent COPY statements) throughout
the day (with ~10min breaks inbetween), but there are no updates and nothing
gets deleted (other than by dropping the table).
VACUUM FULL and CLUSTER do not change the size and the only way to decrease the
size is by e.g. "CREATE TABLE newtable AS TABLE gianttable" which decreases
the size of the table from ~25GB to ~2GB.


Some interesting output (among other things VACUUM FULL and CLUSTER output):
# SELECT pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'));
 pg_size_pretty
----------------
 27 GB
(1 row)

Time: 32,655 ms
# VACUUM FULL VERBOSE low_level."counters_xxx"; SELECT
pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'));
INFO:  vacuuming "low_level.counters_xxx"
INFO:  "counters_xxx": found 0 removable, 236783 nonremovable row versions in
236783 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 4288 to 4376 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 913296036 bytes.
0 pages are or will become empty, including 0 at the end of the table.
236783 pages containing 913296036 free bytes are potential move destinations.
CPU 5.13s/1.31u sec elapsed 64.87 sec.
INFO:  index "low_level.counters_xxx_unique" now contains 236783 row versions
in 1531 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 1.52 sec.
INFO:  index "low_level.counters_xxx_sddidx" now contains 236783 row versions
in 1127 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.00u sec elapsed 1.12 sec.
INFO:  index "low_level.counters_xxx_noidx" now contains 236783 row versions in
1227 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.98 sec.
INFO:  "counters_xxx": moved 0 row versions, truncated 236783 to 236783 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_1066371"
INFO:  "pg_toast_1066371": found 0 removable, 3259181 nonremovable row versions
in 3259181 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 57 to 122 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 69.63s/17.97u sec elapsed 547.98 sec.
INFO:  index "pg_toast_1066371_index" now contains 3259181 row versions in 8938
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.09s/0.02u sec elapsed 7.35 sec.
VACUUM
Time: 624228,644 ms
 pg_size_pretty
----------------
 27 GB
(1 row)

Time: 30,779 ms
# VACUUM FULL VERBOSE pg_toast.pg_toast_1066371;
INFO:  vacuuming "pg_toast.pg_toast_1066371"                   
INFO:  "pg_toast_1066371": found 0 removable, 3259181 nonremovable row versions
in 3259181 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 57 to 122 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 31.55s/9.33u sec elapsed 296.60 sec.
INFO:  index "pg_toast_1066371_index" now contains 3259181 row versions in 8938
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.09s/0.01u sec elapsed 0.76 sec.
VACUUM
Time: 297441,342 ms
# SELECT pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'));
pg_size_pretty
----------------
 27 GB
(1 row)

Time: 48,422 ms
# SELECT pg_size_pretty(pg_total_relation_size('pg_toast.pg_toast_1066371'));
pg_size_pretty
----------------
 25 GB
(1 row)

Time: 4,816 ms
# CLUSTER VERBOSE low_level."counters_xxx"
USING "low_level.counters_xxx_noidx";
INFO:  clustering "low_level.counters_xxx"
CLUSTER
Time: 4571708,940 ms
# SELECT pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'));
 pg_size_pretty
----------------
 27 GB
(1 row)

Time: 10,046 ms
# CREATE TABLE newtable AS TABLE low_level."counters_xxx"; SELECT
pg_size_pretty(pg_total_relation_size('newtable'));
SELECT
Time: 1025617,010 ms
 pg_size_pretty
----------------
 2145 MB
(1 row)

Time: 3,415 ms
# SELECT relname, relpages FROM pg_class, (SELECT reltoastrelid FROM pg_class
WHERE relname = 'counters_xxx') ss WHERE oid = ss.reltoastrelid OR oid =
(SELECT reltoastidxid FROM pg_class WHERE oid = ss.reltoastrelid) ORDER BY
relname;
        relname         | relpages
------------------------+----------
 pg_toast_1066282       |        0
 pg_toast_1066282_index |        1
 pg_toast_1066371       |        0
 pg_toast_1066371_index |        1
(4 rows)

Time: 2,916 ms
# select avg(pg_column_size(t)) from low_level."counters_xxx" t;
          avg
-----------------------
 4306.9017285869340282
(1 row)

Time: 48389,660 ms
# select count(*), _array columns_, avg(octet_length(node)),
avg(octet_length(object)) from low_level."counters_xxx";
 count  |         avg         |         avg         |         avg         |    
avg         |         avg         |         avg         |        
avg         |         avg         |         avg         |        
avg         |         avg         |         avg         |        
avg         |         avg         |        avg         |        avg
--------+---------------------+---------------------+---------------------+-
 236783 | 29.0000000000000000 | 29.0000000000000000 | 39.0000000000000000 |
32.0000000000000000 | 29.0000000000000000 | 31.0000000000000000 |
28.0000000000000000 | 28.0000000000000000 | 28.0000000000000000 |
38.0000000000000000 | 24.0000000000000000 | 24.0000000000000000 |
34.0000000000000000 | 35.0000000000000000 | 5.0000000000000000 |
6.5941600537200728
(1 row)

Time: 7635,558 ms




Example table definition for one of the giant tables (note: everything except
date_start, date_end, node and "object" is inherited from counters_x):
CREATE TABLE low_level."counters_xxx"
(
  date_start timestamp(0) without time zone NOT NULL,
  date_end timestamp(0) without time zone NOT NULL,
  node character varying(16),
  "object" character varying(128),
  aa bigint,
  ab bigint,
  ac bigint,
  ad bigint,
  af bigint,
  ag bigint,
  ah bigint,
  ai bigint,
  aj bigint,
  ak bigint,
  al bigint,
  am bigint,
  an bigint,
  ao bigint,
  ap bigint,
  aq bigint,
  ar bigint[],
  as bigint[],
  at bigint,
  au bigint,
  av bigint,
  aw bigint,
  ax bigint,
  ay bigint,
  az bigint,
  ba bigint,
  bb bigint,
  bc bigint,
  bd bigint,
  be bigint,
  bf bigint,
  bg bigint,
  bh bigint,
  bi bigint,
  bj bigint,
  bk bigint,
  bl bigint,
  bm bigint,
  bn bigint,
  bo bigint,
  bp bigint,
  bq bigint,
  br bigint,
  bs bigint[],
  bt bigint[],
  bu bigint[],
  bv bigint[],
  bw bigint[],
  bx bigint,
  by bigint,
  bz bigint,
  ca bigint,
  cb bigint,
  cc bigint,
  cd bigint,
  ce bigint,
  cf bigint,
  cg bigint,
  ch bigint,
  ci bigint,
  cj bigint,
  ck bigint,
  cl bigint,
  cm bigint,
  cn bigint,
  co bigint,
  cp bigint,
  cq bigint,
  cr bigint,
  cs bigint,
  ct bigint,
  cu bigint,
  cv bigint,
  cw bigint,
  cx bigint,
  cy bigint,
  cz bigint,
  da bigint,
  db bigint,
  dc bigint,
  dd bigint,
  de bigint,
  df bigint,
  dg bigint,
  dh bigint,
  di bigint,
  dj bigint,
  dk bigint,
  dl bigint,
  dm bigint,
  dn bigint,
  do bigint,
  dp bigint,
  dq bigint,
  dr bigint,
  ds bigint,
  dt bigint,
  du bigint,
  dv bigint,
  dw bigint,
  dx bigint,
  dy bigint,
  dz bigint,
  ea bigint,
  eb bigint,
  ec bigint,
  ed bigint,
  ee bigint,
  ef bigint,
  eg bigint,
  eh bigint,
  ei bigint,
  ej bigint,
  ek bigint,
  el bigint,
  em bigint,
  en bigint,
  eo bigint,
  ep bigint,
  eq bigint,
  er bigint,
  es bigint,
  et bigint,
  eu bigint,
  ev bigint,
  ew bigint,
  ex bigint,
  ey bigint,
  ez bigint,
  fa bigint,
  fb bigint,
  fc bigint,
  fd bigint,
  fe bigint,
  ff bigint,
  fg bigint,
  fh bigint,
  fi bigint,
  fj bigint,
  fk bigint,
  fl bigint,
  fm bigint,
  fn bigint,
  fo bigint,
  fp bigint,
  fq bigint,
  fr bigint,
  fs bigint,
  ft bigint,
  fu bigint,
  fv bigint,
  fw bigint,
  fx bigint,
  fy bigint,
  fz bigint,
  ga bigint,
  gb bigint,
  gc bigint,
  gd bigint,
  ge bigint,
  gf bigint,
  gg bigint,
  gh bigint,
  gi bigint,
  gj bigint,
  gk bigint,
  gl bigint,
  gm bigint,
  gn bigint,
  go bigint,
  gp bigint,
  gq bigint,
  gr bigint,
  gs bigint,
  gt bigint,
  gu bigint,
  gv bigint,
  gw bigint,
  gx bigint,
  gy bigint,
  gz bigint,
  ha bigint,
  hb bigint,
  hc bigint,
  hd bigint,
  he bigint,
  hf bigint,
  hg bigint,
  hh bigint,
  hi bigint,
  hj bigint,
  hk bigint,
  hl bigint,
  hm bigint,
  hn bigint,
  ho bigint,
  hp bigint,
  hq bigint,
  hr bigint,
  hs bigint,
  ht bigint,
  hu bigint,
  hv bigint,
  hw bigint,
  hx bigint,
  hy bigint,
  hz bigint,
  ia bigint,
  ib bigint,
  ic bigint,
  id bigint,
  ie bigint,
  if bigint,
  ig bigint,
  ih bigint,
  ii bigint,
  ij bigint,
  ik bigint,
  il bigint,
  im bigint,
  in bigint,
  io bigint,
  ip bigint,
  iq bigint,
  ir bigint,
  is bigint,
  it bigint,
  iu bigint,
  iv bigint,
  iw bigint,
  ix bigint,
  iy bigint,
  iz bigint,
  ja bigint,
  jb bigint,
  jc bigint,
  jd bigint,
  je bigint,
  jf bigint,
  jg bigint,
  jh bigint,
  ji bigint,
  jj bigint,
  jk bigint,
  jl bigint,
  jm bigint,
  jn bigint,
  jo bigint,
  jp bigint,
  jq bigint,
  jr bigint,
  js bigint,
  jt bigint,
  ju bigint,
  jv bigint,
  jw bigint,
  jx bigint,
  jy bigint,
  jz bigint,
  ka bigint,
  kb bigint,
  kc bigint,
  kd bigint,
  ke bigint,
  kf bigint,
  kg bigint,
  kh bigint,
  ki bigint,
  kj bigint,
  kk bigint,
  kl bigint,
  km bigint,
  kn bigint,
  ko bigint,
  kp bigint,
  kq bigint,
  kr bigint,
  ks bigint,
  kt bigint,
  ku bigint,
  kv bigint,
  kw bigint,
  kx bigint,
  ky bigint,
  kz bigint,
  la bigint,
  lb bigint,
  lc bigint,
  ld bigint,
  le bigint,
  lf bigint,
  lg bigint,
  lh bigint,
  li bigint,
  lj bigint,
  lk bigint,
  ll bigint,
  lm bigint,
  ln bigint,
  lo bigint,
  lp bigint,
  lq bigint,
  lr bigint,
  ls bigint,
  lt bigint,
  lu bigint,
  lv bigint,
  lw bigint,
  lx bigint,
  ly bigint,
  lz bigint,
  ma bigint,
  mb bigint,
  mc bigint,
  md bigint,
  me bigint,
  mf bigint,
  mg bigint,
  mh bigint,
  mi bigint,
  mj bigint,
  mk bigint,
  ml bigint,
  mm bigint,
  mn bigint,
  mo bigint,
  mp bigint,
  mq bigint,
  mr bigint,
  ms bigint,
  mt bigint,
  mu bigint,
  mv bigint,
  mw bigint,
  mx bigint,
  my bigint,
  mz bigint,
  na bigint,
  nb bigint,
  nc bigint,
  nd bigint,
  ne bigint,
  nf bigint,
  ng bigint,
  nh bigint,
  ni bigint,
  nj bigint,
  nk bigint,
  nl bigint,
  nm bigint,
  nn bigint,
  no bigint[],
  np bigint[],
  nq bigint[],
  nr bigint,
  ns bigint,
  nt bigint,
  nu bigint,
  nv bigint,
  nw bigint,
  nx bigint,
  ny bigint,
  nz bigint,
  oa bigint,
  ob bigint,
  oc bigint,
  od bigint,
  oe bigint,
  of bigint,
  og bigint,
  oh bigint,
  oi bigint,
  oj bigint,
  ok bigint,
  ol bigint,
  om bigint,
  on bigint,
  oo bigint,
  op bigint,
  oq bigint,
  or bigint,
  os bigint,
  ot bigint,
  ou bigint,
  ov bigint,
  ow bigint,
  ox bigint,
  oy bigint,
  oz bigint,
  pa bigint,
  pb bigint,
  pc bigint,
  pd bigint,
  pe bigint,
  pf bigint,
  pg bigint,
  ph bigint,
  pi bigint,
  pj bigint,
  pk bigint,
  pl bigint,
  pm bigint,
  pn bigint,
  po bigint,
  pp bigint,
  pq bigint,
  pr bigint,
  ps bigint,
  pt bigint,
  pu bigint,
  pv bigint,
  pw bigint,
  px bigint,
  py bigint,
  pz bigint,
  qa bigint,
  qb bigint,
  qc bigint,
  qd bigint,
  qe bigint,
  qf bigint[],
  qg bigint[],
  qh bigint[],
  qi bigint,
  qj bigint,
  qk bigint,
  ql bigint,
  qm bigint,
  qn bigint,
  qo bigint,
  qp bigint,
  qq bigint,
  qr bigint,
  qs bigint,
  qt bigint,
  qu bigint,
  qv bigint,
  qw bigint,
  qx bigint,
  qy bigint,
  qz bigint,
  ra bigint,
  rb bigint,
  rc bigint,
  rd bigint,
  re bigint,
  rf bigint,
  rg bigint,
  rh bigint,
  ri bigint,
  rj bigint,
  rk bigint,
  rl bigint,
  rm bigint,
  rn bigint,
  ro bigint,
  rp bigint,
  rq bigint,
  rr bigint,
  rs bigint,
  rt bigint,
  ru bigint,
  rv bigint,
  rw bigint,
  rx bigint,
  ry bigint,
  rz bigint[],
  sa bigint,
  sb bigint,
  sc bigint,
  sd bigint,
  se bigint,
  sf bigint,
  sg bigint,
  sh bigint,
  si bigint,
  sj bigint,
  sk bigint,
  sl bigint,
  sm bigint,
  sn bigint,
  so bigint,
  sp bigint,
  sq bigint,
  sr bigint,
  ss bigint,
  st bigint,
  su bigint,
  sv bigint,
  sw bigint,
  sx bigint,
  sy bigint,
  sz bigint,
  ta bigint,
  tb bigint,
  tc bigint,
  td bigint,
  te bigint,
  tf bigint,
  tg bigint,
  th bigint,
  ti bigint,
  tj bigint,
  tk bigint,
  tl bigint,
  tm bigint,
  tn bigint,
  to bigint,
  tp bigint,
  tq bigint,
  tr bigint,
  ts bigint,
  CONSTRAINT "counters_xxx_date_end_check" CHECK (date_end
>= '2010-05-06'::date AND date_end < '2010-05-07'::date)
)
-- INHERITS ("counters_x")
WITH (OIDS=FALSE);
CREATE INDEX "low_level.counters_xxx_noidx"
  ON low_level."counters_xxx"
  USING btree
  (node, object);
CREATE INDEX "low_level.counters_xxx_sddidx"
  ON low_level."counters_xxx"
  USING btree
  (date_start, date_end);
CREATE UNIQUE INDEX "low_level.counters_xxx_unique"
  ON low_level."counters_xxx"
  USING btree
  (digest(((COALESCE(node, ''::character varying)::text ||
COALESCE(object, ''::character varying)::text) || date_start) ||
date_end, 'sha1'::text));


Responses

pgsql-bugs by date

Next:From: Tom LaneDate: 2010-05-12 20:17:18
Subject: Re: Giant TOAST tables due to many almost empty pages
Previous:From: Greg Sabino MullaneDate: 2010-05-12 18:05:06
Subject: Re: Bug report (#5456) not showing up on the ML

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group