Обсуждение: Giant TOAST tables due to many almost empty pages

Поиск
Список
Период
Сортировка

Giant TOAST tables due to many almost empty pages

От
Rumko
Дата:
Hi!

I'm running 8.4.3 (the exact same problem was also present on 8.4.2) installed
from rpm packages at http://yum.pgsqlrpms.org/ on CentOS 5.4 (x86_64).

I have experienced a bit of a problem with my DB's storage and upon further
investigation, noticed that only some (2 for each day of data) of my tables
(hundreds of table per day) are many times bigger than they should be.

From what I can tell (have only checked a few tables), the main difference
between these giant tables and others seems to be, that there are 10+ array
type columns, while others have less (also all tables contain quite a few
non-toastable int8 and/or float8 columns + 2x timestamp + 2 x varchar).
The affected tables get many inserts (~5 concurrent COPY statements) throughout
the day (with ~10min breaks inbetween), but there are no updates and nothing
gets deleted (other than by dropping the table).
VACUUM FULL and CLUSTER do not change the size and the only way to decrease the
size is by e.g. "CREATE TABLE newtable AS TABLE gianttable" which decreases
the size of the table from ~25GB to ~2GB.


Some interesting output (among other things VACUUM FULL and CLUSTER output):
# SELECT pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'));
 pg_size_pretty
----------------
 27 GB
(1 row)

Time: 32,655 ms
# VACUUM FULL VERBOSE low_level."counters_xxx"; SELECT
pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'));
INFO:  vacuuming "low_level.counters_xxx"
INFO:  "counters_xxx": found 0 removable, 236783 nonremovable row versions in
236783 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 4288 to 4376 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 913296036 bytes.
0 pages are or will become empty, including 0 at the end of the table.
236783 pages containing 913296036 free bytes are potential move destinations.
CPU 5.13s/1.31u sec elapsed 64.87 sec.
INFO:  index "low_level.counters_xxx_unique" now contains 236783 row versions
in 1531 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 1.52 sec.
INFO:  index "low_level.counters_xxx_sddidx" now contains 236783 row versions
in 1127 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.00u sec elapsed 1.12 sec.
INFO:  index "low_level.counters_xxx_noidx" now contains 236783 row versions in
1227 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.98 sec.
INFO:  "counters_xxx": moved 0 row versions, truncated 236783 to 236783 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_1066371"
INFO:  "pg_toast_1066371": found 0 removable, 3259181 nonremovable row versions
in 3259181 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 57 to 122 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 69.63s/17.97u sec elapsed 547.98 sec.
INFO:  index "pg_toast_1066371_index" now contains 3259181 row versions in 8938
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.09s/0.02u sec elapsed 7.35 sec.
VACUUM
Time: 624228,644 ms
 pg_size_pretty
----------------
 27 GB
(1 row)

Time: 30,779 ms
# VACUUM FULL VERBOSE pg_toast.pg_toast_1066371;
INFO:  vacuuming "pg_toast.pg_toast_1066371"
INFO:  "pg_toast_1066371": found 0 removable, 3259181 nonremovable row versions
in 3259181 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 57 to 122 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 31.55s/9.33u sec elapsed 296.60 sec.
INFO:  index "pg_toast_1066371_index" now contains 3259181 row versions in 8938
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.09s/0.01u sec elapsed 0.76 sec.
VACUUM
Time: 297441,342 ms
# SELECT pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'));
pg_size_pretty
----------------
 27 GB
(1 row)

Time: 48,422 ms
# SELECT pg_size_pretty(pg_total_relation_size('pg_toast.pg_toast_1066371'));
pg_size_pretty
----------------
 25 GB
(1 row)

Time: 4,816 ms
# CLUSTER VERBOSE low_level."counters_xxx"
USING "low_level.counters_xxx_noidx";
INFO:  clustering "low_level.counters_xxx"
CLUSTER
Time: 4571708,940 ms
# SELECT pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'));
 pg_size_pretty
----------------
 27 GB
(1 row)

Time: 10,046 ms
# CREATE TABLE newtable AS TABLE low_level."counters_xxx"; SELECT
pg_size_pretty(pg_total_relation_size('newtable'));
SELECT
Time: 1025617,010 ms
 pg_size_pretty
----------------
 2145 MB
(1 row)

Time: 3,415 ms
# SELECT relname, relpages FROM pg_class, (SELECT reltoastrelid FROM pg_class
WHERE relname = 'counters_xxx') ss WHERE oid = ss.reltoastrelid OR oid =
(SELECT reltoastidxid FROM pg_class WHERE oid = ss.reltoastrelid) ORDER BY
relname;
        relname         | relpages
------------------------+----------
 pg_toast_1066282       |        0
 pg_toast_1066282_index |        1
 pg_toast_1066371       |        0
 pg_toast_1066371_index |        1
(4 rows)

Time: 2,916 ms
# select avg(pg_column_size(t)) from low_level."counters_xxx" t;
          avg
-----------------------
 4306.9017285869340282
(1 row)

Time: 48389,660 ms
# select count(*), _array columns_, avg(octet_length(node)),
avg(octet_length(object)) from low_level."counters_xxx";
 count  |         avg         |         avg         |         avg         |
avg         |         avg         |         avg         |
avg         |         avg         |         avg         |
avg         |         avg         |         avg         |
avg         |         avg         |        avg         |        avg
--------+---------------------+---------------------+---------------------+-
 236783 | 29.0000000000000000 | 29.0000000000000000 | 39.0000000000000000 |
32.0000000000000000 | 29.0000000000000000 | 31.0000000000000000 |
28.0000000000000000 | 28.0000000000000000 | 28.0000000000000000 |
38.0000000000000000 | 24.0000000000000000 | 24.0000000000000000 |
34.0000000000000000 | 35.0000000000000000 | 5.0000000000000000 |
6.5941600537200728
(1 row)

Time: 7635,558 ms




Example table definition for one of the giant tables (note: everything except
date_start, date_end, node and "object" is inherited from counters_x):
CREATE TABLE low_level."counters_xxx"
(
  date_start timestamp(0) without time zone NOT NULL,
  date_end timestamp(0) without time zone NOT NULL,
  node character varying(16),
  "object" character varying(128),
  aa bigint,
  ab bigint,
  ac bigint,
  ad bigint,
  af bigint,
  ag bigint,
  ah bigint,
  ai bigint,
  aj bigint,
  ak bigint,
  al bigint,
  am bigint,
  an bigint,
  ao bigint,
  ap bigint,
  aq bigint,
  ar bigint[],
  as bigint[],
  at bigint,
  au bigint,
  av bigint,
  aw bigint,
  ax bigint,
  ay bigint,
  az bigint,
  ba bigint,
  bb bigint,
  bc bigint,
  bd bigint,
  be bigint,
  bf bigint,
  bg bigint,
  bh bigint,
  bi bigint,
  bj bigint,
  bk bigint,
  bl bigint,
  bm bigint,
  bn bigint,
  bo bigint,
  bp bigint,
  bq bigint,
  br bigint,
  bs bigint[],
  bt bigint[],
  bu bigint[],
  bv bigint[],
  bw bigint[],
  bx bigint,
  by bigint,
  bz bigint,
  ca bigint,
  cb bigint,
  cc bigint,
  cd bigint,
  ce bigint,
  cf bigint,
  cg bigint,
  ch bigint,
  ci bigint,
  cj bigint,
  ck bigint,
  cl bigint,
  cm bigint,
  cn bigint,
  co bigint,
  cp bigint,
  cq bigint,
  cr bigint,
  cs bigint,
  ct bigint,
  cu bigint,
  cv bigint,
  cw bigint,
  cx bigint,
  cy bigint,
  cz bigint,
  da bigint,
  db bigint,
  dc bigint,
  dd bigint,
  de bigint,
  df bigint,
  dg bigint,
  dh bigint,
  di bigint,
  dj bigint,
  dk bigint,
  dl bigint,
  dm bigint,
  dn bigint,
  do bigint,
  dp bigint,
  dq bigint,
  dr bigint,
  ds bigint,
  dt bigint,
  du bigint,
  dv bigint,
  dw bigint,
  dx bigint,
  dy bigint,
  dz bigint,
  ea bigint,
  eb bigint,
  ec bigint,
  ed bigint,
  ee bigint,
  ef bigint,
  eg bigint,
  eh bigint,
  ei bigint,
  ej bigint,
  ek bigint,
  el bigint,
  em bigint,
  en bigint,
  eo bigint,
  ep bigint,
  eq bigint,
  er bigint,
  es bigint,
  et bigint,
  eu bigint,
  ev bigint,
  ew bigint,
  ex bigint,
  ey bigint,
  ez bigint,
  fa bigint,
  fb bigint,
  fc bigint,
  fd bigint,
  fe bigint,
  ff bigint,
  fg bigint,
  fh bigint,
  fi bigint,
  fj bigint,
  fk bigint,
  fl bigint,
  fm bigint,
  fn bigint,
  fo bigint,
  fp bigint,
  fq bigint,
  fr bigint,
  fs bigint,
  ft bigint,
  fu bigint,
  fv bigint,
  fw bigint,
  fx bigint,
  fy bigint,
  fz bigint,
  ga bigint,
  gb bigint,
  gc bigint,
  gd bigint,
  ge bigint,
  gf bigint,
  gg bigint,
  gh bigint,
  gi bigint,
  gj bigint,
  gk bigint,
  gl bigint,
  gm bigint,
  gn bigint,
  go bigint,
  gp bigint,
  gq bigint,
  gr bigint,
  gs bigint,
  gt bigint,
  gu bigint,
  gv bigint,
  gw bigint,
  gx bigint,
  gy bigint,
  gz bigint,
  ha bigint,
  hb bigint,
  hc bigint,
  hd bigint,
  he bigint,
  hf bigint,
  hg bigint,
  hh bigint,
  hi bigint,
  hj bigint,
  hk bigint,
  hl bigint,
  hm bigint,
  hn bigint,
  ho bigint,
  hp bigint,
  hq bigint,
  hr bigint,
  hs bigint,
  ht bigint,
  hu bigint,
  hv bigint,
  hw bigint,
  hx bigint,
  hy bigint,
  hz bigint,
  ia bigint,
  ib bigint,
  ic bigint,
  id bigint,
  ie bigint,
  if bigint,
  ig bigint,
  ih bigint,
  ii bigint,
  ij bigint,
  ik bigint,
  il bigint,
  im bigint,
  in bigint,
  io bigint,
  ip bigint,
  iq bigint,
  ir bigint,
  is bigint,
  it bigint,
  iu bigint,
  iv bigint,
  iw bigint,
  ix bigint,
  iy bigint,
  iz bigint,
  ja bigint,
  jb bigint,
  jc bigint,
  jd bigint,
  je bigint,
  jf bigint,
  jg bigint,
  jh bigint,
  ji bigint,
  jj bigint,
  jk bigint,
  jl bigint,
  jm bigint,
  jn bigint,
  jo bigint,
  jp bigint,
  jq bigint,
  jr bigint,
  js bigint,
  jt bigint,
  ju bigint,
  jv bigint,
  jw bigint,
  jx bigint,
  jy bigint,
  jz bigint,
  ka bigint,
  kb bigint,
  kc bigint,
  kd bigint,
  ke bigint,
  kf bigint,
  kg bigint,
  kh bigint,
  ki bigint,
  kj bigint,
  kk bigint,
  kl bigint,
  km bigint,
  kn bigint,
  ko bigint,
  kp bigint,
  kq bigint,
  kr bigint,
  ks bigint,
  kt bigint,
  ku bigint,
  kv bigint,
  kw bigint,
  kx bigint,
  ky bigint,
  kz bigint,
  la bigint,
  lb bigint,
  lc bigint,
  ld bigint,
  le bigint,
  lf bigint,
  lg bigint,
  lh bigint,
  li bigint,
  lj bigint,
  lk bigint,
  ll bigint,
  lm bigint,
  ln bigint,
  lo bigint,
  lp bigint,
  lq bigint,
  lr bigint,
  ls bigint,
  lt bigint,
  lu bigint,
  lv bigint,
  lw bigint,
  lx bigint,
  ly bigint,
  lz bigint,
  ma bigint,
  mb bigint,
  mc bigint,
  md bigint,
  me bigint,
  mf bigint,
  mg bigint,
  mh bigint,
  mi bigint,
  mj bigint,
  mk bigint,
  ml bigint,
  mm bigint,
  mn bigint,
  mo bigint,
  mp bigint,
  mq bigint,
  mr bigint,
  ms bigint,
  mt bigint,
  mu bigint,
  mv bigint,
  mw bigint,
  mx bigint,
  my bigint,
  mz bigint,
  na bigint,
  nb bigint,
  nc bigint,
  nd bigint,
  ne bigint,
  nf bigint,
  ng bigint,
  nh bigint,
  ni bigint,
  nj bigint,
  nk bigint,
  nl bigint,
  nm bigint,
  nn bigint,
  no bigint[],
  np bigint[],
  nq bigint[],
  nr bigint,
  ns bigint,
  nt bigint,
  nu bigint,
  nv bigint,
  nw bigint,
  nx bigint,
  ny bigint,
  nz bigint,
  oa bigint,
  ob bigint,
  oc bigint,
  od bigint,
  oe bigint,
  of bigint,
  og bigint,
  oh bigint,
  oi bigint,
  oj bigint,
  ok bigint,
  ol bigint,
  om bigint,
  on bigint,
  oo bigint,
  op bigint,
  oq bigint,
  or bigint,
  os bigint,
  ot bigint,
  ou bigint,
  ov bigint,
  ow bigint,
  ox bigint,
  oy bigint,
  oz bigint,
  pa bigint,
  pb bigint,
  pc bigint,
  pd bigint,
  pe bigint,
  pf bigint,
  pg bigint,
  ph bigint,
  pi bigint,
  pj bigint,
  pk bigint,
  pl bigint,
  pm bigint,
  pn bigint,
  po bigint,
  pp bigint,
  pq bigint,
  pr bigint,
  ps bigint,
  pt bigint,
  pu bigint,
  pv bigint,
  pw bigint,
  px bigint,
  py bigint,
  pz bigint,
  qa bigint,
  qb bigint,
  qc bigint,
  qd bigint,
  qe bigint,
  qf bigint[],
  qg bigint[],
  qh bigint[],
  qi bigint,
  qj bigint,
  qk bigint,
  ql bigint,
  qm bigint,
  qn bigint,
  qo bigint,
  qp bigint,
  qq bigint,
  qr bigint,
  qs bigint,
  qt bigint,
  qu bigint,
  qv bigint,
  qw bigint,
  qx bigint,
  qy bigint,
  qz bigint,
  ra bigint,
  rb bigint,
  rc bigint,
  rd bigint,
  re bigint,
  rf bigint,
  rg bigint,
  rh bigint,
  ri bigint,
  rj bigint,
  rk bigint,
  rl bigint,
  rm bigint,
  rn bigint,
  ro bigint,
  rp bigint,
  rq bigint,
  rr bigint,
  rs bigint,
  rt bigint,
  ru bigint,
  rv bigint,
  rw bigint,
  rx bigint,
  ry bigint,
  rz bigint[],
  sa bigint,
  sb bigint,
  sc bigint,
  sd bigint,
  se bigint,
  sf bigint,
  sg bigint,
  sh bigint,
  si bigint,
  sj bigint,
  sk bigint,
  sl bigint,
  sm bigint,
  sn bigint,
  so bigint,
  sp bigint,
  sq bigint,
  sr bigint,
  ss bigint,
  st bigint,
  su bigint,
  sv bigint,
  sw bigint,
  sx bigint,
  sy bigint,
  sz bigint,
  ta bigint,
  tb bigint,
  tc bigint,
  td bigint,
  te bigint,
  tf bigint,
  tg bigint,
  th bigint,
  ti bigint,
  tj bigint,
  tk bigint,
  tl bigint,
  tm bigint,
  tn bigint,
  to bigint,
  tp bigint,
  tq bigint,
  tr bigint,
  ts bigint,
  CONSTRAINT "counters_xxx_date_end_check" CHECK (date_end
>= '2010-05-06'::date AND date_end < '2010-05-07'::date)
)
-- INHERITS ("counters_x")
WITH (OIDS=FALSE);
CREATE INDEX "low_level.counters_xxx_noidx"
  ON low_level."counters_xxx"
  USING btree
  (node, object);
CREATE INDEX "low_level.counters_xxx_sddidx"
  ON low_level."counters_xxx"
  USING btree
  (date_start, date_end);
CREATE UNIQUE INDEX "low_level.counters_xxx_unique"
  ON low_level."counters_xxx"
  USING btree
  (digest(((COALESCE(node, ''::character varying)::text ||
COALESCE(object, ''::character varying)::text) || date_start) ||
date_end, 'sha1'::text));

Re: Giant TOAST tables due to many almost empty pages

От
Tom Lane
Дата:
Rumko <rumcic@gmail.com> writes:
> INFO:  vacuuming "pg_toast.pg_toast_1066371"
> INFO:  "pg_toast_1066371": found 0 removable, 3259181 nonremovable row versions
> in 3259181 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> Nonremovable row versions range from 57 to 122 bytes long.
> There were 0 unused item pointers.
> Total free space (including removable row versions) is 0 bytes.
> 0 pages are or will become empty, including 0 at the end of the table.
> 0 pages containing 0 free bytes are potential move destinations.

There's something extremely wacko about that vacuum output.  A toast
table should have few, if any, rows that short.  And it's impossible
to believe there's no free space at all in the table, especially since
122*3259181 bytes is still quite a lot less than 3259181 pages.  Are
you running with any nonstandard storage or vacuum parameters?

            regards, tom lane

Re: Giant TOAST tables due to many almost empty pages

От
Rumko
Дата:
Tom Lane wrote:
<snip>
>
> There's something extremely wacko about that vacuum output.  A toast
> table should have few, if any, rows that short.  And it's impossible
> to believe there's no free space at all in the table, especially since
> 122*3259181 bytes is still quite a lot less than 3259181 pages.  Are
> you running with any nonstandard storage or vacuum parameters?
>
> regards, tom lane
>

Regarding storage paramaters, you mean ALTER TABLE x SET STORAGE...? Then no.

Only autovacuum settings are changed to be a bit more aggressive, vacuum
settings haven't been touched:
# select name,setting from pg_settings where source not in
('default','override') AND name LIKE '%vacuum%';
              name              | setting
--------------------------------+---------
 autovacuum_max_workers         | 4
 autovacuum_naptime             | 45
 autovacuum_vacuum_cost_delay   | 15
 autovacuum_vacuum_cost_limit   | 2000
 autovacuum_vacuum_scale_factor | 0.1
 autovacuum_vacuum_threshold    | 75
(6 rows)

Re: Giant TOAST tables due to many almost empty pages

От
Tom Lane
Дата:
Rumko <rumcic@gmail.com> writes:
> Tom Lane wrote:
>> There's something extremely wacko about that vacuum output.

> Regarding storage paramaters, you mean ALTER TABLE x SET STORAGE...? Then no.

No, I was wondering about ALTER TABLE ... SET (fillfactor = n).
It would be worth checking to see if you get a nonnull result from
    select reloptions from pg_class where relname = 'pg_toast_1066371';
The funny behavior would be partially explained if the toast table has a
ridiculously small fillfactor --- in particular that would explain
VACUUM claiming there's no free space, as well as the bloat caused by
having only one useful toast row per page.

There's still the question of why it's toasting such short values at
all, but I think I see that: your table rows contain 500 non-toastable
columns, either bigints or timestamps, each of which requires 8 bytes.
So assuming those are all non-null, that's 4000 unremovable bytes right
there.  The toast code then goes nuts trying to push out all the
toastable columns to bring the tuple down to target size; it's going to
push columns to toast that ordinarily wouldn't get pushed.

You might want to think about collapsing all those standalone bigint
columns into an array.

Maybe the toast heuristics should be modified to cope a bit more
gracefully with a case like this.  Pushing out a relatively small column
in order to get down from 4200 to 4100 bytes doesn't seem like a win.
OTOH, this is by no stretch of the imagination a good schema design, so
I'm not sure how excited people will be about making it perform better.

            regards, tom lane

Re: Giant TOAST tables due to many almost empty pages

От
Rumko
Дата:
On Thursday 13. of May 2010 17:24:47 Tom Lane wrote:
> Rumko <rumcic@gmail.com> writes:
> > Tom Lane wrote:
> >> There's something extremely wacko about that vacuum output.
> >
> > Regarding storage paramaters, you mean ALTER TABLE x SET STORAGE...? Th=
en
> > no.
>
> No, I was wondering about ALTER TABLE ... SET (fillfactor =3D n).
> It would be worth checking to see if you get a nonnull result from
>     select reloptions from pg_class where relname =3D 'pg_toast_1066371';

Returns NULL.

> The funny behavior would be partially explained if the toast table has a
> ridiculously small fillfactor --- in particular that would explain
> VACUUM claiming there's no free space, as well as the bloat caused by
> having only one useful toast row per page.
>
> There's still the question of why it's toasting such short values at
> all, but I think I see that: your table rows contain 500 non-toastable
> columns, either bigints or timestamps, each of which requires 8 bytes.
> So assuming those are all non-null, that's 4000 unremovable bytes right
> there.  The toast code then goes nuts trying to push out all the
> toastable columns to bring the tuple down to target size; it's going to
> push columns to toast that ordinarily wouldn't get pushed.

This does not bother me, the amount in the toast tables is miniscule and co=
mes=20
up to ~275MB at the end and as far as performance goes, there were no=20
noticable problems (it's quite fast).

>
> You might want to think about collapsing all those standalone bigint
> columns into an array.

The current design is not final yet, but for now it has proven (with the=20
exception of the 2 tables that have giant toast tables) to be the most usef=
ul=20
(administration vs. speed vs. ease of use). There will be more=20
experimentation.

>
> Maybe the toast heuristics should be modified to cope a bit more
> gracefully with a case like this.  Pushing out a relatively small column
> in order to get down from 4200 to 4100 bytes doesn't seem like a win.
> OTOH, this is by no stretch of the imagination a good schema design, so
> I'm not sure how excited people will be about making it perform better.
>
>             regards, tom lane

As far as I'm concerned, the TOAST table itself does not bother me even if =
I=20
have a few bytes per row there, only the part where VACUUM claims no free=
=20
space even though pages are more empty than not.

From what I can tell, the problem seems to be in the fsm? Used pg_freespace=
=20
from the pg_freespacemap module and it claims that there are no pages in th=
e=20
toast table that have any free space left (on the other hand vacuum shows=
=20
that each page has a max of 122 bytes of data ... so there should still be=
=20
~8000 bytes of free space left, right?). I tested this on a table that I=20
already ran VACUUM FULL and CLUSTER on it and on a table that I didn't, but=
=20
for both, pg_freespace claimed that all pages were full for the toast table.
--=20
Regards,
Rumko

Re: Giant TOAST tables due to many almost empty pages

От
Tom Lane
Дата:
Rumko <rumcic@gmail.com> writes:
> On Thursday 13. of May 2010 17:24:47 Tom Lane wrote:
>> You might want to think about collapsing all those standalone bigint
>> columns into an array.

> The current design is not final yet, but for now it has proven (with the
> exception of the 2 tables that have giant toast tables) to be the most useful
> (administration vs. speed vs. ease of use). There will be more
> experimentation.

Well, as long as you aren't accessing any of the variable-width columns
often, the performance hit of having them out-of-line isn't going to be
a problem for you.

> As far as I'm concerned, the TOAST table itself does not bother me even if I
> have a few bytes per row there, only the part where VACUUM claims no free
> space even though pages are more empty than not.

Yeah, that's the still-unexplained part.  It is certainly acting like
there's a very small fillfactor setting for that toast table :-(.
Don't understand where that's coming from.  Is this happening for
more than one table?

> From what I can tell, the problem seems to be in the fsm?

No.  What VACUUM is printing is from direct inspection of the table,
it's not gone through the fsm.  There is certainly free space on each
toast table page given the reported tuple sizes, but seemingly the
free space is less than what it thinks it should reserve for fillfactor;
that would cause VACUUM to report the free space as zero.

Do *any* of the rows in pg_class have non-null reloptions?

            regards, tom lane

Re: Giant TOAST tables due to many almost empty pages

От
Rumko
Дата:
On Thursday 13. of May 2010 21:43:37 Tom Lane wrote:
> Rumko <rumcic@gmail.com> writes:
> > As far as I'm concerned, the TOAST table itself does not bother me even
> > if I have a few bytes per row there, only the part where VACUUM claims =
no
> > free space even though pages are more empty than not.
>
> Yeah, that's the still-unexplained part.  It is certainly acting like
> there's a very small fillfactor setting for that toast table :-(.
> Don't understand where that's coming from.  Is this happening for
> more than one table?

Yes, 2 for each day of data (both with a lot higher column count than other=
=20
tables and both have an extremely high ratio of bloated vs. non-bloated=20
sizes).

>
> > From what I can tell, the problem seems to be in the fsm?
>
> No.  What VACUUM is printing is from direct inspection of the table,
> it's not gone through the fsm.  There is certainly free space on each
> toast table page given the reported tuple sizes, but seemingly the
> free space is less than what it thinks it should reserve for fillfactor;
> that would cause VACUUM to report the free space as zero.
>
> Do *any* of the rows in pg_class have non-null reloptions?

First of all, really sorry.
"select reloptions from pg_class where relname =3D 'pg_toast_1066371';"
Returns "{autovacuum_enabled=3Dfalse}" (a remnant of some testing/playing) =
and=20
not NULL (was looking at the wrong server).

As for the others in pg_class, there is an index for a totally different ta=
ble=20
which has "{fillfactor=3D90}". There are a few more tables (main and toast)=
=20
with "{autovacuum_enabled=3Dfalse}" and that's it (others have NULL).

>
>             regards, tom lane
--=20
Regards,
Rumko

Re: Giant TOAST tables due to many almost empty pages

От
Tom Lane
Дата:
Rumko <rumcic@gmail.com> writes:
> On Thursday 13. of May 2010 21:43:37 Tom Lane wrote:
>> Do *any* of the rows in pg_class have non-null reloptions?

> First of all, really sorry.
> "select reloptions from pg_class where relname = 'pg_toast_1066371';"
> Returns "{autovacuum_enabled=false}" (a remnant of some testing/playing) and
> not NULL (was looking at the wrong server).

> As for the others in pg_class, there is an index for a totally different table
> which has "{fillfactor=90}". There are a few more tables (main and toast)
> with "{autovacuum_enabled=false}" and that's it (others have NULL).

Hmm, do both of the toast tables with bloat problems have
"{autovacuum_enabled=false}" ?

            regards, tom lane

Re: Giant TOAST tables due to many almost empty pages

От
Rumko
Дата:
On Friday 14. of May 2010 19:29:44 Tom Lane wrote:
> Rumko <rumcic@gmail.com> writes:
> > On Thursday 13. of May 2010 21:43:37 Tom Lane wrote:
> >> Do *any* of the rows in pg_class have non-null reloptions?
> >
> > First of all, really sorry.
> > "select reloptions from pg_class where relname =3D 'pg_toast_1066371';"
> > Returns "{autovacuum_enabled=3Dfalse}" (a remnant of some testing/playi=
ng)
> > and not NULL (was looking at the wrong server).
> >
> > As for the others in pg_class, there is an index for a totally different
> > table which has "{fillfactor=3D90}". There are a few more tables (main =
and
> > toast) with "{autovacuum_enabled=3Dfalse}" and that's it (others have
> > NULL).
>
> Hmm, do both of the toast tables with bloat problems have
> "{autovacuum_enabled=3Dfalse}" ?

Yeah, but also many others that don't have the problem.

>
>             regards, tom lane
--=20
Regards,
Rumko

Re: Giant TOAST tables due to many almost empty pages

От
Tom Lane
Дата:
Rumko <rumcic@gmail.com> writes:
> On Friday 14. of May 2010 19:29:44 Tom Lane wrote:
>> Hmm, do both of the toast tables with bloat problems have
>> "{autovacuum_enabled=false}" ?

> Yeah, but also many others that don't have the problem.

Hmm, well I can reproduce the problem after doing
alter table foo set (toast.autovacuum_enabled = false);

Haven't investigated exactly why yet, but I imagine that the presence of
a nonnull reloptions setting for a toast table is confusing the code
that determines the default fillfactor.

As a workaround, I'd suggest getting rid of that:
alter table foo reset (toast.autovacuum_enabled);
and then doing VACUUM FULL on the bloated table.

            regards, tom lane

Re: Giant TOAST tables due to many almost empty pages

От
Rumko
Дата:
On Friday 14. of May 2010 20:02:02 Tom Lane wrote:
> Rumko <rumcic@gmail.com> writes:
> > On Friday 14. of May 2010 19:29:44 Tom Lane wrote:
> >> Hmm, do both of the toast tables with bloat problems have
> >> "{autovacuum_enabled=3Dfalse}" ?
> >
> > Yeah, but also many others that don't have the problem.
>
> Hmm, well I can reproduce the problem after doing
> alter table foo set (toast.autovacuum_enabled =3D false);
>
> Haven't investigated exactly why yet, but I imagine that the presence of
> a nonnull reloptions setting for a toast table is confusing the code
> that determines the default fillfactor.
>
> As a workaround, I'd suggest getting rid of that:
> alter table foo reset (toast.autovacuum_enabled);
> and then doing VACUUM FULL on the bloated table.
>
>             regards, tom lane

It works! Thank you!

# SELECT pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'))=
;=20
VACUUM FULL VERBOSE low_level."counters_xxx"; SELECT=20
pg_size_pretty(pg_total_relation_size('low_level."counters_xxx"'));
 pg_size_pretty=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20
----------------=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20
 27 GB=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
(1 row)=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20

Time: 393,410 ms
INFO:  vacuuming "low_level.counters_xxx"
INFO:  "counters_xxx": found 0 removable, 236783 nonremovable row versions =
in=20
236783 pages
DETAIL:  0 dead row versions cannot be removed yet.=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
Nonremovable row versions range from 4288 to 4376 bytes long.=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20
There were 0 unused item pointers.=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20
Total free space (including removable row versions) is 913296036 bytes.=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20
0 pages are or will become empty, including 0 at the end of the table.=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20
236783 pages containing 913296036 free bytes are potential move destination=
s.=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20
CPU 2.37s/0.69u sec elapsed 79.70 sec.=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20
INFO:  index "low_level.counters_xxx_unique" now contains 236783 row versio=
ns=20
in 1177 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20
CPU 0.01s/0.00u sec elapsed 0.38 sec.=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20
INFO:  index "low_level.counters_xxx_sddidx" now contains 236783 row versio=
ns=20
in 915 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.33 sec.
INFO:  index "low_level.counters_xxx_noidx" now contains 236783 row version=
s=20
in 915 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.30 sec.
INFO:  "counters_xxx": moved 0 row versions, truncated 236783 to 236783 pag=
es
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_1066371"
INFO:  "pg_toast_1066371": found 0 removable, 3259181 nonremovable row=20
versions in 3259181 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 57 to 122 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 26386339332 bytes.
0 pages are or will become empty, including 0 at the end of the table.
3259181 pages containing 26386339332 free bytes are potential move=20
destinations.
CPU 33.17s/10.71u sec elapsed 492.24 sec.
INFO:  index "pg_toast_1066371_index" now contains 3259181 row versions in=
=20
8938 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.11s/0.03u sec elapsed 7.61 sec.
INFO:  "pg_toast_1066371": moved 3230346 row versions, truncated 3259181 to=
=20
28835 pages
DETAIL:  CPU 191.61s/95.90u sec elapsed 4789.52 sec.
INFO:  index "pg_toast_1066371_index" now contains 3259181 row versions in=
=20
17798 pages
DETAIL:  3230346 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.59s/2.81u sec elapsed 12.00 sec.
VACUUM
Time: 5383853,256 ms
 pg_size_pretty
----------------
 2238 MB
(1 row)

Time: 78,891 ms
--=20
Regards,
Rumko