Обсуждение: System administration functions about relation size ignore changes in the table structure

Поиск
Список
Период
Сортировка

System administration functions about relation size ignore changes in the table structure

От
Erki Eessaar
Дата:
Hello

According to the example at the end of the letter the functions pg_total_relation_size,  pg_table_size and pg_indexes_size seem to ignore adding or removing columns from a table.

I first noticed it in case of stored generated columns but the same applies to regular columns as well.

I tested it in PostgreSQL 16.

Best regards
Erki Eessaar
*******************************
CREATE TABLE Invoice (invoice_code INTEGER NOT NULL,
client_id INTEGER NOT NULL,
invoice_date DATE NOT NULL,
invoice_year SMALLINT GENERATED ALWAYS AS (extract(year FROM invoice_date)) STORED NOT NULL,
invoice_month SMALLINT GENERATED ALWAYS AS (extract(month FROM invoice_date)) STORED NOT NULL,
CONSTRAINT pk_invoice PRIMARY KEY (invoice_code));

/*I generate 15000 rows.*/
INSERT INTO Invoice (invoice_code, client_id, invoice_date)
SELECT generator AS invoice_code,
floor(random() * (1_000 - 1 + 1) + 1)::int AS client_id,
'2015-01-01'::date + floor(random() * (current_date - '2015-01-01' + 1) + 1)::int AS invoice_date
FROM generate_series(1, 15_000) AS generator;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 1056768

SELECT pg_table_size('invoice'::regclass)+pg_indexes_size('invoice'::regclass) AS reslt;
--Result: 1056768

ALTER TABLE Invoice DROP invoice_year;
ALTER TABLE Invoice DROP invoice_month;
VACUUM ANALYZE;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 1056768

SELECT pg_table_size('invoice'::regclass)+pg_indexes_size('invoice'::regclass) AS reslt;
--Result: 1056768

I removed stored(!) generated columns but the size does not change.

DROP TABLE Invoice;

CREATE TABLE Invoice (invoice_code INTEGER NOT NULL,
client_id INTEGER NOT NULL,
invoice_date DATE NOT NULL);

INSERT INTO Invoice (invoice_code, client_id, invoice_date)
SELECT generator AS invoice_code,
floor(random() * (1_000 - 1 + 1) + 1)::int AS client_id,
'2015-01-01'::date + floor(random() * (current_date - '2015-01-01' + 1) + 1)::int AS invoice_date
FROM generate_series(1, 15_000) AS generator;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 704512

SELECT pg_table_size('invoice'::regclass)+pg_indexes_size('invoice'::regclass) AS reslt;
--Result: 704512

ALTER TABLE Invoice ADD COLUMN invoice_year SMALLINT GENERATED ALWAYS AS (extract(year FROM invoice_date)) STORED NOT NULL;
ALTER TABLE Invoice ADD COLUMN invoice_month SMALLINT GENERATED ALWAYS AS (extract(month FROM invoice_date)) STORED NOT NULL;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 704512

SELECT pg_table_size('invoice'::regclass)+pg_indexes_size('invoice'::regclass) AS reslt;
--Result: 704512

I added stored(!) generated columns but the size does not change.

ALTER TABLE Invoice DROP COLUMN invoice_month;

ALTER TABLE Invoice DROP COLUMN invoice_year;

ALTER TABLE Invoice DROP COLUMN invoice_date;

VACUUM ANALYZE;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 704512

SELECT pg_table_size('invoice'::regclass)+pg_indexes_size('invoice'::regclass) AS reslt;
--Result: 704512

The size is still the same.

Re: System administration functions about relation size ignore changes in the table structure

От
Tom Lane
Дата:
Erki Eessaar <erki.eessaar@taltech.ee> writes:
> According to the example at the end of the letter the functions pg_total_relation_size,  pg_table_size and
pg_indexes_sizeseem to ignore adding or removing columns from a table. 

I don't think this is a bug, it's more like a poor test case designed
with no understanding of Postgres internals.

1. Dropping a column doesn't ever produce an instantaneous reduction
of table size.  The column is still there physically.  We'll replace
it with NULLs over time as existing rows are updated.

2. Adding a column may not cause an instantaneous increase in size
either, if we can avoid an immediate table rewrite; which is true
if the column has a null or constant default value.

3. Even for your example with adding a generated column, you would
only see an increase in table size if we could now fit fewer rows per
page than before.  Your test case seems to involve comparing a table
with 3 integer-size columns to one with 3 integers and 2 smallints.
On 64-bit machines those come out to the same size (16 data bytes per
row) because the row length is padded to a multiple of 8 bytes for
alignment reasons.

            regards, tom lane



Re: System administration functions about relation size ignore changes in the table structure

От
Erki Eessaar
Дата:
Hello

Thank you. However, I made another small experiment (see the code in the end) and I still have questions.

I created a table with a textual column and generated 15_000 rows of test data. After that I added a generated column based on the textual column and indeed the relation size increased.

After that I dropped ALL the columns of the table (by the way it is a nice feature that PostgreSQL allows tables without columns). VACUUM ANALYZE did not change the relation size. VACUUM FULL reduced the size, but it was still pretty large.

After that I tried to add a mandatory (NOT NULL) column to the table (with a new name). It failed!!

After that I tried to add an optional (permit NULLs) column to the table. It succeeded but the table now had 15_000 rows with NULLs.

Then I added a small amount of test data. It increased the relation size a bit.

I want to understand is it an expected behavior? 

Is it possible somehow to reorganize pages so that the pages that were once used by the relation would be released? 

The border case with evolving the table structure produces a result that from the point of view of database user is a bug.

Best regards
Erki Eessaar
******************************
DROP TABLE IF EXISTS Invoice;

CREATE TABLE Invoice (invoice_code INTEGER NOT NULL,
client_id INTEGER NOT NULL,
invoice_date DATE NOT NULL,
total_sum INTEGER NOT NULL,
description TEXT NOT NULL,
CONSTRAINT pk_invoice PRIMARY KEY (invoice_code));

INSERT INTO Invoice (invoice_code, client_id, invoice_date, total_sum, description)
SELECT generator AS invoice_code,
floor(random() * (1_000 - 1 + 1) + 1)::int AS client_id,
'2015-01-01'::date + floor(random() * (current_date - '2015-01-01' + 1) + 1)::int AS invoice_date,
floor(random() * (40_000-1+1) + 1)::integer AS total_sum,
(SELECT string_agg(x,', ') AS tulemus
FROM (SELECT arr[floor(random() * (8-1+1) + 1)] AS x
FROM (SELECT '{important, awful, irrelevant, quick, paper-based, electronic, top-priority, cheating, very big}'::text[] AS arr) AS foo,
generate_series(1,floor(random() * (10-1+1) +
1)::integer + (generator*0)) AS g) AS f) AS description
FROM generate_series(1, 15_000) AS generator;

VACUUM FULL;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 1966080


ALTER TABLE Invoice ADD COLUMN description_upper TEXT GENERATED ALWAYS AS (Upper(description)) STORED NOT NULL;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
---Result: 2981888

ALTER TABLE Invoice DROP COLUMN description_upper;
ALTER TABLE Invoice DROP COLUMN description;
ALTER TABLE Invoice DROP COLUMN invoice_date;
ALTER TABLE Invoice DROP COLUMN client_id;
ALTER TABLE Invoice DROP COLUMN total_sum;
ALTER TABLE Invoice DROP COLUMN invoice_code CASCADE;

VACUUM FULL;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 425984


ALTER TABLE Invoice ADD COLUMN invoice_id SMALLINT NOT NULL;
--ERROR:  column "invoice_id" of relation "invoice" contains null values

ALTER TABLE Invoice ADD COLUMN invoice_id SMALLINT;

SELECT Count(*) AS cnt FROM Invoice;
--Table has 15000 rows where invoice_id is NULL

INSERT INTO Invoice (invoice_id)
SELECT generator AS invoice_id
FROM generate_series(1, 1_000) AS generator;

VACUUM FULL;

SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 458752

DELETE FROM Invoice;
VACUUM FULL;
SELECT pg_total_relation_size('invoice'::regclass) AS reslt;
--Result: 0

From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Saturday, October 14, 2023 20:02
To: Erki Eessaar <erki.eessaar@taltech.ee>
Cc: pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: Re: System administration functions about relation size ignore changes in the table structure
 
Erki Eessaar <erki.eessaar@taltech.ee> writes:
> According to the example at the end of the letter the functions pg_total_relation_size,  pg_table_size and pg_indexes_size seem to ignore adding or removing columns from a table.

I don't think this is a bug, it's more like a poor test case designed
with no understanding of Postgres internals.

1. Dropping a column doesn't ever produce an instantaneous reduction
of table size.  The column is still there physically.  We'll replace
it with NULLs over time as existing rows are updated.

2. Adding a column may not cause an instantaneous increase in size
either, if we can avoid an immediate table rewrite; which is true
if the column has a null or constant default value.

3. Even for your example with adding a generated column, you would
only see an increase in table size if we could now fit fewer rows per
page than before.  Your test case seems to involve comparing a table
with 3 integer-size columns to one with 3 integers and 2 smallints.
On 64-bit machines those come out to the same size (16 data bytes per
row) because the row length is padded to a multiple of 8 bytes for
alignment reasons.

                        regards, tom lane

Re: System administration functions about relation size ignore changes in the table structure

От
"David G. Johnston"
Дата:
Our convention here is to in-line or bottom post.

On Sunday, October 15, 2023, Erki Eessaar <erki.eessaar@taltech.ee> wrote:
After that I dropped ALL the columns of the table (by the way it is a nice feature that PostgreSQL allows tables without columns). VACUUM ANALYZE did not change the relation size. VACUUM FULL reduced the size, but it was still pretty large. 

Rows have overhead so even a zero column row is going to take up space.

David J. 

Re: System administration functions about relation size ignore changes in the table structure

От
Tom Lane
Дата:
Erki Eessaar <erki.eessaar@taltech.ee> writes:
> After that I dropped ALL the columns of the table (by the way it is a nice feature that PostgreSQL allows tables
withoutcolumns). VACUUM ANALYZE did not change the relation size. VACUUM FULL reduced the size, but it was still pretty
large.

You still have 15000 live rows.  None of their columns are accessible,
but the data is still there physically.  Plain VACUUM is not charged
with rewriting row contents, only with deleting dead rows, and there
are none to delete.  IIRC, VACUUM FULL will rewrite the rows to have
just null entries, making them smaller (but not zero size).

> After that I tried to add a mandatory (NOT NULL) column to the table (with a new name). It failed!!

That would work only if you provided a non-null default to populate
the column with.

> After that I tried to add an optional (permit NULLs) column to the table. It succeeded but the table now had 15_000
rowswith NULLs. 

As expected.  Reducing a row to zero columns does not eliminate the row.

> Is it possible somehow to reorganize pages so that the pages that were once used by the relation would be released?

VACUUM FULL/CLUSTER will do that, as will variants of ALTER TABLE
that force a table rewrite.  (As you've found, we've expended a
fair amount of effort on avoiding that when possible.)

> The border case with evolving the table structure produces a result that from the point of view of database user is a
bug.

You seem far too willing to use that word.  We have generally
optimized these sorts of operations to run quickly, not to
reclaim disk space as soon as possible.  In the project's
opinion, every one of these behaviors is a feature not a bug
--- and in many cases, a feature we put considerable sweat into.

            regards, tom lane