Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id Zbr6piWuVHDtFFOl@paquier.xyz
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations  (Junwang Zhao <zhjwpku@gmail.com>)
Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Wed, Jan 31, 2024 at 02:39:54PM +0900, Michael Paquier wrote:
> Thanks, I'm looking into that now.

I have much to say about the patch, but for now I have begun running
some performance tests using the patches, because this thread won't
get far until we are sure that the callbacks do not impact performance
in some kind of worst-case scenario.  First, here is what I used to
setup a set of tables used for COPY FROM and COPY TO (requires [1] to
feed COPY FROM's data to the void, and note that default values is to
have a strict control on the size of the StringInfos used in the copy
paths):
CREATE EXTENSION blackhole_am;
CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
RETURNS VOID AS
$func$
DECLARE
  query text;
BEGIN
  query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
  FOR i IN 1..num_cols LOOP
    query := query || 'a_' || i::text || ' int default 1';
    IF i != num_cols THEN
      query := query || ', ';
    END IF;
  END LOOP;
  query := query || ')';
  EXECUTE format(query);
END
$func$ LANGUAGE plpgsql;
-- Tables used for COPY TO
SELECT create_table_cols ('to_tab_1', 1);
SELECT create_table_cols ('to_tab_10', 10);
INSERT INTO to_tab_1 SELECT FROM generate_series(1, 10000000);
INSERT INTO to_tab_10 SELECT FROM generate_series(1, 10000000);
-- Data for COPY FROM
COPY to_tab_1 TO '/tmp/to_tab_1.bin' WITH (format binary);
COPY to_tab_10 TO '/tmp/to_tab_10.bin' WITH (format binary);
COPY to_tab_1 TO '/tmp/to_tab_1.txt' WITH (format text);
COPY to_tab_10 TO '/tmp/to_tab_10.txt' WITH (format text);
-- Tables used for COPY FROM
SELECT create_table_cols ('from_tab_1', 1);
SELECT create_table_cols ('from_tab_10', 10);
ALTER TABLE from_tab_1 SET ACCESS METHOD blackhole_am;
ALTER TABLE from_tab_10 SET ACCESS METHOD blackhole_am;

Then I have run a set of tests using HEAD, v7 and v10 with queries
like that (adapt them depending on the format and table):
COPY to_tab_1 TO '/dev/null' WITH (FORMAT text) \watch count=5
SET client_min_messages TO error; -- for blackhole_am
COPY from_tab_1 FROM '/tmp/to_tab_1.txt' with (FORMAT 'text') \watch count=5
COPY from_tab_1 FROM '/tmp/to_tab_1.bin' with (FORMAT 'binary') \watch count=5

All the patches have been compiled with -O2, without assertions, etc.
Postgres is run in tmpfs mode, on scissors, without fsync.  Unlogged
tables help a bit in focusing on the execution paths as we don't care
about WAL, of course.  I have also included v7 in the test of tests,
as this version uses more simple per-row callbacks.

And here are the results I get for text and binary (ms, average of 15
queries after discarding the three highest and three lowest values):
      test       | master |  v7  | v10
-----------------+--------+------+------
 from_bin_1col   | 1575   | 1546 | 1575
 from_bin_10col  | 5364   | 5208 | 5230
 from_text_1col  | 1690   | 1715 | 1684
 from_text_10col | 4875   | 4793 | 4757
 to_bin_1col     | 1717   | 1730 | 1731
 to_bin_10col    | 7728   | 7707 | 7513
 to_text_1col    | 1710   | 1730 | 1698
 to_text_10col   | 5998   | 5960 | 5987

I am getting an interesting trend here in terms of a speedup between
HEAD and the patches with a table that has 10 attributes filled with
integers, especially for binary and text with COPY FROM.  COPY TO
binary also gets nice numbers, while text looks rather stable.  Hmm.

These were on my buildfarm animal, but we need to be more confident
about all this.  Could more people run these tests?  I am going to do
a second session on a local machine I have at hand and see what
happens.  Will publish the numbers here, the method will be the same.

[1]: https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: More new SQL/JSON item methods
Следующее
От: James Coleman
Дата:
Сообщение: set_cheapest without checking pathlist