pg_dump / copy bugs with "big lines" ?

From | Ronan Dunklau
---|---
Subject | pg_dump / copy bugs with "big lines" ? |
Date | |
Msg-id | 1836813.YmyOrS99PX@ronan.dunklau.fr |
Replies | Re: pg_dump / copy bugs with "big lines" ? (Jim Nasby <Jim.Nasby@BlueTreble.com>) |
List | pgsql-hackers |
Hello hackers,

I've tried my luck on pgsql-bugs before, with no success, so I report these problems here.

The documentation mentions the following size limits:

  Maximum Field Size   1 GB
  Maximum Row Size     1.6 TB

However, it seems like rows bigger than 1GB can't be COPYed out:

ro=# create table test_text (c1 text, c2 text);
CREATE TABLE
ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
INSERT 0 1
ro=# update test_text set c2 = c1;
UPDATE 1

Then, trying to dump or copy that results in the following error:

ro=# COPY test_text TO '/tmp/test';
ERROR:  out of memory
DÉTAIL : Cannot enlarge string buffer containing 536870913 bytes by 536870912 more bytes.

In fact, the same thing happens when using a simple SELECT:

ro=# select * from test_text ;
ERROR:  out of memory
DÉTAIL : Cannot enlarge string buffer containing 536870922 bytes by 536870912 more bytes.

In the case of COPY, the server uses a StringInfo to output the row. The problem is, a StringInfo is capped to MaxAllocSize (1GB - 1), but a row should be able to hold much more than that.

So, is this a bug? Or is there a caveat I have missed in the documentation?

We also hit a second issue, this time related to bytea encoding. This test case is a bit more complicated, since I had to use an external (client) program to insert my data.
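To make the numbers in those error messages concrete, here is a small back-of-the-envelope check (illustrative arithmetic only, not PostgreSQL code; the interpretation of the buffer layout is my assumption based on the error text):

```python
# PostgreSQL's StringInfo buffers cannot grow past MaxAllocSize (1 GB - 1).
MAX_ALLOC_SIZE = 2**30 - 1      # 1073741823 bytes

FIELD = 536870912               # each text column: repeat('a', 536870912)

# COPY builds the whole output row in one StringInfo. Assuming the buffer
# already holds c1 plus one delimiter byte when it tries to append c2:
already_buffered = FIELD + 1
assert already_buffered == 536870913        # matches "containing 536870913 bytes"

# Appending the second 512 MB field would exceed the cap, hence the error
# "... by 536870912 more bytes":
assert already_buffered + FIELD > MAX_ALLOC_SIZE
```

So the row itself is well under the documented 1.6 TB row limit; it is the single output buffer for the whole row that hits the 1 GB allocation ceiling.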
It involves inserting a string that fits into 1GB when encoded in escape format, but is larger than that in hex, and another string which fits in 1GB using the hex format, but is larger than that in escape:

from psycopg2 import connect
from io import BytesIO

conn = connect(dbname="ro")
cur = conn.cursor()
fullcontent = BytesIO()

# Write a binary string that weighs less
# than 1 GB when escape encoded, but more than
# that if hex encoded
for i in range(200):
    fullcontent.write(b"aaa" * 1000000)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

fullcontent.seek(0)
fullcontent.truncate()

# Write another binary string that weighs
# less than 1GB when hex encoded, but more than
# that if escape encoded
cur.execute("SET bytea_output = 'hex'")
fullcontent.write(b"\\\\x")
for i in range(300):
    fullcontent.write(b"00" * 1000000)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

cur.execute("COMMIT;")
cur.close()

I couldn't find an invocation of pg_dump which would allow me to dump both lines:

ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=escape" pg_dump -Fc > /dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000001
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;

ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=hex" pg_dump -Fc > /dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000003
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;

Using a COPY with binary format works:

ro=# COPY test_bytea TO '/tmp/test' WITH BINARY;

There seems to be a third issue, with regards to escape encoding: the backslash character is escaped by adding another backslash.
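The reason no single bytea_output setting can dump both rows can be sketched with size arithmetic alone (illustrative, not PostgreSQL code; the per-byte encoding costs below are my reading of the hex and escape output formats):

```python
# MaxAllocSize: the server refuses allocations of 1 GB or more.
MAX_ALLOC_SIZE = 2**30 - 1

row1 = 3 * 1000000 * 200        # 600,000,000 printable 'a' bytes
row2 = 1 * 1000000 * 300        # 300,000,000 zero bytes

# hex output: "\x" prefix plus 2 characters per byte
hex1 = 2 + 2 * row1
hex2 = 2 + 2 * row2

# escape output: a printable byte stays 1 character,
# a zero byte becomes the 4-character sequence "\000"
esc1 = 1 * row1
esc2 = 4 * row2

assert esc1 < MAX_ALLOC_SIZE < hex1    # row 1 only fits in escape format
assert hex2 < MAX_ALLOC_SIZE < esc2    # row 2 only fits in hex format
```

Note how hex1 and esc2 line up (modulo a few bytes of prefix/terminator overhead) with the "invalid memory alloc request size 1200000003" and "1200000001" errors above: whichever format pg_dump picks, one of the two rows needs a >1GB encoding buffer.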
This means that a field whose size is less than 1GB in the escape encoding may no longer fit once the backslashes are escaped a second time. For example, let's consider a string consisting of 300000000 '\' characters:

ro=# select length(c1) from test_bytea;
  length
-----------
 300000000
(1 ligne)

ro=# select length(encode(c1, 'escape')) from test_bytea ;
  length
-----------
 600000000
(1 ligne)

ro=# set bytea_output to escape;
SET
ro=# copy test_bytea to '/tmp/test.csv' ;
ERROR:  out of memory
DÉTAIL : Cannot enlarge string buffer containing 1073741822 bytes by 1 more bytes.

I think pg_dump should not error out on any data which was valid upon insertion. It seems the fix would be non-trivial, since StringInfo structures rely on a limit of MaxAllocSize. Or am I missing something?

Thank you.

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org
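The double escaping in that last session can also be checked with plain arithmetic (a sketch of my understanding, not server code):

```python
# StringInfo cap on the server side.
MAX_ALLOC_SIZE = 2**30 - 1

n = 300_000_000                 # raw '\' bytes stored in the bytea column

# bytea escape encoding turns each '\' into "\\":
escape_encoded = 2 * n
assert escape_encoded == 600_000_000    # matches length(encode(c1, 'escape'))

# COPY text format then escapes each of those backslashes once more,
# so the output buffer would need roughly twice that again:
copy_text_output = 2 * escape_encoded
assert copy_text_output > MAX_ALLOC_SIZE
```

So a value that is comfortably under 1GB at insertion time (300MB raw, 600MB escape-encoded) still blows past the 1GB buffer cap on the way out, which is why the COPY fails exactly at 1073741822 bytes.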