Discussion: [DOCS] Let's document a bytea bug
Hello!

Some time ago I encountered a problem with the bytea type: we can't SELECT
bytea strings whose textual representation is too big to fit into StringInfoData.
And as a side effect, pg_dump refuses to dump tables with big bytea strings.

It's a bug, and it's pretty confusing, but it seems there is no clean way
to fix it so far. Here's a link to a recent discussion of the issue:
https://www.postgresql.org/message-id/flat/c8bdf802d41ec37003ec3b726db79428@postgrespro.ru#c8bdf802d41ec37003ec3b726db79428@postgrespro.ru

Since it won't be fixed anytime soon, I thought it would be worth documenting.
Attaching a patch for the documentation: I added some text to the "Binary Data Types"
section where I tried to describe the issue and explain how to deal with it.

My patch in plain text (for convenience):

    It is not recommended to use bytea strings whose textual representation
    exceeds 1GB, as it may not be possible to SELECT them due to output size
    limitations. Consequently, a table containing such big strings cannot be
    properly processed by pg_dump, since pg_dump will try to SELECT these values
    from the table and fail. The exact size limit advised for bytea strings
    depends on their content, the external format and encoding that you are
    using, and the context in which they will be selected. The general rule is
    that when you use SELECT, the returned tuple should not exceed 1GB. Even if
    SELECT does not work, you can still retrieve big bytea strings using COPY
    in binary format.

-- 
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
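The "1GB of textual representation" rule above can be made concrete with a little arithmetic. This is a back-of-the-envelope sketch, not server code: it assumes the default hex output format for bytea (a `\x` prefix plus two hex digits per byte, plus a terminating NUL) and PostgreSQL's MaxAllocSize cap of 0x3fffffff bytes (just under 1GB) on a single palloc'd buffer.

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF  # PostgreSQL's MaxAllocSize, just under 1 GB

def hex_output_size(n_bytes: int) -> int:
    """Bytes needed for the hex text form: '\\x' prefix + 2 chars/byte + NUL."""
    return 2 + 2 * n_bytes + 1

def max_safe_bytea_bytes() -> int:
    """Largest octet_length whose hex output still fits under MaxAllocSize."""
    return (MAX_ALLOC_SIZE - 3) // 2

print(max_safe_bytea_bytes())        # about 512 MB of raw bytes
print(hex_output_size(600_000_000))  # 1200000003
```

Under these assumptions, a bytea value of roughly 512MB is the practical ceiling for SELECT in hex format; note that 600,000,000 raw bytes would need exactly the 1200000003-byte allocation reported in the errors later in this thread.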
Hi Anna!

> On 23 May 2018, at 20:33, Anna Akenteva <a.akenteva@postgrespro.ru> wrote:
>
> Some time ago I encountered a problem with the bytea type: we can't SELECT
> bytea strings whose textual representation is too big to fit into StringInfoData.
> And as a side effect, pg_dump refuses to dump tables with big bytea strings.
> [...]

Thanks for this message. It took me a while to find out what the problem was.

+1 for documenting this, maybe even with the exact error, like:

[ 2020-07-30 01:20:32.248 MSK pg_dump - 10.3.3.30,XX000 ]:ERROR: invalid memory alloc request size 1472599557

It's really, really scary. My first thought was that it was TOAST corruption.

Best regards, Andrey Borodin.
On Fri, Jul 31, 2020 at 10:13:48AM +0500, Andrey M. Borodin wrote:
> Hi Anna!
>
> > On 23 May 2018, at 20:33, Anna Akenteva <a.akenteva@postgrespro.ru> wrote:
> > [...]
>
> Thanks for this message. It took me a while to find out what the problem was.
> +1 for documenting this, maybe even with the exact error, like:
> [ 2020-07-30 01:20:32.248 MSK pg_dump - 10.3.3.30,XX000 ]:ERROR: invalid memory alloc request size 1472599557
> It's really, really scary. My first thought was that it was TOAST corruption.

I still have Anna Akenteva's 2018 email in my mailbox because I wanted
to research this further. Now that you have re-reported the problem, I
am on it! ;-)

I looked for a clearer reproducible test case, and I have found this one:

$ awk 'BEGIN { printf "\\\\x"; for (i = 0; i < 30000000; i++) \
      printf "7f7f7f7f7f7f7f7f7f7f"; print }' > /tmp/x
$ psql test
CREATE TABLE test (x BYTEA);
COPY test FROM '/tmp/x';
COPY test FROM '/tmp/x';
CREATE TABLE big_data AS SELECT string_agg(x, '') AS x FROM test;
SELECT length(x) FROM big_data;
   length
-----------
 600000000

SELECT octet_length(x) FROM big_data;
 octet_length
--------------
    600000000

SELECT x FROM big_data;
ERROR:  invalid memory alloc request size 1200000003
\q

$ pg_dump -d test > /tmp/a
pg_dump: Dumping the contents of table "big_data" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid memory alloc request size 1200000003
pg_dump: The command was: COPY public.big_data (x) TO stdout;

So, the character and octet length is 600 million, but on output that will
be expanded, and both SELECT and pg_dump fail. I also can't see how to
improve the error message, since it happens so low in the stack.

Reading the previous posts, I understand that expanding the length limit
would be very hard, and it is difficult to know at input time whether the
output will error. I am in favor of the 2018 proposed patch, attached,
perhaps with some small adjustments.

-- 
  Bruce Momjian  <bruce@momjian.us>  https://momjian.us
  EnterpriseDB                       https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
Attachments
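The numbers in Bruce's test case fit together exactly; the arithmetic can be checked without a server. This sketch mirrors the awk generator above (each iteration emits the 20-character chunk "7f7f7f7f7f7f7f7f7f7f", which COPY's text input decodes to 10 bytes) and assumes the hex output form needs a `\x` prefix, two characters per byte, and a trailing NUL.

```python
ITERATIONS = 30_000_000
CHUNK_HEX_CHARS = len("7f7f7f7f7f7f7f7f7f7f")  # 20 hex chars = 10 raw bytes

# Each COPY loads one row of 300,000,000 bytes; string_agg joins two of them.
bytes_per_row = ITERATIONS * CHUNK_HEX_CHARS // 2
agg_bytes = 2 * bytes_per_row

# Hex output needs "\x" + 2 chars per byte + a trailing NUL byte.
alloc_request = 2 + 2 * agg_bytes + 1

print(agg_bytes)      # 600000000, matching length() and octet_length()
print(alloc_request)  # 1200000003, matching the alloc error
```

This is why a value whose stored size looks comfortably under 1GB still fails on output: the textual form is slightly more than twice the stored size.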
On 8/3/20 4:20 PM, Bruce Momjian wrote:
> On Fri, Jul 31, 2020 at 10:13:48AM +0500, Andrey M. Borodin wrote:
>> [...]
>
> I still have Anna Akenteva's 2018 email in my mailbox because I wanted
> to research this further. Now that you have re-reported the problem, I
> am on it! ;-)
>
> I looked for a clearer reproducible test case, and I have found this one:

It is easier to reproduce than that:

select repeat('x',600000000)::bytea;
ERROR:  invalid memory alloc request size 1200000003

select octet_length(repeat('x',600000000)::bytea);
 octet_length
--------------
    600000000
(1 row)

CREATE TABLE big_data AS select repeat('x',600000000)::bytea;
SELECT 1

SELECT repeat FROM big_data;
ERROR:  invalid memory alloc request size 1200000003
\q

$ pg_dump -d postgres > /tmp/a
pg_dump: error: Dumping the contents of table "big_data" failed: PQgetResult() failed.
pg_dump: error: Error message from server: ERROR: invalid memory alloc request size 1200000003
pg_dump: error: The command was: COPY public.big_data (repeat) TO stdout;

There are other cases that also hit the StringInfo limits (i.e., MaxAllocSize)
with the TEXT data type (and I guess others, like JSON or XML, too):

select repeat('x',600000000), repeat('y',600000000);
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 600000010 bytes by 600000000 more bytes.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
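Joe's two-column text example trips a slightly different check: the per-tuple output buffer is a StringInfo that may not grow past MaxAllocSize. A minimal sketch of that growth check follows; the exact overhead behind the "600000010 bytes" already in the buffer (the first repeat() column plus a few bytes of row framing) is an assumption, as is the simplified form of the check itself.

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF  # just under 1 GB

def can_enlarge(current: int, needed: int) -> bool:
    """True if a buffer holding `current` bytes may grow by `needed` more
    without exceeding the MaxAllocSize cap (simplified StringInfo check)."""
    return current + needed <= MAX_ALLOC_SIZE

# The failing case from the error message: 600000010 + 600000000 > cap.
print(can_enlarge(600_000_010, 600_000_000))  # False -> "Cannot enlarge ..."

# A smaller second column would still fit under the cap.
print(can_enlarge(600_000_010, 400_000_000))  # True
```

So the limit is on the whole output tuple, not any single column, which matches the documentation patch's wording that the returned tuple as a whole should stay under 1GB.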