Discussion: fetching bytea (blob) data of 850 MB from psql client failed
Hi all,
I have a use case in PostgreSQL where I have inserted 500 rows into a table with 2 columns, as described below.
create table xyz (
id citext not null primary key,
col1 bytea
);
The table has 500 rows and each row has around 850 MB of data. The bytea column in each row holds around 830 to 840 MB of data, while the citext column holds approximately 10 to 15 bytes.
When I try to fetch all rows with a select query, or try to fetch a single row with a select statement using the where clause for this table, the database throws the below error -
VERROR; invalid memory alloc request size 1764311653(File mcxt.c; Line 959;
Routine palloc; )
The error above points to the PostgreSQL database backend code.
May I know why I am getting this error? If I use array fetch via the ODBC driver with fetchsize=1 (assuming the application will fetch 1 record at a time from the PostgreSQL database server), I still get that error. Is there any server configuration which can control this memory allocation error and allow me to fetch one record at a time from the PostgreSQL database table?
Please let me know whether this is a known issue or limitation in the PostgreSQL backend code; if so, please point me to the documentation for it. If there is any workaround, please share that as well.
Regards,
Jitesh
On Mon, Aug 22, 2022 at 4:35 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>
> Hi all,
>
> I have a use case in PostgreSQL where I have inserted 500 rows into a table with 2 columns as described below.
> create table xyz (
> id citext not null primary key,
> col1 bytea
> );
>
> The table has 500 rows and each row has around 850 MB of data. The bytea column data in the row has around 830 to 840 MB of data while the citext column data has 10 to 15 bytes approximately.
> When I try to fetch all rows with a select query or try to fetch a single row with a select statement using the where clause for this table, the database throws the below error -
>
> VERROR; invalid memory alloc request size 1764311653(File mcxt.c; Line 959;
> Routine palloc; )

The request size is about 2x bigger than 880MB, which is what we would expect when outputting using the default setting for "bytea_output", which is "hex". Hex uses two bytes of text to represent each byte:

https://www.postgresql.org/docs/devel/datatype-binary.html

Unfortunately this retrieval problem has been encountered before, but is not documented that I can see:

https://www.postgresql.org/message-id/flat/13120.1518793109%40sss.pgh.pa.us#81d92e9e7e15975e606bdb349ec573d4

If the data is largely printable ASCII, then one thing to try is setting "bytea_output" to "escape", if your driver and client can handle that:

https://www.postgresql.org/docs/devel/runtime-config-client.html#GUC-BYTEA-OUTPUT

If the data is binary and not printable ASCII, then there is no easy workaround. For this, possibly the "large object" facility could be useful, or storing the data in a regular file with the path stored in the database.

--
John Naylor
EDB: http://www.enterprisedb.com
Hi John,
Thanks for those details.
The table's bytea columns hold binary data, some of which is not text. Hence, when I tried the fetch after changing bytea_output to 'escape', it failed again with the below error -
VERROR; result of bytea output conversion is too large(File varlena.c; Line 415; Routine byteaout; ).
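For context on why 'escape' fails too: in that output format a printable ASCII byte passes through as one character, a backslash becomes two, and any other byte becomes a four-character octal escape, so mostly-binary data can roughly quadruple in size. A rough Python sketch of the arithmetic (illustrative only, not taken from the thread):

```python
def bytea_escape_len(data: bytes) -> int:
    """Approximate output length of PostgreSQL's 'escape' bytea format."""
    n = 0
    for b in data:
        if b == 0x5C:            # backslash is doubled: "\\"
            n += 2
        elif 0x20 <= b <= 0x7E:  # printable ASCII passes through as-is
            n += 1
        else:                    # anything else becomes "\ooo" (4 chars)
            n += 4
    return n

# Printable text barely grows, but arbitrary binary bytes cost 4x,
# so a mostly-binary ~840 MB value can need well over the ~1 GB
# allocation that byteaout is allowed to make.
print(bytea_escape_len(b"hello"))            # 5
print(bytea_escape_len(bytes([0, 1, 255])))  # 12
```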
BTW, is there any plan to fix this issue in the PostgreSQL backend code in the near future?
Regards,
Jitesh
On Mon, Aug 22, 2022 at 4:22 PM John Naylor <john.naylor@enterprisedb.com> wrote:
On Mon, Aug 22, 2022 at 4:35 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>
> Hi all,
>
> I have a use case in postgresql where I have inserted 500 rows in a table with 2 columns as described below.
> create table xyz (
> id citext not null primary key,
> col1 bytea
> );
>
> The table has 500 rows and each row has around 850 MB of data. The bytea column data in the row has around 830 to 840 MB of data while the citext column data has 10 to 15 bytes approximately.
> When I try to fetch all rows with a select query or try to fetch a single row with a select statement using the where clause for this table, the Database throws the below error -
>
> VERROR; invalid memory alloc request size 1764311653(File mcxt.c; Line 959;
> Routine palloc; )
The request size is about 2x bigger than 880MB, which is what we would
expect when outputting using the default setting for "bytea_output",
which is "hex". Hex uses two bytes of text to represent each byte:
https://www.postgresql.org/docs/devel/datatype-binary.html
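To put numbers on that doubling, here is a small illustrative Python sketch (the 1 KB sample merely stands in for the thread's ~880 MB value):

```python
import binascii

# Stand-in for a large bytea value (the real case is ~880 MB).
data = bytes(range(256)) * 4  # 1024 bytes of arbitrary binary

# Default bytea_output = 'hex': a "\x" prefix, then two hex characters
# per data byte, so the text form is just over twice the raw size.
hex_text = "\\x" + binascii.hexlify(data).decode("ascii")

print(len(data), len(hex_text))  # 1024 2050
```

Scaled up, that 2x-plus-prefix expansion of an ~880 MB value is exactly the ~1.76 GB palloc request reported in the error.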
Unfortunately this retrieval problem has been encountered before, but
is not documented that I can see:
https://www.postgresql.org/message-id/flat/13120.1518793109%40sss.pgh.pa.us#81d92e9e7e15975e606bdb349ec573d4
If the data is largely printable ASCII, then one thing to try is
setting "bytea_output" to "escape", if your driver and client can
handle that:
https://www.postgresql.org/docs/devel/runtime-config-client.html#GUC-BYTEA-OUTPUT
If the data is binary and not printable ASCII, then there is no easy
workaround. For this, possibly the "large object" facility could be
useful, or storing the data in a regular file with the path stored in
the database.
--
John Naylor
EDB: http://www.enterprisedb.com
On Mon, Aug 29, 2022 at 1:02 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>
> Hi John,
> Thanks for those details.
> The table's bytea columns hold binary data, some of which is not text. Hence, when I tried the fetch after changing bytea_output to 'escape', it failed again with the below error -
> VERROR; result of bytea output conversion is too large(File varlena.c; Line 415; Routine byteaout; ).
> BTW, is there any plan to fix this issue in the PostgreSQL backend code in the near future?

Unfortunately, there is no plan, as there is no consensus on how to approach the issue. It seems the possible actions are (easiest to hardest):

1. Document that bytea is unsafe -- you can insert anything you like, but you may be unable to retrieve it again.
2. Determine some maximum size that guarantees accessibility and invent a GUC that by default prevents inserting larger values than that.
3. Fix the issue properly.

#2 and #3 are challenging for reasons given in the thread I linked to above.

--
John Naylor
EDB: http://www.enterprisedb.com
On 8/31/22 09:27, John Naylor wrote:
> On Mon, Aug 29, 2022 at 1:02 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>>
>> Hi John,
>> Thanks for those details.
>> The table's bytea columns hold binary data, some of which is not text. Hence, when I tried the fetch after changing bytea_output to 'escape', it failed again with the below error -
>> VERROR; result of bytea output conversion is too large(File varlena.c; Line 415; Routine byteaout; ).
>> BTW, is there any plan to fix this issue in the PostgreSQL backend code in the near future?
>
> Unfortunately, there is no plan as there is no consensus on how to
> approach the issue. It seems the possible actions are (easiest to
> hardest):
>
> 1. Document that bytea is unsafe -- you can insert anything you like,
> but you may be unable to retrieve it again.
> 2. Determine some maximum size that guarantees accessibility and
> invent a GUC that by default prevents inserting larger values than
> that.
> 3. Fix the issue properly.
>
> #2 and #3 are challenging for reasons given in the thread I linked to above.
>

I haven't tried, but wouldn't it be enough to fetch the data in smaller chunks? The application would have to re-assemble that, of course.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
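Tomas's chunked-fetch idea could be sketched in SQL like this (the table and column names are the ones from the thread; the 100 MB chunk size is an arbitrary choice, and the client is responsible for concatenating the pieces):

```sql
-- How many bytes does the value hold?
SELECT octet_length(col1) FROM xyz WHERE id = 'some-id';

-- Fetch it slice by slice; each query's text output covers only one
-- chunk, so it stays well under the 1 GB allocation limit.
SELECT substring(col1 FROM 1         FOR 104857600) FROM xyz WHERE id = 'some-id';
SELECT substring(col1 FROM 104857601 FOR 104857600) FROM xyz WHERE id = 'some-id';
-- ...and so on, advancing the offset until octet_length is covered.
```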
On Wed, Aug 31, 2022 at 3:20 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
On 8/31/22 09:27, John Naylor wrote:
> On Mon, Aug 29, 2022 at 1:02 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>>
>> Hi John,
>> Thanks for those details.
>> The database table for bytea columns has binary data data and some are not text. Hence, when it tried to fetch by changing the bytea_output to 'escape' it failed again with the below error -
>> VERROR; result of bytea output conversion is too large(File varlena.c; Line 415; Routine byteaout; ).
>> BTW, is there any plan to fix this issue in the PostgreSQL backend code in near future?
>
> Unfortunately, there is no plan as there is no consensus on how to
> approach the issue. It seems the possible actions are (easiest to
> hardest):
>
> 1. Document that bytea is unsafe -- you can insert anything you like,
> but you may be unable to retrieve it again.
> 2. Determine some maximum size that guarantees accessibility and
> invent a GUC that by default prevents inserting larger values than
> that.
> 3. Fix the issue properly.
>
> #2 and #3 are challenging for reasons given in the thread I linked to above.
>
I haven't tried, but wouldn't it be enough to fetch the data in smaller
chunks? The application would have to re-assemble that, of course.
yes - conversions between bytea and large objects are possible in both directions. So anybody can convert any bytea value to a temporary large object, and that can then be downloaded to any client.
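A sketch of that conversion (the table and id are from the thread; the `<loid>` placeholder has to be filled in by hand or by the application):

```sql
-- Server-side: copy the bytea value into a new large object and
-- return its OID; only the OID travels to the client here.
SELECT lo_from_bytea(0, col1) AS loid FROM xyz WHERE id = 'some-id';

-- Client-side: stream the large object out in pages, e.g. in psql:
-- \lo_export <loid> '/tmp/blob.bin'

-- Clean up when done:
-- SELECT lo_unlink(<loid>);
```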
Regards
Pavel
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Aug 22, 2022 at 05:52:29PM +0700, John Naylor wrote:
> If the data is binary and not printable ASCII, then there is no easy
> workaround.

I would say the chief solution is to request binary transmission in your application. This cuts your network traffic in half for retrieving a bytea, so it's a good change even if the server improves someday. If you're using libpq, search for show_binary_results in https://www.postgresql.org/docs/devel/libpq-example.html to see an example of doing this.

I don't know of an equivalent option in ODBC; if there is none, one could modify the ODBC driver to use the same protocol feature that libpq uses.
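A minimal libpq sketch of requesting binary results (the connection string, table, and id are placeholders based on the thread, and error handling is trimmed; the last argument of PQexecParams selects the result format):

```c
/* Sketch: fetch one bytea value in binary wire format with libpq.
 * With resultFormat = 1 the server sends the raw bytes, avoiding
 * the 2x hex (or worse, escape) text expansion entirely.
 */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");   /* placeholder conninfo */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    const char *params[1] = {"some-id"};
    PGresult *res = PQexecParams(conn,
                                 "SELECT col1 FROM xyz WHERE id = $1",
                                 1,       /* one parameter */
                                 NULL,    /* let the server infer its type */
                                 params,
                                 NULL, NULL,
                                 1);      /* resultFormat = 1: binary */

    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
    {
        int len = PQgetlength(res, 0, 0);          /* raw byte count */
        const char *bytes = PQgetvalue(res, 0, 0); /* raw bytea bytes */
        fwrite(bytes, 1, (size_t) len, stdout);
    }

    PQclear(res);
    PQfinish(conn);
    return 0;
}
```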