Discussion: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL


Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

From
Sai Teja
Date:
Hi All,

We have one table storing XML data, with 30k records holding a huge amount of data.

We are trying to create an index on this column, but we're getting a "Huge input lookup" error during index creation.

Please check the command below, which is used to create the index:

CREATE INDEX xml_index ON xml_table USING btree (CAST (xpath('directory/access/mode/@Identifier', content) AS text[]));

Here we're using an XPath expression to create the index, since PostgreSQL does not directly support comparison methods for the xml type. But while creating the index as mentioned above, we're facing the "Huge input lookup" issue.

I was able to create the index when the table had no records. But now a huge amount of data is stored in the table, so creating the index fails with the "Huge input lookup" error.

We faced the same error earlier when trying to retrieve particular rows from the table. We then changed the XML option from content to document, after which it worked and we are able to retrieve the files now.

But now, while creating the index, we tried changing the XML option from content to document again, and this time it didn't work.

PS: we are using PostgreSQL 14.8, hosted on Azure.

Any suggestions to resolve this issue would be much appreciated.
Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

From
Tom Lane
Date:
Sai Teja <saitejasaichintalapudi@gmail.com> writes:
> We have one table which is storing XML data with 30k records and stores
> huge amount of data.
> We are trying to create the index for this column in the table. But, we’re
> getting “Huge input Lookup error” during creation of Index.

There is no such string anywhere in the Postgres source code;
furthermore, if someone tried to add such an error, it'd get rejected
(I hope) as not conforming to our style guidelines.  I thought maybe
it's coming from libxml or the xpath code, but I couldn't get a match
for it anywhere in Debian Code Search either.  Is that the *exact*
spelling of the message?  Are you sure it's not coming from your
client-side code?

            regards, tom lane



Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

From
Erik Wienhold
Date:
On 07/09/2023 21:09 CEST Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Sai Teja <saitejasaichintalapudi@gmail.com> writes:
> > We have one table which is storing XML data with 30k records and stores
> > huge amount of data.
> > We are trying to create the index for this column in the table. But, we’re
> > getting “Huge input Lookup error” during creation of Index.
>
> There is no such string anywhere in the Postgres source code;
> furthermore, if someone tried to add such an error, it'd get rejected
> (I hope) as not conforming to our style guidelines.  I thought maybe
> it's coming from libxml or the xpath code, but I couldn't get a match
> for it anywhere in Debian Code Search either.  Is that the *exact*
> spelling of the message?  Are you sure it's not coming from your
> client-side code?

Looks like "Huge input lookup" as reported in [1] (also from Sai) and that
error is from libxml.

[1] https://www.postgresql.org/message-id/CADBXDMV_D3pBaSA5ZfUWnQP42ZO1YsbP9GZv8t1FncdGTiLMug%40mail.gmail.com

--
Erik



Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

From
"David G. Johnston"
Date:
On Thu, Sep 7, 2023 at 12:28 PM Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
> Here we’re using Xpath expression to create the index since postgreSQL directly does not support comparison methods. So, we decided to use Xpath expression. But while creating the index as I mentioned above we’re facing the issue with Huge Input lookup

It doesn't support comparison methods for values like this because doing so tends to be pointless or very expensive, setting aside the fact that, as you are probably seeing here, entries in an index must be small enough to fit on a physical page, and large bodies of text typically are not.

If you truly want to perform equality checks on large bodies of text, the typical solution is to hash said text and then compare the hashes.
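The hash-and-compare idea above can be sketched as follows (a minimal Python illustration; the sample values and sizes are made up):

```python
import hashlib

def text_digest(s: str) -> str:
    # A fixed-size (64 hex chars) fingerprint of an arbitrarily large text.
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

big_a = "<directory>" + "x" * 1_000_000 + "</directory>"
big_b = "<directory>" + "x" * 1_000_000 + "</directory>"
big_c = "<directory>" + "y" * 1_000_000 + "</directory>"

# Equality screening compares small digests instead of megabytes of text;
# an index over the digest easily fits within page-size limits.
assert text_digest(big_a) == text_digest(big_b)
assert text_digest(big_a) != text_digest(big_c)
```

In PostgreSQL the equivalent shape would be an expression index on a hash of the value (e.g. the built-in md5() of the text form), with queries comparing hashes rather than the full documents.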

Since you are producing an array, you might be able to get something like a GIN or GiST index to work... I'm not all that familiar with them, but they were designed for non-atomic data values.

David J.

Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

From
Tom Lane
Date:
Erik Wienhold <ewie@ewie.name> writes:
> On 07/09/2023 21:09 CEST Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There is no such string anywhere in the Postgres source code;
>> furthermore, if someone tried to add such an error, it'd get rejected
>> (I hope) as not conforming to our style guidelines.  I thought maybe
>> it's coming from libxml or the xpath code, but I couldn't get a match
>> for it anywhere in Debian Code Search either.  Is that the *exact*
>> spelling of the message?

> Looks like "Huge input lookup" as reported in [1] (also from Sai) and that
> error is from libxml.

Ah, thanks for the pointer.  It looks like for the DOCUMENT case,
we could maybe relax this restriction by passing the XML_PARSE_HUGE
option to xmlCtxtReadDoc().  However, there are things to worry about:

* Some of the other libxml functions we use don't seem to have an
options argument, so it's not clear how to remove the limit in all
code paths.

* One of the first hits I got while googling for XML_PARSE_HUGE was
CVE-2022-40303 [1] (libxml2: integer overflows with XML_PARSE_HUGE).
It seems highly likely that not everybody's libxml is patched for
that yet, meaning we'd be opening a lot of systems to security issues.

* XML_PARSE_HUGE apparently also removes restrictions on nesting
depth of XML documents.  I wonder whether that creates a risk of
stack-overflow crashes.

On the whole, I'm not sure I want to mess with this.  libxml2 is
rickety enough already without taking off its training wheels.
And, as noted by David J., we'd very possibly only be moving
the bottleneck somewhere else.  "Put many megabytes of data into
one field" is an antipattern for successful SQL use, and probably
always will be.

            regards, tom lane

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2136266



Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

From
Sai Teja
Date:
Thank you so much for all your responses.

I just tried with hash, GIN, etc.

But it didn't work, and I think that is because of the xpath expression used in the CREATE INDEX command.

Is there any alternative to this XPath approach? I need to parse the XML, as there is no other option, so I'm looking for other ways to create the index.

Maybe there are parameters to change, like xmloption, that would help us resolve the issue.

Thanks,
Sai



On Fri, 8 Sep, 2023, 1:51 am Tom Lane, <tgl@sss.pgh.pa.us> wrote:
> On the whole, I'm not sure I want to mess with this.  libxml2 is
> rickety enough already without taking off its training wheels.
> And, as noted by David J., we'd very possibly only be moving
> the bottleneck somewhere else.

Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

From
Dominique Devienne
Date:
On Thu, Sep 7, 2023 at 10:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Erik Wienhold <ewie@ewie.name> writes:
> Looks like "Huge input lookup" as reported in [1] (also from Sai) and that
> error is from libxml.

Ah, thanks for the pointer.  It looks like for the DOCUMENT case,
we could maybe relax this restriction by passing the XML_PARSE_HUGE
option to xmlCtxtReadDoc().  However, there are things to worry about:

Just a remark from the sidelines, from someone who has done a fair bit of XML in years past.

That XPath is simple, and a streaming parser (SAX or StAX) could handle it, while the XML_PARSE_HUGE option probably applies to a DOM parser. So is there a work-around to somehow force using a streaming parser, instead of one that must produce the whole document, just so a few elements can be picked out of it? FWIW. --DD
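The streaming extraction described above can be sketched with Python's stdlib pull parser; the element names follow the XPath from the original post, and this is an illustration of the technique, not what PostgreSQL does internally:

```python
import io
import xml.etree.ElementTree as ET

def extract_identifiers(xml_bytes: bytes) -> list[str]:
    """Pull directory/access/mode/@Identifier without building a full DOM."""
    out = []
    # iterparse streams events as the document is read; elem.clear()
    # releases each element after use, keeping memory bounded.
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "mode" and "Identifier" in elem.attrib:
            out.append(elem.attrib["Identifier"])
        elem.clear()
    return out

doc = b"""<directory>
  <access><mode Identifier="A1"/></access>
  <access><mode Identifier="B2"/></access>
</directory>"""
print(extract_identifiers(doc))  # -> ['A1', 'B2']
```

Note the sketch matches on the local tag name only, not the full path, which is enough for this document shape.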

Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

From
Dominique Devienne
Date:
On Fri, Sep 8, 2023 at 11:39 AM Dominique Devienne <ddevienne@gmail.com> wrote:
> That XPath is simple, and a streaming parser (SAX or StAX) could handle it. So is there a work-around
> to somehow force using a streaming parser instead of one that must produce the whole Document,
> just so a few elements are picked out of it? FWIW. --DD

If push comes to shove, the streaming-based extraction can be done outside the DB, stored in a new column or table, and that can be indexed instead. This is in fact exactly the approach I took on one XML-handling server I wrote.

To be honest, in my case the XMLs were never large, so I used RapidXML, which is also a DOM parser; but the same principle applies, i.e. extract the data from the XML outside the DB using SAX (push) / StAX (pull), to avoid having a (too) large document in memory at any time (client- or server-side). --DD

Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

From
Rob Sargent
Date:
On 9/7/23 23:51, Sai Teja wrote:
> Thank you so much for all your responses.
>
> I just tried with Hash, GIN etc
>
> But it didn't worked. And I think it is because of "Xpath" expression 
> which I used in the index create command.
>
> But is there any alternative way to change this Xpath? Since I need to 
> parse the XML as there is no other option. I need the other ways to 
> create the index .
>
> May be if there are any parameters to change like xmloption etc it 
> would help us to resolve the issue.
>
> Thanks,
> Sai
>
>

What is a typical value returned by your xpath expression? (xpath('directory/access/mode/@Identifier', content))