Re: Cleaning up array_in()

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Cleaning up array_in()
Дата
Msg-id 5859ce4e-2be4-92b0-c85c-e1e24eab57c6@iki.fi
обсуждение исходный текст
Ответ на Re: Cleaning up array_in()  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Cleaning up array_in()  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Cleaning up array_in()  (jian he <jian.universality@gmail.com>)
Список pgsql-hackers
On 08/07/2023 19:08, Tom Lane wrote:
> I wrote:
>> So I end up with the attached.  I went ahead and dropped
>> ArrayGetOffset0() as part of 0001, and I split 0002 into two patches
>> where the new 0002 avoids re-indenting any existing code in order
>> to ease review, and then 0003 is just a mechanical application
>> of pgindent.
> 
> That got sideswiped by ae6d06f09, so here's a trivial rebase to
> pacify the cfbot.
> 
> #text/x-diff; name="v3-0001-Simplify-and-speed-up-ReadArrayStr.patch"
[v3-0001-Simplify-and-speed-up-ReadArrayStr.patch]/home/tgl/pgsql/v3-0001-Simplify-and-speed-up-ReadArrayStr.patch
 
> #text/x-diff; name="v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch"
[v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch]
/home/tgl/pgsql/v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch
> #text/x-diff; name="v3-0003-Re-indent-ArrayCount.patch" [v3-0003-Re-indent-ArrayCount.patch]
/home/tgl/pgsql/v3-0003-Re-indent-ArrayCount.patch

Something's wrong with your attachments.

Hmm, I wonder if ae6d06f09 had a negative performance impact. In an 
unquoted array element, scanner_isspace() function is called for every 
character, so it might be worth inlining.

On the patches: They are a clear improvement, thanks for that. That 
said, I still find the logic very hard to follow, and there are some 
obvious performance optimizations that could be made.

ArrayCount() interprets low-level quoting and escaping, and tracks the 
dimensions at the same time. The state machine is pretty complicated. 
And when you've finally finished reading and grokking that function, you 
see that ReadArrayStr() repeats most of the same logic. Ugh.

I spent some time today refactoring it for readability and speed. I 
introduced a separate helper function to tokenize the input. It deals 
with whitespace, escapes, and backslashes. Then I merged ArrayCount() 
and ReadArrayStr() into one function that parses the elements and 
determines the dimensions in one pass. That speeds up parsing large 
arrays. With the tokenizer function, the logic in ReadArrayStr() is 
still quite readable, even though it's now checking the dimensions at 
the same time.

I also noticed that we used atoi() to parse the integers in the 
dimensions, which doesn't do much error checking. Some funny cases were 
accepted because of that, for example:

postgres=# select '[1+-+-+-+-+-+:2]={foo,bar}'::text[];
    text
-----------
  {foo,bar}
(1 row)

I tightened that up in the passing.

Attached are your patches, rebased to fix the conflicts with ae6d06f09 
like you intended. On top of that, my patches. My patches need more 
testing, benchmarking, and review, so if we want to sneak something into 
v16, better go with just your patches. If we're tightening up the 
accepted inputs, maybe fix that atoi() sloppiness, though.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Cleaning up array_in()
Следующее
От: Gurjeet Singh
Дата:
Сообщение: Re: DecodeInterval fixes