Thread: BUG #17277: write past chunk when calling normalize() on an empty string
BUG #17277: write past chunk when calling normalize() on an empty string
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 17277
Logged by: Matthijs van der Vleuten
Email address: postgresql@zr40.nl
PostgreSQL version: 14.0
Operating system: Debian sid
Description:

When calling normalize(''), that is, on an empty string, a warning is
raised: "problem in alloc set ExprContext: detected write past chunk end".

I believe this is due to an error in unicode_norm.c. In unicode_normalize(),
when recompose is true (that is, when using NFC or NFKC normalization) the
loop on line 498 will iterate once before checking count < decomp_size. When
the input is an empty string, this would cause a write outside of the memory
allocated for recomp_chars.

Reproduction:

zr40@[local]:5432 ~=# select version();
 version
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 14.0 (Debian 14.0-1.pgdg+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.3.0-11) 10.3.0, 64-bit
(1 row)

zr40@[local]:5432 ~=# select normalize('');
WARNING: problem in alloc set ExprContext: detected write past chunk end in block 0x55793d119620, chunk 0x55793d1196a8
WARNING: problem in alloc set ExprContext: detected write past chunk end in block 0x55793d119620, chunk 0x55793d1196a8
 normalize
───────────

(1 row)
Re: BUG #17277: write past chunk when calling normalize() on an empty string
From
Michael Paquier
Date:
On Tue, Nov 09, 2021 at 09:55:08PM +0000, PG Bug reporting form wrote:
> When calling normalize(''), that is, on an empty string, a warning is
> raised: "problem in alloc set ExprContext: detected write past chunk end".

Well, direct callers of unicode_normalize_kc() in ~12 would have the
same problem because this code was not written with this case in mind
as far as I recall, after looking at the git history (60f11b8) as
pg_saslprep() does not allow the case of empty passwords.

> I believe this is due to an error in unicode_norm.c. In unicode_normalize(),
> when recompose is true (that is, when using NFC or NFKC normalization) the
> loop on line 498 will iterate once before checking count < decomp_size. When
> the input is an empty string, this would cause a write outside of the memory
> allocated for recomp_chars.

No, the code does not take the recomposition loop in this case, but
the initialization of target_pos to 1 would cause recomp_chars to be
written past its allocation position by one byte.

As there could be callers of unicode_normalize[_kc]() outside core,
I'd rather fix that at the source and patch unicode_norm.c. One way
to do that would be to leave once you know that there is nothing to
decompose after the loop over decompose_code() and return decomp_chars
that would be set with an empty set of points, as per the attached.
There may be a point in issuing an error if there is an empty string,
though.

Another thing would be to consider if is_normalized() should return
false for an empty string, but we have considered empty strings as
normalized since this has been released:

=# SELECT '' IS NFD NORMALIZED;
 is_normalized
---------------
 t
(1 row)

That feels more natural this way. Still, I can see some perl modules
that would return false for such a case, by the way. The
normalization docs don't seem to mention that directly, except for the
stream-safe text format:
https://www.unicode.org/faq/normalization.html
https://unicode.org/reports/tr15/tr15-51.html
--
Michael
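
For readers following along, the off-by-one described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the PostgreSQL source: `codepoint`, `slots_touched_unguarded()` and `copy_with_guard()` are hypothetical names, recomposition itself is not modelled (code points are copied as-is), and the early return merely mirrors the idea of leaving once there is nothing to decompose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for pg_wchar; an assumption made for this sketch. */
typedef uint32_t codepoint;

/*
 * Hypothetical helper: number of array slots the unguarded tail would
 * touch for decomp_size code points.  It starts with target_pos = 1
 * (slot 0 is filled unconditionally) and ends by writing a terminator
 * at index target_pos.  The buffer holds decomp_size + 1 slots.
 */
size_t
slots_touched_unguarded(size_t decomp_size)
{
    size_t target_pos = 1;

    /* Not entered for decomp_size 0 or 1; no recomposition is modelled. */
    for (size_t count = 1; count < decomp_size; count++)
        target_pos++;

    return target_pos + 1;      /* terminator lands at index target_pos */
}

/*
 * Hypothetical guarded variant: returns a malloc'd, zero-terminated copy
 * of the input, or NULL on allocation failure.  The caller frees it.
 */
codepoint *
copy_with_guard(const codepoint *decomp, size_t decomp_size)
{
    codepoint *recomp = malloc((decomp_size + 1) * sizeof(codepoint));
    size_t     target_pos;

    if (recomp == NULL)
        return NULL;

    /* Leave early if there is nothing to process: only slot 0 exists. */
    if (decomp_size == 0)
    {
        recomp[0] = 0;
        return recomp;
    }

    target_pos = 1;
    recomp[0] = decomp[0];
    for (size_t count = 1; count < decomp_size; count++)
        recomp[target_pos++] = decomp[count];

    assert(target_pos <= decomp_size);  /* terminator stays in bounds */
    recomp[target_pos] = 0;
    return recomp;
}
```

With decomp_size = 0 the buffer holds one slot, yet the unguarded path touches two (indexes 0 and 1), which is the write the allocator check complains about. For any non-empty input the slots touched equal decomp_size + 1, so only the empty case needs the guard.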
Attachments
Re: BUG #17277: write past chunk when calling normalize() on an empty string
From
Michael Paquier
Date:
On Wed, Nov 10, 2021 at 03:33:29PM +0900, Michael Paquier wrote:
> That feels more natural this way. Still, I can see some perl modules
> that would return false for such a case, by the way. The
> normalization docs don't seem to mention that directly, except for the
> stream-safe text format:
> https://www.unicode.org/faq/normalization.html
> https://unicode.org/reports/tr15/tr15-51.html

I have expanded the tests, and fixed this one as of 098c1345. Thanks
for the report, Matthijs!
--
Michael