Обсуждение: Error in calculating length of encoded base64 string
Greetings, everyone! While working on an extension I've found an error in how length of encoded base64 string is calulated; This error is present in 3 files across all supported versions: /src/common/base64.c, function pg_b64_enc_len; /src/backend/utils/adt/encode.c, function pg_base64_enc_len; /contrib/pgcrypto/pgp-armor.c, function pg_base64_enc_len (copied from encode.c). In all three cases the length is calculated as follows: (srclen + 2) * 4 / 3; (plus linefeed in latter two cases) There's also a comment /* 3 bytes will be converted to 4 */ This formula is wrong. Let's calculate encoded length for different starting lengths: starting length 2: (2 + 2) * 4 / 3 = 5, starting length 3: (3 + 2) * 4 / 3 = 6, starting length 4: (4 + 2) * 4 / 3 = 8, starting length 6: (6 + 2) * 4 / 3 = 10, starting length 10: (10 + 2) * 4 / 3 = 16, when it should be 4, 4, 8, 8, 16. So the suggestion is to change the formula to a right one: (srclen + 2) / 3 * 4; The patch is attached. Oleg Tselebrovskiy, Postgres Pro
Вложения
o.tselebrovskiy@postgrespro.ru writes: > While working on an extension I've found an error in how length of > encoded base64 string is calulated; Yeah, I think you're right. It's not of huge significance, because it just overestimates by 1 or 2 bytes, but we might as well get it right. Thanks for the report and patch! regards, tom lane
On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > o.tselebrovskiy@postgrespro.ru writes: > > While working on an extension I've found an error in how length of > > encoded base64 string is calulated; > > Yeah, I think you're right. It's not of huge significance, because > it just overestimates by 1 or 2 bytes, but we might as well get > it right. Thanks for the report and patch! From your commit d98ed080bb > This bug is very ancient, dating to commit 79d78bb26 which > added encode.c. (The other instances were presumably copied > from there.) Still, it doesn't quite seem worth back-patching. Is it worth investing time in trying to unify these 3 occurrences of base64 length (and possibly other relevant) code to one place? If yes, I can volunteer for it. The common code facility under src/common/ did not exist back when pgcrypto was added, but since it does now, it may be worth it make others depend on implementation in src/common/ code. Best regards, Gurjeet http://Gurje.et
Gurjeet Singh <gurjeet@singh.im> writes: > On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> This bug is very ancient, dating to commit 79d78bb26 which >> added encode.c. (The other instances were presumably copied >> from there.) Still, it doesn't quite seem worth back-patching. > Is it worth investing time in trying to unify these 3 occurrences of > base64 length (and possibly other relevant) code to one place? If yes, > I can volunteer for it. I wondered about that too. It seems really silly that we made a copy in src/common and did not replace the others with calls to that. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Gurjeet Singh <gurjeet@singh.im> writes: >> On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> This bug is very ancient, dating to commit 79d78bb26 which >>> added encode.c. (The other instances were presumably copied >>> from there.) Still, it doesn't quite seem worth back-patching. > >> Is it worth investing time in trying to unify these 3 occurrences of >> base64 length (and possibly other relevant) code to one place? If yes, >> I can volunteer for it. > > I wondered about that too. It seems really silly that we made > a copy in src/common and did not replace the others with calls > to that. Also, while we're at it, how about some unit tests that both encode and calculate the encoded length of strings of various lengths and check that they match? - ilmari
On 2023-Jun-09, Tom Lane wrote: > Gurjeet Singh <gurjeet@singh.im> writes: > > On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> This bug is very ancient, dating to commit 79d78bb26 which > >> added encode.c. (The other instances were presumably copied > >> from there.) Still, it doesn't quite seem worth back-patching. > > > Is it worth investing time in trying to unify these 3 occurrences of > > base64 length (and possibly other relevant) code to one place? If yes, > > I can volunteer for it. > > I wondered about that too. It seems really silly that we made > a copy in src/common and did not replace the others with calls > to that. I looked into this. It turns out that there is a difference in newline handling in the other routines compared to what was added for SCRAM, which doesn't have any (and complains if you supply them). Peter E did suggest to unify them at the time: https://www.postgresql.org/message-id/947b9aff-8fdb-dbf5-a99c-0ffd4523a73f%402ndquadrant.com We could add a boolean "whitespace" flag to both of src/common/base64.c's pg_b64_encode() and pg_b64_decode(); with that I think it could serve the three places that need it. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/