Thread: Broken links in mailinglist archive due to percent-encoding

Broken links in mailinglist archive due to percent-encoding

From
Erik Wienhold
Date:
I've just sent a mail [1] to pgsql-general and the mailing list archive shows
a broken link [2].  I included the correct link [3] in my message and also
received my message with the correct link from the list.

It looks like the archive percent-encodes subcomponent delimiters in the query
component.  Perhaps the encoding is allowed and it's just git.postgresql.org
that can't handle it.  But I'm pretty sure that links to git.postgresql.org
from the archive worked in the past.

[1] https://www.postgresql.org/message-id/1550267563.330669.1693335893138%40office.mailbox.org
[2]
https://git.postgresql.org/gitweb/?p=postgresql.git%3Ba%3Dblob%3Bf%3Dsrc%2Fbin%2Fpsql%2Fdescribe.c%3Bh%3Dbac94a338cfbc497200f0cf960cbabce2dadaa33%3Bhb%3D9b581c53418666205938311ef86047aa3c6b741f#l1149
[3]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/psql/describe.c;h=bac94a338cfbc497200f0cf960cbabce2dadaa33;hb=9b581c53418666205938311ef86047aa3c6b741f#l1420

--
Erik



Re: Broken links in mailinglist archive due to percent-encoding

From
Erik Wienhold
Date:
On 29/08/2023 21:38 CEST Erik Wienhold <ewie@ewie.name> wrote:

> It looks like the archive percent-encodes subcomponent delimiters in the query
> component.  Perhaps the encoding is allowed and it's just git.postgresql.org
> that can't handle it.  But I'm pretty sure that links to git.postgresql.org
> from the archive worked in the past.

I've been digging around a bit more because this is an odd bug.

Turns out it's the result of applying Django's urlize filter to the message
body [1]:

    >>> from django.template.defaultfilters import urlize
    >>> urlize('http://example.net/foo?bar=baz;abc=123')
    '<a href="http://example.net/foo?bar=baz%3Babc%3D123" rel="nofollow">http://example.net/foo?bar=baz;abc=123</a>'

Looks like a bug in Django because it does not percent-encode any sub-delimiters
outside the query component:

    >>> urlize('http://example.net/foo;bar=baz')
    '<a href="http://example.net/foo;bar=baz" rel="nofollow">http://example.net/foo;bar=baz</a>'
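The difference between the two results above can be reproduced with the
standard library alone.  This is only a sketch of one plausible mechanism, not
confirmed against Django's source: it assumes the query component is parsed
into key/value pairs and then serialized again, while the path is left alone.
The helper name reencode_query is made up for illustration, and the separator
keyword needs a Python release that has it (3.10, or a patched older release).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def reencode_query(url, separator="&"):
    """Parse the query into pairs and serialize it again (hypothetical helper)."""
    parts = urlsplit(url)
    # With "&" as the only separator, "bar=baz;abc=123" is a single pair
    # whose value is "baz;abc=123".
    pairs = parse_qsl(parts.query, keep_blank_values=True, separator=separator)
    # urlencode() percent-encodes ";" and "=" inside keys and values.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Semicolon inside the query: treated as data and percent-encoded.
print(reencode_query("http://example.net/foo?bar=baz;abc=123"))
# http://example.net/foo?bar=baz%3Babc%3D123

# Semicolon in the path: there is no query to re-encode, so nothing changes.
print(reencode_query("http://example.net/foo;bar=baz"))
# http://example.net/foo;bar=baz

# If ";" is treated as a pair separator, the link survives (with "&" instead).
print(reencode_query("http://example.net/foo?bar=baz;abc=123", separator=";"))
# http://example.net/foo?bar=baz&abc=123
```

If something like this is what happens, a change in how semicolons are treated
when splitting query pairs would also explain why such links used to work.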

And regarding git.postgresql.org: gitweb generates URLs with semicolon as the
separator of query pairs [2] instead of ampersand, although semicolon is no
longer recommended by W3C.  But gitweb also accepts query components that use
ampersand instead of semicolon.  That is why links [1] and [3] below work: I
manually replaced all semicolons with ampersands in them.
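The manual replacement can be scripted.  A minimal sketch, assuming only the
query component should be touched and the path and fragment kept as-is; the
function name semicolons_to_ampersands is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def semicolons_to_ampersands(url):
    """Swap ";" for "&" in the query component only (hypothetical helper)."""
    parts = urlsplit(url)
    # Path and fragment are passed through unchanged.
    return urlunsplit(parts._replace(query=parts.query.replace(";", "&")))


original = (
    "https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;"
    "f=src/bin/psql/describe.c;h=bac94a338cfbc497200f0cf960cbabce2dadaa33;"
    "hb=9b581c53418666205938311ef86047aa3c6b741f#l1420"
)
print(semicolons_to_ampersands(original))
```

For the original link from my first mail this prints the same URL as [3] below.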

[1]
https://git.postgresql.org/gitweb/?p=pgarchives.git&a=blob&f=django/archives/mailarchives/templates/_message.html&h=c90a80afea418fc4800ae81bb517978fa56f7a4d&hb=HEAD#l64
[2] https://git.kernel.org/pub/scm/git/git.git/tree/gitweb/gitweb.perl#n1505
[3]
https://git.postgresql.org/gitweb/?p=postgresql.git&a=blob&f=src/bin/psql/describe.c&h=bac94a338cfbc497200f0cf960cbabce2dadaa33&hb=9b581c53418666205938311ef86047aa3c6b741f#l1420

--
Erik