Discussion: cannot download mbox with python

cannot download mbox with python

From
Pierre Forstmann
Date:
Hello,

I'm trying to download https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312
with the following code, using my postgresql.org account:

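    import requests

    url = 'https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312'
    print(url + '... ')
    # 'xxx' and 'yyy' stand for the postgresql.org username and password
    response = requests.get(url, auth=('xxx','yyy'))
    print('status: ' + str(response.status_code))
    print('... done')
    print(response.text)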


But I get:

https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312...
status: 200
... done
<!doctype html>
<html lang="en">
 <head>
  <title>PostgreSQL: </title>
  <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
  <meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8" />
   
  <meta name="theme-color" content="#336791"/>
  <meta name="copyright" content="The PostgreSQL Global Development Group" />
  <link href="/media/css/fontawesome.css?97a426bd" rel="stylesheet">
  <link rel="stylesheet" href="/media/css/bootstrap-4.4.1.min.css">
  <link rel="shortcut icon" href="/favicon.ico" />
 
  <link rel="stylesheet" type="text/css" href="/dyncss/base.css?97a426bd">

  <script src="/media/js/theme.js?97a426bd"></script>

 
  </head>
  <body>
    <div class="container-fluid">
      <div class="row justify-content-md-center">
        <div class="col">
          <!-- Header -->
          <nav class="navbar navbar-expand-lg navbar-light bg-light">
            <a class="navbar-brand p-0" href="/">
              <img class="logo" src="/media/img/about/press/elephant.png" alt="PostgreSQL Elephant Logo">
            </a>
            <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#pgNavbar" aria-controls="pgNavbar" aria-expanded="false" aria-label="Toggle navigation">
              <span class="navbar-toggler-icon"></span>
            </button>
            <div class="collapse navbar-collapse" id="pgNavbar">
              <ul class="navbar-nav mr-auto">
                <li class="nav-item p-2"><a href="/" title="Home">Home</a></li>
                <li class="nav-item p-2"><a href="/about/" title="About">About</a></li>
                <li class="nav-item p-2"><a href="/download/" title="Download">Download</a></li>
                <li class="nav-item p-2"><a href="/docs/" title="Documentation">Documentation</a></li>
                <li class="nav-item p-2"><a href="/community/" title="Community">Community</a></li>
                <li class="nav-item p-2"><a href="/developer/" title="Developers">Developers</a></li>
                <li class="nav-item p-2"><a href="/support/" title="Support">Support</a></li>
                <li class="nav-item p-2"><a href="/about/donate/" title="Donate">Donate</a></li>
                <li class="nav-item p-2"><a href="/account/" title="Your account">Your account</a></li>
              </ul>
              <form role="search" method="get" action="/search/">
                <div class="input-group">
                  <input id="q" name="q" type="text" size="20" maxlength="255" accesskey="s"  class="form-control" placeholder="Search for...">
                  <span class="input-group-btn">
                    <button class="btn btn-default" type="submit"><i class="fas fa-search"></i></button>
                  </span>
                </div><!-- /input-group -->
              </form>
              <form id="form-theme" class="form-inline d-none">
                <button id="btn-theme" class="btn btn-default ml-1" type="button"></button>
              </form>
            </div>
          </nav>
        </div>
      </div>
      <div class="row justify-content-center pg-shout-box">
        <div class="col text-white text-center">9th November 2023: <a href="/about/news/postgresql-161-155-1410-1313-1217-and-1122-released-2749/">
  PostgreSQL 16.1, 15.5, 14.10, 13.13, 12.17, and 11.22 Released!
</a>

</div>
      </div>
    </div>
   
<div class="container-fluid margin">
  <div class="row">
    <div class="col-lg-2">
      <div id="pgSideWrap">
       
      </div> <!-- pgSideWrap -->
    </div>
    <div class="col-lg-10">
      <div id="pgContentWrap">
       
<h1>Sign in <i class="fas fa-sign-in-alt"></i></h1>
<p>

The website you are trying to log in to (List archives) is using the
postgresql.org community login system. In this system you create a
central account that is used to log into most postgresql.org services.
Once you are logged into this account, you will automatically be
logged in to the associated postgresql.org services.

</p>
<p>
If you do not already have an account,
you can either <a href="/account/signup/">create</a>
a dedicated account, or use one of the third party sign-in systems below.
</p>

<h2>Community account sign-in</h2>
<p>
If you have a postgresql.org community account with a password, please
use the form below to sign in. If you have one but have lost your
password, you can use the <a href="/account/reset/">password reset</a> form.
</p>


<form action="." method="post" id="login-form"><input type="hidden" name="csrfmiddlewaretoken" value="ZbN9nbSTSpSGxNueOJvp06wbuCTIVUeJuQfvb2e5VdjrhTZg6TZa1O7Atsd1a3vr">
  <div class="form-group">
    <input type="text" class="form-control" name="username" id="id_username" placeholder="Username or email address" autofocus />
  </div>
  <div class="form-group">
    <input type="password" class="form-control" name="password" id="id_password" placeholder="Password"/>
    <input type="hidden" name="this_is_the_login_form" value="1" />
    <input type="hidden" name="next" value="/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw==" />
  </div>
  <div class="submit-row">
    <input class="btn btn-primary" type="submit" value="Community Sign-In">
  </div>
</form>


<h2>Third party sign in</h2>

<p><a href="/account/login/facebook/?next=/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw=="><img src="/media/img/misc/btn_login_facebook.png" alt="Sign in with Facebook" /></a></p>

<p><a href="/account/login/github/?next=/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw=="><img src="/media/img/misc/btn_login_github.png" alt="Sign in with Github" /></a></p>

<p><a href="/account/login/google/?next=/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw=="><img src="/media/img/misc/btn_login_google.png" alt="Sign in with Google" /></a></p>

<p><a href="/account/login/microsoft/?next=/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw=="><img src="/media/img/misc/btn_login_microsoft.png" alt="Sign in with Microsoft" /></a></p>




      </div> <!-- pgContentWrap -->
    </div>
  </div>
</div>

    <!-- Footer -->
    <footer id="footer">
      <div class="container">
        <div class="row">
          <div class="col-md-12">
            <ul>
              <li><a href="https://twitter.com/postgresql"><img src="/media/img/atpostgresql.png" alt="@postgresql"></a></li>
              <li><a href="https://git.postgresql.org/gitweb/?p=postgresql.git"><img src="/media/img/git.png" alt="Git"></a></li>
            </ul>
          </div>
        </div>
      </div>
      <!-- Copyright -->
      <div class="container">
        <a href="/about/policies/">Policies</a> |
        <a href="/about/policies/coc/">Code of Conduct</a> |
        <a href="/about/">About PostgreSQL</a> |
        <a href="/about/contact/">Contact</a><br/>
        <p>Copyright &copy; 1996-2023 The PostgreSQL Global Development Group</p>
      </div>
    </footer>
    <script src="/media/js/jquery-3.4.1.slim.min.js"></script>
    <script src="/media/js/popper-1.16.0.min.js"></script>
    <script src="/media/js/bootstrap-4.4.1.min.js"></script>
    <script src="/media/js/main.js?97a426bd"></script>

  </body>
</html>




I don't understand what is wrong here: I get status 200, but the HTML response says that I must sign in with the community account, which is exactly the account I'm using.
 

Thanks
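Why a 200? By default requests.get follows redirects, and status_code describes only the last response. If the server answers the unauthenticated request with a redirect to the sign-in page (as described later in this thread), the 200 belongs to that page, and the earlier hops are kept in response.history. A small diagnostic sketch, assuming the real download is a standard mbox file (each message starts with a "From " separator line); looks_like_mbox and describe are hypothetical helper names:

```python
import requests

MBOX_URL = "https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312"


def looks_like_mbox(body: bytes) -> bool:
    """Hypothetical check: an mbox file starts with a 'From ' separator line."""
    return body.startswith(b"From ")


def describe(response: requests.Response) -> str:
    """Summarize the redirects requests followed and whether the final body is an mbox."""
    hops = [(r.status_code, r.url) for r in response.history]  # followed redirects
    kind = "mbox" if looks_like_mbox(response.content) else "not an mbox (e.g. an HTML page)"
    return f"{response.status_code} {response.url} after redirects {hops}: {kind}"
```

Usage: print(describe(requests.get(MBOX_URL, auth=("xxx", "yyy")))). A non-empty redirect list ending on an account/login URL would confirm that the credentials were never used to sign in.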

Re: cannot download mbox with python

From
Daniel Gustafsson
Date:
> On 7 Dec 2023, at 16:34, Pierre Forstmann <pierre.forstmann@gmail.com> wrote:
>
> Hello,
>
> I'm trying to download
> https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312
> with following code using my postgresql.org account:
>
>     print(url + '... ')
>     response = requests.get(url, auth=('xxx','yyy'))

I'm not very well versed in Python, but isn't this for doing plain HTTP auth?
The postgresql.org account does not support HTTP auth; you need to log in and
create a session.

--
Daniel Gustafsson
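Daniel's point can be checked without touching the network: in requests, auth=(user, password) is shorthand for HTTP Basic authentication, which only adds an Authorization header to the request. It does not submit the site's sign-in form or obtain a session cookie. A sketch that prepares, but does not send, the same request, using the placeholder credentials from the thread:

```python
import requests

URL = "https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312"

# Prepare (but do not send) the same request as requests.get(URL, auth=...).
prepared = requests.Request("GET", URL, auth=("xxx", "yyy")).prepare()

# The (user, password) tuple becomes an HTTP Basic header: base64 of "xxx:yyy".
print(prepared.headers["Authorization"])  # Basic eHh4Onl5eQ==
# No Cookie header: nothing identifies a logged-in website session.
print(prepared.headers.get("Cookie"))  # None
```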




Re: cannot download mbox with python

From
Pierre Forstmann
Date:
I've tried this:

import requests

url = 'https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312'
#response = requests.get(url, auth=('xxx','yyy'))
session = requests.Session()
session.auth = ('xxx','yyy')  # credentials attached to every request of this session
response = session.get(url)
print('status: ' + str(response.status_code))
print('... done')
print(response.content)

But I get the same behaviour:

status: 200
... done
b'<!doctype html>\n<html lang="en">\n <head>\n  <title>PostgreSQL: </title>\n  <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">\n  <meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8" />\n   \n  <meta name="theme-color" content="#336791"/>\n  <meta name="copyright" content="The PostgreSQL Global Development Group" />\n  <link href="/media/css/fontawesome.css?97a426bd" rel="stylesheet">\n  <link rel="stylesheet" href="/media/css/bootstrap-4.4.1.min.css">\n  <link rel="shortcut icon" href="/favicon.ico" />\n  \n  <link rel="stylesheet" type="text/css" href="/dyncss/base.css?97a426bd">\n\n  <script src="/media/js/theme.js?97a426bd"></script>\n\n  \n  </head>\n  <body>\n    <div class="container-fluid">\n      <div class="row justify-content-md-center">\n        <div class="col">\n          <!-- Header -->\n          <nav class="navbar navbar-expand-lg navbar-light bg-light">\n            <a class="navbar-brand p-0" href="/">\n              <img class="logo" src="/media/img/about/press/elephant.png" alt="PostgreSQL Elephant Logo">\n            </a>\n            <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#pgNavbar" aria-controls="pgNavbar" aria-expanded="false" aria-label="Toggle navigation">\n              <span class="navbar-toggler-icon"></span>\n            </button>\n            <div class="collapse navbar-collapse" id="pgNavbar">\n              <ul class="navbar-nav mr-auto">\n                <li class="nav-item p-2"><a href="/" title="Home">Home</a></li>\n                <li class="nav-item p-2"><a href="/about/" title="About">About</a></li>\n                <li class="nav-item p-2"><a href="/download/" title="Download">Download</a></li>\n                <li class="nav-item p-2"><a href="/docs/" title="Documentation">Documentation</a></li>\n                <li class="nav-item p-2"><a href="/community/" title="Community">Community</a></li>\n                <li 
class="nav-item p-2"><a href="/developer/" title="Developers">Developers</a></li>\n                <li class="nav-item p-2"><a href="/support/" title="Support">Support</a></li>\n                <li class="nav-item p-2"><a href="/about/donate/" title="Donate">Donate</a></li>\n                <li class="nav-item p-2"><a href="/account/" title="Your account">Your account</a></li>\n              </ul>\n              <form role="search" method="get" action="/search/">\n                <div class="input-group">\n                  <input id="q" name="q" type="text" size="20" maxlength="255" accesskey="s"  class="form-control" placeholder="Search for...">\n                  <span class="input-group-btn">\n                    <button class="btn btn-default" type="submit"><i class="fas fa-search"></i></button>\n                  </span>\n                </div><!-- /input-group -->\n              </form>\n              <form id="form-theme" class="form-inline d-none">\n                <button id="btn-theme" class="btn btn-default ml-1" type="button"></button>\n              </form>\n            </div>\n          </nav>\n        </div>\n      </div>\n      <div class="row justify-content-center pg-shout-box">\n        <div class="col text-white text-center">9th November 2023: <a href="/about/news/postgresql-161-155-1410-1313-1217-and-1122-released-2749/">\n  PostgreSQL 16.1, 15.5, 14.10, 13.13, 12.17, and 11.22 Released!\n</a>\n\n</div>\n      </div>\n    </div>\n    \n<div class="container-fluid margin">\n  <div class="row">\n    <div class="col-lg-2">\n      <div id="pgSideWrap">\n       \n      </div> <!-- pgSideWrap -->\n    </div>\n    <div class="col-lg-10">\n      <div id="pgContentWrap">\n        \n<h1>Sign in <i class="fas fa-sign-in-alt"></i></h1>\n<p>\n\nThe website you are trying to log in to (List archives) is using the\npostgresql.org community login system. 
In this system you create a\ncentral account that is used to log into most postgresql.org services.\nOnce you are logged into this account, you will automatically be\nlogged in to the associated postgresql.org services.\n\n</p>\n<p>\nIf you do not already have an account,\nyou can either <a href="/account/signup/">create</a>\na dedicated account, or use one of the third party sign-in systems below.\n</p>\n\n<h2>Community account sign-in</h2>\n<p>\nIf you have a postgresql.org community account with a password, please\nuse the form below to sign in. If you have one but have lost your\npassword, you can use the <a href="/account/reset/">password reset</a> form.\n</p>\n\n\n<form action="." method="post" id="login-form"><input type="hidden" name="csrfmiddlewaretoken" value="74NVUzwZ2xyfKCjEKU55vymmVDYvbgwiZHKhjCpIvdoYT7zPBvHYk9KOcVXgJng3">\n  <div class="form-group">\n    <input type="text" class="form-control" name="username" id="id_username" placeholder="Username or email address" autofocus />\n  </div>\n  <div class="form-group">\n    <input type="password" class="form-control" name="password" id="id_password" placeholder="Password"/>\n    <input type="hidden" name="this_is_the_login_form" value="1" />\n    <input type="hidden" name="next" value="/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw==" />\n  </div>\n  <div class="submit-row">\n    <input class="btn btn-primary" type="submit" value="Community Sign-In">\n  </div>\n</form>\n\n\n<h2>Third party sign in</h2>\n\n<p><a href="/account/login/facebook/?next=/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw=="><img src="/media/img/misc/btn_login_facebook.png" alt="Sign in with Facebook" /></a></p>\n\n<p><a 
href="/account/login/github/?next=/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw=="><img src="/media/img/misc/btn_login_github.png" alt="Sign in with Github" /></a></p>\n\n<p><a href="/account/login/google/?next=/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw=="><img src="/media/img/misc/btn_login_google.png" alt="Sign in with Google" /></a></p>\n\n<p><a href="/account/login/microsoft/?next=/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw=="><img src="/media/img/misc/btn_login_microsoft.png" alt="Sign in with Microsoft" /></a></p>\n\n\n\n\n      </div> <!-- pgContentWrap -->\n    </div>\n  </div>\n</div>\n\n    <!-- Footer -->\n    <footer id="footer">\n      <div class="container">\n        <div class="row">\n          <div class="col-md-12">\n            <ul>\n              <li><a href="https://twitter.com/postgresql"><img src="/media/img/atpostgresql.png" alt="@postgresql"></a></li>\n              <li><a href="https://git.postgresql.org/gitweb/?p=postgresql.git"><img src="/media/img/git.png" alt="Git"></a></li>\n            </ul>\n          </div>\n        </div>\n      </div>\n      <!-- Copyright -->\n      <div class="container">\n        <a href="/about/policies/">Policies</a> |\n        <a href="/about/policies/coc/">Code of Conduct</a> |\n        <a href="/about/">About PostgreSQL</a> |\n        <a href="/about/contact/">Contact</a><br/>\n        <p>Copyright &copy; 1996-2023 The PostgreSQL Global Development Group</p>\n      </div>\n    </footer>\n    <script src="/media/js/jquery-3.4.1.slim.min.js"></script>\n    <script src="/media/js/popper-1.16.0.min.js"></script>\n    <script src="/media/js/bootstrap-4.4.1.min.js"></script>\n    <script src="/media/js/main.js?97a426bd"></script>\n\n  </body>\n</html>\n'

Thanks

On Thu, 7 Dec 2023 at 17:06, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 7 Dec 2023, at 16:34, Pierre Forstmann <pierre.forstmann@gmail.com> wrote:
>>
>> Hello,
>>
>> I'm trying to download
>> https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312
>> with following code using my postgresql.org account:
>>
>>     print(url + '... ')
>>     response = requests.get(url, auth=('xxx','yyy'))
>
> I'm not very well versed in Python, but isn't this for doing plain HTTP auth?
> The postgresql.org account does not support HTTP auth; you need to log in and
> create a session.
>
> --
> Daniel Gustafsson
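The Session variant behaves the same way: session.auth is merged into every request the session prepares, so it still produces an HTTP Basic Authorization header, and since nothing submits the sign-in form, the session's cookie jar stays empty. A sketch that prepares the request without sending it, again with the thread's placeholder credentials:

```python
import requests

URL = "https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312"

session = requests.Session()  # requests.session() is an older alias for this
session.auth = ("xxx", "yyy")  # placeholder credentials, applied to every request

# prepare_request() merges the session's auth, headers and cookies without sending.
prepared = session.prepare_request(requests.Request("GET", URL))

print(prepared.headers["Authorization"])  # Basic eHh4Onl5eQ==
print(len(session.cookies))  # 0
```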

Re: cannot download mbox with python

From
Andreas 'ads' Scherbaum
Date:
On 07/12/2023 17:39, Pierre Forstmann wrote:
> I've tried this:
>
> import requests
> from urllib.parse import urlparse
>
> url = 'https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312'
> #response = requests.get(url, auth=('xxx','yyy'))
> session = requests.session()
> session.auth = ('xxx','yyy')
> response = session.get(url)
> print('status: ' + str(response.status_code))
> print('... done')
> print(response.content)

session.auth still does basic HTTP auth, which is not what you need here.

Try opening your link in a browser in an anonymous window:

https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312

It redirects you to the login page. You need to emulate that path in your
script: log in to the website, and then you can retrieve the mbox.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
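A rough sketch of that path in Python, using a requests.Session so that cookies persist between requests. It is based only on the sign-in form visible in the HTML pasted earlier in this thread (hidden csrfmiddlewaretoken, this_is_the_login_form and next fields, plus username and password, posted back to the page's own URL because the form has action="."). The helper names are hypothetical. Sending a Referer header is an assumption: csrfmiddlewaretoken is the field name Django uses for CSRF protection, and Django may reject HTTPS POSTs without a matching referrer. The site's login flow is not a documented API and may change at any time:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

import requests

# URL from the thread; everything else below is an assumption about the site.
MBOX_URL = "https://www.postgresql.org/list/pgsql-bugs/mbox/pgsql-bugs.202312"


class LoginFormParser(HTMLParser):
    """Collect the name/value pairs of <input> fields inside <form id="login-form">."""

    def __init__(self) -> None:
        super().__init__()
        self.in_form = False
        self.fields: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "login-form":
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def extract_login_fields(html: str) -> dict[str, str]:
    """Hypothetical helper: all named fields (CSRF token, next, ...) of the sign-in form."""
    parser = LoginFormParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def fetch_mbox(username: str, password: str, url: str = MBOX_URL) -> bytes:
    """Hypothetical helper: sign in through the community login form, then download."""
    with requests.Session() as session:  # keeps cookies (CSRF, login) between requests
        # 1. Unauthenticated GET: the redirects end on the sign-in page.
        login_page = session.get(url)
        fields = extract_login_fields(login_page.text)
        if not fields:
            raise RuntimeError("no sign-in form found; the page layout may have changed")
        fields["username"] = username
        fields["password"] = password
        # 2. POST the form to its action="." URL, as a browser would; requests
        #    follows the redirects back to the archives and stores their cookies.
        session.post(urljoin(login_page.url, "."), data=fields,
                     headers={"Referer": login_page.url})
        # 3. Retry the download with the session cookies.
        response = session.get(url)
        response.raise_for_status()
        if response.content.lstrip()[:1] == b"<":
            raise RuntimeError("got an HTML page instead of an mbox; sign-in probably failed")
        return response.content
```

Usage: mbox = fetch_mbox("xxx", "yyy"), then write the bytes to a file. Parsing the form instead of hard-coding the token matters because the CSRF token and the next value change on every request, as the two HTML dumps in this thread show.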