Discussion: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

From: Shaozhong SHI
Date:

Has anyone tested this one?
A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis Valentiner
<https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>

I tried the psql_insert_copy method, but I got the following error message.

to_sql() got an unexpected keyword argument 'method'

Can anyone shed light on this?

Regards,

David

Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

From: Adrian Klaver
Date:
On 10/4/21 8:44 AM, Shaozhong SHI wrote:
> 
> Has anyone tested this one?
> A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis 
> Valentiner 
> <https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>
> 
> I tried psql_insert_copy method, but I got the following error message.
> 
> to_sql() got an unexpected keyword argument 'method'
> 
> 
> Can anyone shed light on this?

method is a legitimate keyword:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html?highlight=to_sql#pandas.DataFrame.to_sql

So there must be something about how you used it.

Pandas version?

The actual code you used when the error occurred?
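
For reference, the docs describe method as None, 'multi', or a callable with
signature (table, conn, keys, data_iter). A minimal sketch of the documented
call on pandas >= 0.24, assuming a placeholder connection string and a toy
DataFrame:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://user:password@localhost:5432/postgres')  # placeholder DSN
df = pd.DataFrame({'a': [1, 2, 3]})

# Built-in option: batch several rows into each INSERT statement.
df.to_sql('test1', engine, if_exists='append', index=False, method='multi')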


> 
> 
> Regards,
> 
> 
> David
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

From: Shaozhong SHI
Date:
Hello, Adrian Klaver,

Pandas version is 0.23.0.

I used the following code:

import csv
from io import StringIO

from sqlalchemy import create_engine

def psql_insert_copy(table, conn, keys, data_iter):
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        # write the rows into an in-memory CSV buffer
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        # stream the buffer into the table with COPY
        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql+psycopg2://:5432/postgres')
try:
    # df is the DataFrame to load (built earlier)
    df.to_sql('test1', engine, schema='public', if_exists='append', index=False, method=psql_insert_copy)
except Exception as e:
    print(e)

I could not find obvious reasons.

Regards,

David

On Mon, 4 Oct 2021 at 17:06, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 10/4/21 8:44 AM, Shaozhong SHI wrote:
>
> Has anyone tested this one?
> A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis
> Valentiner
> <https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>
>
> I tried psql_insert_copy method, but I got the following error message.
>
> to_sql() got an unexpected keyword argument 'method'
>
>
> Can anyone shed light on this?

method is a legitimate keyword:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html?highlight=to_sql#pandas.DataFrame.to_sql

So there must be something about how you used it.

Pandas version?

The actual code you used when the error occurred?


>
>
> Regards,
>
>
> David
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

From: Rob Sargent
Date:


On Oct 4, 2021, at 10:20 AM, Shaozhong SHI <shishaozhong@gmail.com> wrote:

Hello, Adrian Klaver,

Pandas version is 0.23.0.

I used the following code:

def psql_insert_copy(table, conn, keys, data_iter):
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)
        
engine = create_engine('postgresql+psycopg2://:5432/postgres')
try:
    df.to_sql('test1', engine, schema='public', if_exists='append', index=False, method=psql_insert_copy) 

you need to quote 'psql_insert_copy'

Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

From: Adrian Klaver
Date:
On 10/4/21 9:20 AM, Shaozhong SHI wrote:
> Hello, Adrian Klaver,
> 
> Pandas version is 0.23.0.

The reason the code below does not work is that the method keyword did not
show up until pandas 0.24.0.
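
If you need to stay on 0.23 for now, one workaround (a sketch, assuming a
small wrapper around your existing call; the packaging module is an extra
dependency here) is to pass method only when the installed pandas supports it:

import pandas as pd
from packaging.version import Version

def to_sql_compat(df, name, engine, insert_method, **kwargs):
    # The method keyword only exists on pandas >= 0.24.0.
    if Version(pd.__version__) >= Version('0.24.0'):
        df.to_sql(name, engine, method=insert_method, **kwargs)
    else:
        # Older pandas: fall back to the default row-by-row INSERT path.
        df.to_sql(name, engine, **kwargs)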

> 
> I used the following code:
> 
> def psql_insert_copy(table, conn, keys, data_iter):
>      # gets a DBAPI connection that can provide a cursor
>      dbapi_conn = conn.connection
>      with dbapi_conn.cursor() as cur:
>          s_buf = StringIO()
>          writer = csv.writer(s_buf)
>          writer.writerows(data_iter)
>          s_buf.seek(0)
> 
>          columns = ', '.join('"{}"'.format(k) for k in keys)
>          if table.schema:
>              table_name = '{}.{}'.format(table.schema, table.name)
>          else:
>              table_name = table.name
> 
>          sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
>              table_name, columns)
>          cur.copy_expert(sql=sql, file=s_buf)
> engine = create_engine('postgresql+psycopg2://:5432/postgres')
> try:
>      df.to_sql('test1', engine, schema='public', if_exists='append', 
> index=False, method=psql_insert_copy)
> 
> I could not find obvious reasons.


> 
> Regards,
> 
> David
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

From: Shaozhong SHI
Date:
Hello, Adrian Klaver,
What is the robust way to upgrade Pandas?
Regards,
David

On Monday, 4 October 2021, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 10/4/21 9:20 AM, Shaozhong SHI wrote:
Hello, Adrian Klaver,

Pandas version is 0.23.0.

The reason the code below does not work is that the method keyword did not show up until pandas 0.24.0.


I used the following code:

def psql_insert_copy(table, conn, keys, data_iter):
     # gets a DBAPI connection that can provide a cursor
     dbapi_conn = conn.connection
     with dbapi_conn.cursor() as cur:
         s_buf = StringIO()
         writer = csv.writer(s_buf)
         writer.writerows(data_iter)
         s_buf.seek(0)

         columns = ', '.join('"{}"'.format(k) for k in keys)
         if table.schema:
             table_name = '{}.{}'.format(table.schema, table.name)
         else:
             table_name = table.name

         sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
             table_name, columns)
         cur.copy_expert(sql=sql, file=s_buf)
engine = create_engine('postgresql+psycopg2://:5432/postgres')
try:
     df.to_sql('test1', engine, schema='public', if_exists='append', index=False, method=psql_insert_copy)

I could not find obvious reasons.



Regards,

David



--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

From: Adrian Klaver
Date:
On 10/4/21 10:10 AM, Shaozhong SHI wrote:
> Hello, Adrian Klaver,
> What is the robust way to upgrade Pandas?

Carefully.

The most recent version is 1.3.3, which is approximately 5 versions 
ahead of where you are now. The big jump is when Pandas went from 0.25 
to 1.0. See docs here:

https://pandas.pydata.org/docs/whatsnew/v1.0.0.html?highlight=upgrade

So the process should be 0.23 -> 0.24, verify, 0.24 -> 0.25, verify, 0.25 -> 
1.0, verify. Then on to wherever you want to end up, a step at a time.

Before each step spend time here:

https://pandas.pydata.org/docs/whatsnew/

to see what the gotchas are.
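
For the "verify" part, a quick smoke test after each step helps; a sketch,
assuming a throwaway table and a placeholder connection string (run it once
you are on 0.24 or later):

import pandas as pd
from sqlalchemy import create_engine

print('pandas', pd.__version__)

engine = create_engine('postgresql+psycopg2://user:password@localhost:5432/postgres')  # placeholder DSN
df = pd.DataFrame({'a': [1, 2, 3]})

# Exercise the code path in question: to_sql with the method keyword.
df.to_sql('upgrade_smoke_test', engine, if_exists='replace', index=False, method='multi')
print('to_sql with method= OK')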


> Regards,
> David
> 



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Testing of a fast method to bulk insert a Pandas DataFrame into Postgres

From: Adrian Klaver
Date:
On 10/4/21 10:28 AM, Adrian Klaver wrote:
> On 10/4/21 10:10 AM, Shaozhong SHI wrote:
>> Hello, Adrian Klaver,
>> What is the robust way to upgrade Pandas?
> 
> Carefully.
> 
> The most recent version is 1.3.3, which is approximately 5 versions 
> ahead of where you are now. The big jump is when Pandas went from 0.25 
> to 1.0. See docs here:
> 
> https://pandas.pydata.org/docs/whatsnew/v1.0.0.html?highlight=upgrade
> 
> So the process should be 0.23 -> 0.24, verify, 0.24 -> 0.25, verify, 0.25 -> 
> 1.0, verify. Then on to wherever you want to end up, a step at a time.
> 
> Before each step spend time here:
> 
> https://pandas.pydata.org/docs/whatsnew/
> 
> to see what the gotchas are.

Should have added:

If you are not already working in a virtualenv, it would be a good idea 
to do the above in one or more.
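
A sketch of one way to set that up from Python itself, with one throwaway
environment per upgrade step (directory names here are just examples):

import venv

# Create an isolated environment for each pandas version you want to test.
for name in ('pandas-0.25-test', 'pandas-1.0-test'):
    venv.create(name, with_pip=True)
# Activate each one and pip install the matching pandas version before re-running your tests.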

> 
> 
>> Regards,
>> David
>>
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com