Обсуждение: COPY in Java?

Поиск
Список
Период
Сортировка

COPY in Java?

От
John R Pierce
Дата:
I'm looking at the JDBC documentation at
https://jdbc.postgresql.org/documentation/head/index.html  and not
seeing any mention of COPY and any semantics for using COPY FROM STDIN
or COPY TO STDOUT...  google has a link to a 404 page with some sort of
CopyManager

we may need to develop a specialized sort of sql loader to replace an
existing Oracle tool, we need to intercept bad rows and put them in an
exceptions file, while bulk loading the good data.   I'm thinking we
batch N rows into COPY, and if a row faults, put it in the exception
pool then retry the batch without it.   the source data needs to be
massaged before it can be fed to copy, so we can't use COPY FROM
filename ...



--
john r pierce, recycling bits in santa cruz



Re: COPY in Java?

От
Thomas Kellerer
Дата:
John R Pierce schrieb am 18.07.2016 um 22:34:
> I'm looking at the JDBC documentation at
> https://jdbc.postgresql.org/documentation/head/index.html  and not
> seeing any mention of COPY and any semantics for using COPY FROM
> STDIN or COPY TO STDOUT...  google has a link to a 404 page with some
> sort of CopyManager

https://jdbc.postgresql.org/documentation/publicapi/index.html

indeed returns a 404

> we may need to develop a specialized sort of sql loader to replace an
> existing Oracle tool, we need to intercept bad rows and put them in
> an exceptions file, while bulk loading the good data.   I'm thinking
> we batch N rows into COPY, and if a row faults, put it in the
> exception pool then retry the batch without it.   the source data
> needs to be massaged before it can be fed to copy, so we can't use
> COPY FROM filename ...

The CopyManager is actually quite easy to use:

   org.postgresql.core.BaseConnection con = (BaseConnection)DriverManager.getConnection(....);
   CopyManager mgr = new CopyManager(con);

   Reader in = new BuffereReader(new FileReader("..."));
   mgr.copyIn("copy target_table from stdin with (...)", in);

But I don't think it's suitable for what you need to do because:

* The input file needs to be formatted so that COPY can handle it.
* there is no way you can catch the bad rows with that.

I assume the first problem could be solved by implementing your own Reader that "massages" each line while it reads the
sourcefile. 

But I don't think there is an efficient way to "catch" the bad rows through the Copy API.


But you might be interested in http://pgloader.io/ which is a tool similar to SQL*Loader it supports more formats then
COPYdoes. 
I have not worked with it however

Thomas



Re: COPY in Java?

От
Sam Gendler
Дата:


On Mon, Jul 18, 2016 at 2:16 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:


I assume the first problem could be solved by implementing your own Reader that "massages" each line while it reads the source file.

But I don't think there is an efficient way to "catch" the bad rows through the Copy API.

I've done exactly this.  A long time ago, but so long as you don't mind redoing the entire COPY without the bad rows, it works just fine. If you get lots of failures in any given batch, you'll be better off doing individual inserts in large transactions and catching the failures there, or writing a more sophisticated reader which catches the bad rows and prevents them from even being attempted.

 

But you might be interested in http://pgloader.io/ which is a tool similar to SQL*Loader it supports more formats then COPY does.
I have not worked with it however

Is Kettle still around (java-based ETL tool)?  It always had its difficulties, in my experience, but for basic bulk inserts with redirection of bad rows to a log, it works very well.  Not the fastest to execute relative to implementing a custom COPY reader, but if you aren't already dealing with overloaded infrastructure and don't need to absolutely maximize efficiency, it will get you up and running really quickly (once you are familiar with the tool and over the learning curve, of course) and isn't database-specific, so much more portable. I found it really easy to write Java/Groovy/jython code which could instantiate the kettle tools and run them in the context of other tools/scripts, making it quite modular.  It was all a long time ago or I'd provide more detail.  Sorry. And none of that should discredit the pgloader.io suggestion.  I just have no experience of it.

--sam

Re: COPY in Java?

От
Thomas Kellerer
Дата:
John R Pierce schrieb am 18.07.2016 um 22:34:
> I'm looking at the JDBC documentation at
> https://jdbc.postgresql.org/documentation/head/index.html  and not
> seeing any mention of COPY and any semantics for using COPY FROM
> STDIN or COPY TO STDOUT...  google has a link to a 404 page with some
> sort of CopyManager

https://jdbc.postgresql.org/documentation/publicapi/index.html

indeed returns a 404

> we may need to develop a specialized sort of sql loader to replace an
> existing Oracle tool, we need to intercept bad rows and put them in
> an exceptions file, while bulk loading the good data.   I'm thinking
> we batch N rows into COPY, and if a row faults, put it in the
> exception pool then retry the batch without it.   the source data
> needs to be massaged before it can be fed to copy, so we can't use
> COPY FROM filename ...

The CopyManager is actually quite easy to use:

   org.postgresql.core.BaseConnection con = (BaseConnection)DriverManager.getConnection(....);
   CopyManager mgr = new CopyManager(con);

   Reader in = new BuffereReader(new FileReader("..."));
   mgr.copyIn("copy target_table from stdin with (...)", in);

But I don't think it's suitable for what you need to do because:

* The input file needs to be formatted so that COPY can handle it.
* there is no way you can catch the bad rows with that.

I assume the first problem could be solved by implementing your own Reader that "massages" each line while it reads the
sourcefile. 

But I don't think there is an efficient way to "catch" the bad rows through the Copy API.


But you might be interested in http://pgloader.io/ which is a tool similar to SQL*Loader it supports more formats then
COPYdoes. 
I have not worked with it however

Thomas



Re: COPY in Java?

От
John R Pierce
Дата:
On 7/18/2016 2:16 PM, Thomas Kellerer wrote:
John R Pierce schrieb am 18.07.2016 um 22:34:
I'm looking at the JDBC documentation at
https://jdbc.postgresql.org/documentation/head/index.html  and not
seeing any mention of COPY and any semantics for using COPY FROM
STDIN or COPY TO STDOUT...  google has a link to a 404 page with some
sort of CopyManager

https://jdbc.postgresql.org/documentation/publicapi/index.html

indeed returns a 404

so where is the documentation on CopyManager ?


-- 
john r pierce, recycling bits in santa cruz

Re: COPY in Java?

От
Sam Gendler
Дата:


On Mon, Jul 18, 2016 at 2:16 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:


I assume the first problem could be solved by implementing your own Reader that "massages" each line while it reads the source file.

But I don't think there is an efficient way to "catch" the bad rows through the Copy API.

I've done exactly this.  A long time ago, but so long as you don't mind redoing the entire COPY without the bad rows, it works just fine. If you get lots of failures in any given batch, you'll be better off doing individual inserts in large transactions and catching the failures there, or writing a more sophisticated reader which catches the bad rows and prevents them from even being attempted.

 

But you might be interested in http://pgloader.io/ which is a tool similar to SQL*Loader it supports more formats then COPY does.
I have not worked with it however

Is Kettle still around (java-based ETL tool)?  It always had its difficulties, in my experience, but for basic bulk inserts with redirection of bad rows to a log, it works very well.  Not the fastest to execute relative to implementing a custom COPY reader, but if you aren't already dealing with overloaded infrastructure and don't need to absolutely maximize efficiency, it will get you up and running really quickly (once you are familiar with the tool and over the learning curve, of course) and isn't database-specific, so much more portable. I found it really easy to write Java/Groovy/jython code which could instantiate the kettle tools and run them in the context of other tools/scripts, making it quite modular.  It was all a long time ago or I'd provide more detail.  Sorry. And none of that should discredit the pgloader.io suggestion.  I just have no experience of it.

--sam

Re: COPY in Java?

От
John R Pierce
Дата:
On 7/18/2016 2:16 PM, Thomas Kellerer wrote:
John R Pierce schrieb am 18.07.2016 um 22:34:
I'm looking at the JDBC documentation at
https://jdbc.postgresql.org/documentation/head/index.html  and not
seeing any mention of COPY and any semantics for using COPY FROM
STDIN or COPY TO STDOUT...  google has a link to a 404 page with some
sort of CopyManager

https://jdbc.postgresql.org/documentation/publicapi/index.html

indeed returns a 404

so where is the documentation on CopyManager ?


-- 
john r pierce, recycling bits in santa cruz

Re: COPY in Java?

От
Radoslav Petrov
Дата:

On 19.07.2016 05:11, John R Pierce wrote:
> On 7/18/2016 2:16 PM, Thomas Kellerer wrote:
>> John R Pierce schrieb am 18.07.2016 um 22:34:
>>> I'm looking at the JDBC documentation at
>>> https://jdbc.postgresql.org/documentation/head/index.html and not
>>> seeing any mention of COPY and any semantics for using COPY FROM
>>> STDIN or COPY TO STDOUT...  google has a link to a 404 page with some
>>> sort of CopyManager
>>
>> https://jdbc.postgresql.org/documentation/publicapi/index.html
>>
>> indeed returns a 404
>
> so where is the documentation on CopyManager ?
>
I'm also getting HTTP 404 when I try to access
https://jdbc.postgresql.org/documentation/publicapi/index.html.
The cited URL should contain the JavaDocs. You can get the archive from
the Maven Central repository. This is the link for the latest version:
http://central.maven.org/maven2/org/postgresql/postgresql/9.4.1209/postgresql-9.4.1209-javadoc.jar
Directory browsing is enabled so if this is not the version you are
using, you can browse to the directory with the appropriate version.

--
Поздрави,       | Best regards,
Радослав Петров | Radoslav Petrov



Re: COPY in Java?

От
Radoslav Petrov
Дата:

On 19.07.2016 05:11, John R Pierce wrote:
> On 7/18/2016 2:16 PM, Thomas Kellerer wrote:
>> John R Pierce schrieb am 18.07.2016 um 22:34:
>>> I'm looking at the JDBC documentation at
>>> https://jdbc.postgresql.org/documentation/head/index.html and not
>>> seeing any mention of COPY and any semantics for using COPY FROM
>>> STDIN or COPY TO STDOUT...  google has a link to a 404 page with some
>>> sort of CopyManager
>>
>> https://jdbc.postgresql.org/documentation/publicapi/index.html
>>
>> indeed returns a 404
>
> so where is the documentation on CopyManager ?
>
I'm also getting HTTP 404 when I try to access
https://jdbc.postgresql.org/documentation/publicapi/index.html.
The cited URL should contain the JavaDocs. You can get the archive from
the Maven Central repository. This is the link for the latest version:
http://central.maven.org/maven2/org/postgresql/postgresql/9.4.1209/postgresql-9.4.1209-javadoc.jar
Directory browsing is enabled so if this is not the version you are
using, you can browse to the directory with the appropriate version.

--
Поздрави,       | Best regards,
Радослав Петров | Radoslav Petrov