Обсуждение: COPY to/from non-local file

Поиск
Список
Период
Сортировка

COPY to/from non-local file

От
Jaime Silvela
Дата:
I've been looking around for this functionality:
Is it possible to use COPY with a file that doesn't reside in the DB's
filesystem? I know there is sort of a solution in using COPY from stdin/
COPY to stdout, however that depends on calling the psql command, which
breaks the flow of control of the programs I'm trying to write.

Here's the story: I have a Ruby on Rails application which sits on
server A, and connects to a Postgres Database running on B. Using the
regular flow, the rails code writes into various tables, and then
generates a table that needs to be exported into a file which will be
used as input into a separate program, a simulator, and kept in the
filesystem of the Rails server. Using the COPY command would entail
either of
a) Using COPY to generate the file on server B, then transfer to A - but
how? ftp?  I want to reduce this type of coupling
b) Using COPY TO STDOUT from psql, called in the Rails code with a
backtick, then gathering the output  and filing it. - but this solution
depends on having psql in the path of the Rails server, reintroducing
the server credentials, and from a programming point of view is kind of
ugly.
c) The Postgres driver for Rails tries to give an interface to the COPY
command using "raw_connection", "getline" and "endcopy", but it's quite
brittle, so I'm avoiding it altogether.

At the moment I'm avoiding those solutions, and instead get the table
into Rails space with a CURSOR on a SELECT, then simply write the file
in Rails, one line at a time. However, the tables I'm writing are pretty
big, and  the performance is so much worse than with COPY...

Any suggestions?
Thanks
Jaime


***********************************************************************
Bear Stearns is not responsible for any recommendation, solicitation,
offer or agreement or any information about any transaction, customer
account or account activity contained in this communication.

Bear Stearns does not provide tax, legal or accounting advice.  You
should consult your own tax, legal and accounting advisors before
engaging in any transaction. In order for Bear Stearns to comply with
Internal Revenue Service Circular 230 (if applicable), you are notified
that any discussion of U.S. federal tax issues contained or referred to
herein is not intended or written to be used, and cannot be used, for
the purpose of:  (A) avoiding penalties that may be imposed under the
Internal Revenue Code; nor (B) promoting, marketing or recommending to
another party any transaction or matter addressed herein.
***********************************************************************

Re: COPY to/from non-local file

От
Ted Byers
Дата:
I just checked the manual and it says nothing about being able to use the SQL copy to access network files. 
 
But you have an option that almost works for you.  I am sure others here will have some suggestions, but if I were in your place, I'd gather more information about where bottlenecks exist in a solution that works for me, albeit slowly.
 
To check your performance problem, you may want to add benchmarking code to your Ruby program to see where it is spending its time: on getting the data from PostgreSQL or writing it to a file.  That will tell you where your efforts are best spent.
 
I don't work with Ruby, but if I was using C++ or Java or Perl, I'd use the same strategy: get the data en mass, and then write it out as efficiently as possible.  In C++ for example, there is an iostream class based idiom that using one line of code in the application program and that line is as fast as you can make your io code without delving deeply into IO processing and developing your own IOstream classes.  In both C++ and Java, you have stream classes, and significant control over how the data is written: one character at a time (if you want the process to take forever ;-) or in blocks of whatever size you want.  But this involves being willing to develop your own stream classes to implement your preferred buffering strategy.  In C++, you can save a lot of development time by having template policy classes that control how best to optimize disk IO.  In Perl, you can read a file en mass and then iterate through it a line at a time, but for this my preference at present is to use C++.  Since I don't know Ruby, I don't know how well it supports these, and related, IO programming idioms.  If it's support is poor, it may pay to use a more powerful and flexible language.  If it has outstanding power and flexibility for supporting IO optimization programming idioms, let me know and I'll invest the time to add Ruby to my list of languages in my development toolbox.  :-)  I would assume that what I have learned using the languages I know can be applied in some fashion to programming in Ruby.
 
HTH
 
Ted

Jaime Silvela <JSilvela@Bear.com> wrote:
I've been looking around for this functionality:
Is it possible to use COPY with a file that doesn't reside in the DB's
filesystem? I know there is sort of a solution in using COPY from stdin/
COPY to stdout, however that depends on calling the psql command, which
breaks the flow of control of the programs I'm trying to write.

Here's the story: I have a Ruby on Rails application which sits on
server A, and connects to a Postgres Database running on B. Using the
regular flow, the rails code writes into various tables, and then
generates a table that needs to be exported into a file which will be
used as input into a separate program, a simulator, and kept in the
filesystem of the Rails server. Using the COPY command would entail
either of
a) Using COPY to generate the file on server B, then transfer to A - but
how? ftp? I want to reduce this type of coupling
b) Using COPY TO STDOUT from psql, called in the Rails code with a
backtick, then gathering the output and filing it. - but this solution
depends on having psql in the path of the Rails server, reintroducing
the server credentials, and from a programming point of view is kind of
ugly.
c) The Postgres driver for Rails tries to give an interface to the COPY
command using "raw_connection", "getline" and "endcopy", but it's quite
brittle, so I'm avoiding it altogether.

At the moment I'm avoiding those solutions, and instead get the table
into Rails space with a CURSOR on a SELECT, then simply write the file
in Rails, one line at a time. However, the tables I'm writing are pretty
big, and the performance is so much worse than with COPY...

Any suggestions?
Thanks
Jaime


***********************************************************************
Bear Stearns is not responsible for any recommendation, solicitation,
offer or agreement or any information about any transaction, customer
account or account activity contained in this communication.

Bear Stearns does not provide tax, legal or accounting advice. You
should consult your own tax, legal and accounting advisors before
engaging in any transaction. In order for Bear Stearns to comply with
Internal Revenue Service Circular 230 (if applicable), you are notified
that any discussion of U.S. federal tax issues contained or referred to
herein is not intended or written to be used, and cannot be used, for
the purpose of: (A) avoiding penalties that may be imposed under the
Internal Revenue Code; nor (B) promoting, marketing or recommending to
another party any transaction or matter addressed herein.
***********************************************************************

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: COPY to/from non-local file

От
Erik Jones
Дата:
On Jun 27, 2007, at 9:32 AM, Jaime Silvela wrote:

> I've been looking around for this functionality:
> Is it possible to use COPY with a file that doesn't reside in the
> DB's filesystem? I know there is sort of a solution in using COPY
> from stdin/ COPY to stdout, however that depends on calling the
> psql command, which breaks the flow of control of the programs I'm
> trying to write.
>
> Here's the story: I have a Ruby on Rails application which sits on
> server A, and connects to a Postgres Database running on B. Using
> the regular flow, the rails code writes into various tables, and
> then generates a table that needs to be exported into a file which
> will be used as input into a separate program, a simulator, and
> kept in the filesystem of the Rails server. Using the COPY command
> would entail either of
> a) Using COPY to generate the file on server B, then transfer to A
> - but how? ftp?  I want to reduce this type of coupling
> b) Using COPY TO STDOUT from psql, called in the Rails code with a
> backtick, then gathering the output  and filing it. - but this
> solution depends on having psql in the path of the Rails server,
> reintroducing the server credentials, and from a programming point
> of view is kind of ugly.
> c) The Postgres driver for Rails tries to give an interface to the
> COPY command using "raw_connection", "getline" and "endcopy", but
> it's quite brittle, so I'm avoiding it altogether.
>
> At the moment I'm avoiding those solutions, and instead get the
> table into Rails space with a CURSOR on a SELECT, then simply write
> the file in Rails, one line at a time. However, the tables I'm
> writing are pretty big, and  the performance is so much worse than
> with COPY...
>
> Any suggestions?
> Thanks
> Jaime

The way we usually handle situations similar to this is to use
network mounts of directories that are visible from both servers
using, say, nfs.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: COPY to/from non-local file

От
Martijn van Oosterhout
Дата:
On Wed, Jun 27, 2007 at 10:32:32AM -0400, Jaime Silvela wrote:
> I've been looking around for this functionality:
> Is it possible to use COPY with a file that doesn't reside in the DB's
> filesystem? I know there is sort of a solution in using COPY from stdin/
> COPY to stdout, however that depends on calling the psql command, which
> breaks the flow of control of the programs I'm trying to write.

Sorry? Copy to stdout/from stdin can be done by normal programs also,
not just psql. Execute the command and then use putline/getline to
transfer the data. In perl DBI is works like that, for C also, so
probably from ruby also.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения

Re: COPY to/from non-local file

От
Jaime Silvela
Дата:
The problem is that the getline/readline interface, which does exist for
Ruby, doesn't seem to work so well, and anyway operates line by line; I
would have preferred to just hand a file descriptor and be done with it.

Thanks for your suggestions too, Erik and Ted. Performance-wise the best
solution may be a network mount as you suggest Erik.

Thanks,
Jaime

Martijn van Oosterhout wrote:
> On Wed, Jun 27, 2007 at 10:32:32AM -0400, Jaime Silvela wrote:
>
>> I've been looking around for this functionality:
>> Is it possible to use COPY with a file that doesn't reside in the DB's
>> filesystem? I know there is sort of a solution in using COPY from stdin/
>> COPY to stdout, however that depends on calling the psql command, which
>> breaks the flow of control of the programs I'm trying to write.
>>
>
> Sorry? Copy to stdout/from stdin can be done by normal programs also,
> not just psql. Execute the command and then use putline/getline to
> transfer the data. In perl DBI is works like that, for C also, so
> probably from ruby also.
>
> Have a nice day,
>


***********************************************************************
Bear Stearns is not responsible for any recommendation, solicitation,
offer or agreement or any information about any transaction, customer
account or account activity contained in this communication.

Bear Stearns does not provide tax, legal or accounting advice.  You
should consult your own tax, legal and accounting advisors before
engaging in any transaction. In order for Bear Stearns to comply with
Internal Revenue Service Circular 230 (if applicable), you are notified
that any discussion of U.S. federal tax issues contained or referred to
herein is not intended or written to be used, and cannot be used, for
the purpose of:  (A) avoiding penalties that may be imposed under the
Internal Revenue Code; nor (B) promoting, marketing or recommending to
another party any transaction or matter addressed herein.
***********************************************************************

Re: COPY to/from non-local file

От
Enrico Sirola
Дата:
Hi,

Il giorno 27/giu/07, alle ore 20:54, Jaime Silvela ha scritto:

The problem is that the getline/readline interface, which does exist for Ruby, doesn't seem to work so well, and anyway operates line by line; I would have preferred to just hand a file descriptor and be done with it.

Thanks for your suggestions too, Erik and Ted. Performance-wise the best solution may be a network mount as you suggest Erik.

I transfer some million rows (for like 20 tables) every day from a server to another.
The two servers are in two separated networks, with different security levels, so unfortunately
network mounts are not available. At the beginning I used to transfer record-by-record within
a python application hosted by the source server, but it took ages. At the end, 
some scripting around ssh proved to be enough. just put the public key file in
the .ssh/authorized_keys of the postgresql system user and you are done. You can transfer 
files without the need for a password, and it's perfectly scriptable
I hope it helps,


Re: COPY to/from non-local file

От
Martijn van Oosterhout
Дата:
On Wed, Jun 27, 2007 at 02:54:05PM -0400, Jaime Silvela wrote:
> The problem is that the getline/readline interface, which does exist for
> Ruby, doesn't seem to work so well, and anyway operates line by line; I
> would have preferred to just hand a file descriptor and be done with it.

Hrm, in what sense doesn't it work well? Line-by-line means
record-by-record. And writing a function to take an fd and do the work
would be straightforward, or do you mean something else?

Do you have any suggestions for improvement?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения

Re: COPY to/from non-local file

От
Jaime Silvela
Дата:
Well, for Ruby there are two Postgres drivers, a native compiled one I
run in Linux, and a pure Ruby one that I use in Windows. The PGconn
object in the Windows version doesn't seem to offer the methods putline,
readline or endcopy. The Linux version does, but since I do development
in both Linux and Windows, this is
not a solution.

It would have been great to have a COPY to/from string functionality.
I'm sure there's a rationale for not having that. Would it be foolish to
try to implement that? In my opinion, it would be cleaner design than
having to mess with getline/putline or file transmission.

Thanks also Enrico, your idea is good.


Martijn van Oosterhout wrote:
> On Wed, Jun 27, 2007 at 02:54:05PM -0400, Jaime Silvela wrote:
>
>> The problem is that the getline/readline interface, which does exist for
>> Ruby, doesn't seem to work so well, and anyway operates line by line; I
>> would have preferred to just hand a file descriptor and be done with it.
>>
>
> Hrm, in what sense doesn't it work well? Line-by-line means
> record-by-record. And writing a function to take an fd and do the work
> would be straightforward, or do you mean something else?
>
> Do you have any suggestions for improvement?
>
> Have a nice day,
>


***********************************************************************
Bear Stearns is not responsible for any recommendation, solicitation,
offer or agreement or any information about any transaction, customer
account or account activity contained in this communication.

Bear Stearns does not provide tax, legal or accounting advice.  You
should consult your own tax, legal and accounting advisors before
engaging in any transaction. In order for Bear Stearns to comply with
Internal Revenue Service Circular 230 (if applicable), you are notified
that any discussion of U.S. federal tax issues contained or referred to
herein is not intended or written to be used, and cannot be used, for
the purpose of:  (A) avoiding penalties that may be imposed under the
Internal Revenue Code; nor (B) promoting, marketing or recommending to
another party any transaction or matter addressed herein.
***********************************************************************