Thread: Import from CSV - Questions


Import from CSV - Questions

From
"Joel Hainley"
Date:
Hello all,

I was looking over the todo list and noticed an item "import from CSV".

I was wondering if there was more information about what the thoughts were behind this item?

- Are you looking to be able to import a csv file into a single table?
- Multiple tables?
- Any examples exist out there for other dbms tools that this idea was inspired by?

Obviously I'm new to the project, but I'd like to try to dig in, and thought this might be something I could lend a hand with.

Thanks

joel hainley



Re: Import from CSV - Questions

From
"Dave Page"
Date:
 


From: pgadmin-hackers-owner@postgresql.org [mailto:pgadmin-hackers-owner@postgresql.org] On Behalf Of Joel Hainley
Sent: 09 March 2006 21:43
To: pgadmin-hackers@postgresql.org
Subject: [pgadmin-hackers] Import from CSV - Questions

> Hello all,
>
> I was looking over the todo list and noticed an item "import from CSV".
>
> I was wondering if there was more information about what the thoughts were behind this item?
>
> - Are you looking to be able to import a csv file into a single table?
> - Multiple tables?
> - Any examples exist out there for other dbms tools that this idea was inspired by?
 
To be honest I didn't realise it was there. The tool I have in mind for more general data movement is a simplified version of something like Microsoft's DTS, essentially with input, transform and output modules. The input modules might get data from ODBC or a CSV file, for example; the transform module would allow either straight mapping of source to target columns or (probably) Python-based transformation for more complex stuff; and the output module would insert the data into PostgreSQL, a CSV file or an ODBC data source etc.
 
That is clearly a complex project though... An alternative might be a simple wizard, activated by a 'load data' option on the table context menu.

Magnus was also talking about XML export (and import?) so perhaps he has some ideas.
 
> Obviously I'm new to the project, but I'd like to try to dig in, and thought this might be something I could lend a hand with.
 
Great - we're happy to work with new developers whenever we can.
 
Regards, Dave
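
A minimal sketch of the input / transform / output split described above, assuming Python and only a CSV source and a CSV target. The function names (read_csv_rows, map_columns, write_csv_rows) and the sample columns are hypothetical and not taken from pgAdmin; an ODBC or PostgreSQL module would presumably slot in at either end in the same way.

```python
import csv
import io


def read_csv_rows(text):
    """Input module: yield each CSV record as a dict keyed by header name."""
    yield from csv.DictReader(io.StringIO(text))


def map_columns(rows, mapping):
    """Transform module: straight mapping of source columns to target columns."""
    for row in rows:
        yield {target: row[source] for source, target in mapping.items()}


def write_csv_rows(rows, fieldnames):
    """Output module: serialise the rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample data; the column names are made up for illustration.
source_text = "id,full_name\n1,Ada\n2,Grace\n"
mapping = {"id": "customer_id", "full_name": "name"}

result = write_csv_rows(
    map_columns(read_csv_rows(source_text), mapping),
    fieldnames=list(mapping.values()),
)
print(result)
```

Because each stage is a generator, rows stream through one at a time, which is roughly what a wizard behind a 'load data' menu item would need for large files.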

Re: Import from CSV - Questions

From
"Magnus Hagander"
Date:
> To be honest I didn't realise it was there. The tool I have
> in mind for more general data movement is a simplified
> version of something like Microsoft's DTS, essentially with
> input, transform and output modules. The input modules might
> get data from ODBC or a CSV file for example, the transform
> module would allow either straight mapping of source to
> target columns, and (probably) Python-based transformation
> for more complex stuff, and the output module would insert
> the data into PostgreSQL, a CSV file or an ODBC data source etc.
>
> That is clearly a complex project though... An alternative
> might be a simple wizard, activated by a 'load data' option
> on the table context menu.
>
> Magnus was also talking about XML export (and import?) so
> perhaps he has some ideas.

Yeah, I've got some very sketchy design and code started around
somewhere. Nothing ready to be looked at yet though, much less something
usable.

(And yes, the design was modularised to plug in things like ODBC and
stuff, just in this particular case I needed XML specifically)

BTW - one important aspect is that the actual transfer part should run
as a separate program and not just inside pgadmin. pgadmin would do the
GUI to set it up only.

//Magnus

Re: Import from CSV - Questions

From
"Joel Hainley"
Date:
Magnus, any chance of getting a look at what you currently have to help guide me in the appropriate direction?

On 3/10/06, Magnus Hagander <mha@sollentuna.net> wrote:
> To be honest I didn't realise it was there. The tool I have
> in mind for more general data movement is a simplified
> version of something like Microsoft's DTS, essentially with
> input, transform and output modules. The input modules might
> get data from ODBC or a CSV file for example, the transform
> module would allow either straight mapping of source to
> target columns, and (probably) Python-based transformation
> for more complex stuff, and the output module would insert
> the data into PostgreSQL, a CSV file or an ODBC data source etc.
>
> That is clearly a complex project though... An alternative
> might be a simple wizard, activated by a 'load data' option
> on the table context menu.
>
> Magnus was also talking about XML export (and import?) so
> perhaps he has some ideas.

> Yeah, I've got some very sketchy design and code started around
> somewhere. Nothing ready to be looked at yet though, much less something
> usable.
>
> (And yes, the design was modularised to plug in things like ODBC and
> stuff, just in this particular case I needed XML specifically)
>
> BTW - one important aspect is that the actual transfer part should run
> as a separate program and not just inside pgadmin. pgadmin would do the
> GUI to set it up only.
>
> //Magnus

Re: Import from CSV - Questions

From
"Magnus Hagander"
Date:
>
> Magnus, any chance of getting a look at what you currently
> have to help guide me in the appropriate direction?

Umm. That would kind of assume it's written down in a comprehensible
way. Which it isn't, of course :-)

The general ideas so far have been, off the top of my head:

* Pluggable set of "readers" and "writers". Initially I'd see
PostgreSQL, ODBC, XML and possibly CSV. The Pg driver would be optimised
to use COPY when available.

* Pluggable set of "transforms" that would operate on the rows. By
default things like copy and concatenate and maybe regexp. A future
enhancement would be a Python extension, as Dave mentioned. (Or really,
anything else)

* I was envisioning a split of say "package", "job", "step" (terms of
course subject to discussion). A package is basically a set of jobs, a
job a set of steps. Things like connections would be defined at the "job"
level, along with parameters for transaction control etc. (So you can use
it to transfer 10 different tables within a single transaction, something
I need all the time).

* I'd like to see the job format stored as XML with a well-defined
schema, so different applications can generate it - both manually
(GUI-wise from pgadmin and phppgadmin etc) and automatically.

* The "engine" should be available both as a command-line tool (which
must not require X libraries etc, because it should be deployable
"everywhere") and as a command inside pgadmin (like MS DTS)


Um. I think that's about it. I had some sketches of classes and
interfaces around (not complete, but an idea), but I can't find them :(

//Magnus
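
To make the package / job / step idea and the pluggable transforms a little more concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the XML element and attribute names, the TRANSFORMS registry and the helper functions are invented here and do not come from any existing design or code. The "transaction" attribute is only carried in the job file and is not acted on in this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical job description; element and attribute names are invented.
JOB_XML = """
<package name="nightly">
  <job name="customers" transaction="single">
    <step transform="concatenate" sources="first,last" target="full_name" separator=" "/>
    <step transform="copy" sources="id" target="customer_id"/>
  </job>
</package>
""".strip()

TRANSFORMS = {}  # registry: transform name -> callable(values, options)


def transform(name):
    """Register a row transform under the name used in the job file."""
    def register(func):
        TRANSFORMS[name] = func
        return func
    return register


@transform("copy")
def copy_value(values, options):
    # Pass the single source value through unchanged.
    return values[0]


@transform("concatenate")
def concatenate(values, options):
    # Join all source values with the optional separator attribute.
    return options.get("separator", "").join(values)


def load_jobs(xml_text):
    """Parse a package into {job name: [step attribute dicts]}."""
    package = ET.fromstring(xml_text)
    return {
        job.get("name"): [dict(step.attrib) for step in job.findall("step")]
        for job in package.findall("job")
    }


def run_job(steps, row):
    """Apply every step of one job to a single source row."""
    result = {}
    for step in steps:
        values = [row[name] for name in step["sources"].split(",")]
        result[step["target"]] = TRANSFORMS[step["transform"]](values, step)
    return result


jobs = load_jobs(JOB_XML)
row = {"id": "7", "first": "Ada", "last": "Lovelace"}
output = run_job(jobs["customers"], row)
print(output)
```

Keeping the job file as plain data and looking transforms up by name is one way a GUI (pgadmin, phppgadmin) and a headless command-line engine could share the same job definition; readers and writers could be registered the same way.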

Re: Import from CSV - Questions

From
Andreas Pflug
Date:
Magnus Hagander wrote:
>> Magnus, any chance of getting a look at what you currently
>> have to help guide me in the appropriate direction?
>>
>
> Umm. That would kind of assume it's written down in a comprehensible
> way. Which it isn't, of course :-)
>
> The general ideas so far have been, off the top of my head:
>
> * Pluggable set of "readers" and "writers". Initially I'd see
> PostgreSQL, ODBC, XML and possibly CSV. The Pg driver would be optimised
> to use COPY when available.
>
> * Pluggable set of "transforms" that would operate on the rows. By
> default things like copy and concatenate and maybe regexp. A future
> enhancement would be a Python extension, as Dave mentioned. (Or really,
> anything else)
>
> * I was envisioning a split of say "package", "job", "step" (terms of
> course subject to discussion). A package is basically a set of jobs, a
> job a set of steps. Things like connections would be defined at the "job"
> level, along with parameters for transaction control etc. (So you can use
> it to transfer 10 different tables within a single transaction, something
> I need all the time).
>
> * I'd like to see the job format stored as XML with a well-defined
> schema, so different applications can generate it - both manually
> (GUI-wise from pgadmin and phppgadmin etc) and automatically.
>
> * The "engine" should be available both as a command-line tool (which
> must not require X libraries etc, because it should be deployable
> "everywhere") and as a command inside pgadmin (like MS DTS)
>
>
> Um. I think that's about it. I had some sketches of classes and
> interfaces around (not complete, but an idea), but I can't find them :(
>
>
This sounds like an awful lot of work.
A somewhat reduced version (IIRC Dave and I discussed something like
the following briefly), a more raw import (maybe into temp tables), could
be a big step, giving the admin the chance to create views on those
tables that do the extractions he likes. PostgreSQL already has all the
functions you'd like, so there is no need to reimplement them.

Regards,
Andreas
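
A small sketch of the "raw import into a temp table, then a view" approach. Note the stand-in: it uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs without a server, and the table, view and column names are made up. Against PostgreSQL the load step would more likely be a COPY ... FROM in CSV mode, but the shape of the idea is the same.

```python
import csv
import io
import sqlite3

# Hypothetical CSV input; every column arrives as untyped text.
raw_csv = "id,amount,note\n1, 10.50 ,first\n2,3,second\n"

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here

# Step 1: raw import into a temporary staging table, no transformation at all.
conn.execute("CREATE TEMP TABLE staging (id TEXT, amount TEXT, note TEXT)")
reader = csv.reader(io.StringIO(raw_csv))
next(reader)  # skip the header line
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", reader)

# Step 2: the admin expresses the extraction as a view, using the database's
# own functions (TRIM, CAST, UPPER) instead of a separate transform engine.
conn.execute("""
    CREATE TEMP VIEW cleaned AS
    SELECT CAST(id AS INTEGER)        AS id,
           CAST(TRIM(amount) AS REAL) AS amount,
           UPPER(note)                AS note
    FROM staging
""")

rows = conn.execute("SELECT id, amount, note FROM cleaned ORDER BY id").fetchall()
print(rows)
```

The import side stays trivial (text in, text columns out), and all cleaning lives in SQL that the admin can adjust without touching the import tool.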


Re: Import from CSV - Questions

From
"Magnus Hagander"
Date:
> > Umm. That would kind of assume it's written down in a comprehensible
> > way. Which it isn't, of course :-)
> >
> > The general ideas so far have been, off the top of my head:
> >
> > * Pluggable set of "readers" and "writers". Initially I'd see
> > PostgreSQL, ODBC, XML and possibly CSV. The Pg driver would be
> > optimised to use COPY when available.
> >
> > * Pluggable set of "transforms" that would operate on the rows. By
> > default things like copy and concatenate and maybe regexp. A future
> > enhancement would be a Python extension, as Dave mentioned. (Or
> > really, anything else)
> >
> > * I was envisioning a split of say "package", "job", "step" (terms of
> > course subject to discussion). A package is basically a set of jobs,
> > a job a set of steps. Things like connections would be defined at the
> > "job" level, along with parameters for transaction control etc. (So
> > you can use it to transfer 10 different tables within a single
> > transaction, something I need all the time).
> >
> > * I'd like to see the job format stored as XML with a well-defined
> > schema, so different applications can generate it - both manually
> > (GUI-wise from pgadmin and phppgadmin etc) and automatically.
> >
> > * The "engine" should be available both as a command-line tool (which
> > must not require X libraries etc, because it should be deployable
> > "everywhere") and as a command inside pgadmin (like MS DTS)
> >
> >
> > Um. I think that's about it. I had some sketches of classes and
> > interfaces around (not complete, but an idea), but I can't find
> > them :(
> >
> >
> This sounds like an awful lot of work.
> A somewhat reduced version (IIRC Dave and I discussed something like
> the following briefly), a more raw import (maybe into temp tables),
> could be a big step, giving the admin the chance to create views on
> those tables that do the extractions he likes. PostgreSQL already has
> all the functions you'd like, so there is no need to reimplement them.

It is. That's why I wanted to do that first, but I wanted to have some
sort of generic framework ready beforehand so it could be expanded on later.
I'm definitely not saying it should have all that from the beginning :-)

//Magnus