Обсуждение: xml import/export tools and performance

Поиск
Список
Период
Сортировка

xml import/export tools and performance

От
Ivan Sergio Borgonovo
Дата:
I need to import/export through xml files.

I was looking for tools/examples/HOWTO for postgresql.

Right now I still don't know the details of the requirements.

I know I won't need a GUI.

I know one of the exporting parties will be a MS SQL 2005 server, so
it would be nice if there was an easy way to import xml generated
with the FOR XML AUTO, XMLSCHEMA ('forpg').

I'd like to have a tool that can write XSD from queries
automatically.

I may need to strictly specify one of the xml output format since
one of the importing parties pretend to support xml with something
that is more like a tagged csv.

Currently I'm mostly looking around to see where it would be
convenient to move the boundaries of the problem.

I'd be curious about what kind of performance impact they may have
compared to COPY.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: xml import/export tools and performance

От
Brian Modra
Дата:
2009/11/6 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> I need to import/export through xml files.
>
> I was looking for tools/examples/HOWTO for postgresql.
>
> Right now I still don't know the details of the requirements.
>
> I know I won't need a GUI.
>
> I know one of the exporting parties will be a MS SQL 2005 server, so
> it would be nice if there was an easy way to import xml generated
> with the FOR XML AUTO, XMLSCHEMA ('forpg').
>
> I'd like to have a tool that can write XSD from queries
> automatically.
>
> I may need to strictly specify one of the xml output format since
> one of the importing parties pretend to support xml with something
> that is more like a tagged csv.
>
> Currently I'm mostly looking around to see where it would be
> convenient to move the boundaries of the problem.
>
> I'd be curious about what kind of performance impact they may have
> compared to COPY.
>
> thanks

I don't know of any such tools, but they may exist, if not:

Tell me the XML format you want in/out of postgres, and I can write a
tool for you. I had a quick look just now to see what the MS Access
format is, and its typically difficult to wade through the moutains of
useless information that people have posted on teh web about Microsoft
products, so i lost interest. However, if you can point me in the
right direction, so I can see the DTD or examples, or both... then I
can tell you how easy/difficult it would be to write a converter.

> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
Brian Modra   Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/

Re: xml import/export tools and performance

От
John R Pierce
Дата:
Ivan Sergio Borgonovo wrote:
> I know one of the exporting parties will be a MS SQL 2005 server, so
> it would be nice if there was an easy way to import xml generated
> with the FOR XML AUTO, XMLSCHEMA ('forpg').
>

Microsoft SQL Server has a pretty good data translation tool, it used to
be called DTS, but I think its called something else now... if you
install a ODBC or OLEDB Postgres driver ontp the SQL Server system, and
configure your postgres server to allow the MS SQL Server to be able to
connect to it, you can use this service to bulk transfer data  between
Postgres and MS SQL Server, either full tables or specific queries.

ahh, its called SSIS now, here's a pretty good summary of it,
http://en.wikipedia.org/wiki/SQL_Server_Integration_Services

and some info on MS's webpile about it...
http://www.microsoft.com/sqlserver/2005/en/us/integration-services.aspx
http://technet.microsoft.com/en-us/library/cc917721.aspx

if you go this route, you don't have to mess with any sort of
import/export files, XML or otherwise, just move the data directly
between the databases over the network.


personally, I find XML is extremely inefficient.  the idea of sending
the name of every field with each data record just seems like a bad
idea.  you end up with more metadata than actual data.



Re: xml import/export tools and performance

От
Ivan Sergio Borgonovo
Дата:
On Fri, 06 Nov 2009 01:28:33 -0800
John R Pierce <pierce@hogranch.com> wrote:

> Ivan Sergio Borgonovo wrote:
> > I know one of the exporting parties will be a MS SQL 2005
> > server, so it would be nice if there was an easy way to import
> > xml generated with the FOR XML AUTO, XMLSCHEMA ('forpg').

> Microsoft SQL Server has a pretty good data translation tool, it
> used to be called DTS, but I think its called something else

SSIS... being used to DTS I think that in terms of usability it made
a step back... but still XML support in MS SQL looks reasonably neat.

> and some info on MS's webpile about it...
> http://www.microsoft.com/sqlserver/2005/en/us/integration-services.aspx
> http://technet.microsoft.com/en-us/library/cc917721.aspx

> if you go this route, you don't have to mess with any sort of
> import/export files, XML or otherwise, just move the data directly
> between the databases over the network.

Unfortunately that's not the only source of input/output and
internally the DB schema may be a bit different and towards some
destinations we just have one choice.


> personally, I find XML is extremely inefficient.  the idea of
> sending the name of every field with each data record just seems
> like a bad idea.  you end up with more metadata than actual data.

It is.

We need at least one xml output, we may have some xml input and some
xml output. On some parties we don't have constraints and we just
thought that xml could be a common exchange format. Knowing nothing
about the tools available for pg, some example of real usage etc...
I was looking around to see where to start to cut the cake.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: xml import/export tools and performance

От
Ries van Twisk
Дата:

On Nov 6, 2009, at 2:52 AM, Ivan Sergio Borgonovo wrote:

I need to import/export through xml files.

I was looking for tools/examples/HOWTO for postgresql.

Right now I still don't know the details of the requirements.

I know I won't need a GUI.

I know one of the exporting parties will be a MS SQL 2005 server, so
it would be nice if there was an easy way to import xml generated
with the FOR XML AUTO, XMLSCHEMA ('forpg').

I'd like to have a tool that can write XSD from queries
automatically.

I may need to strictly specify one of the xml output format since
one of the importing parties pretend to support xml with something
that is more like a tagged csv.

Currently I'm mostly looking around to see where it would be
convenient to move the boundaries of the problem.

I'd be curious about what kind of performance impact they may have
compared to COPY.

thanks


Ivan,

I have been using JasperETL for this purpose. But like what others say,
XML is very in efficient for this task.

Currently I am in the process if importing XML feeds into PostgreSQL.
At first I used the ETL tool to write to the correct table by creating joins
into the ETL tool.

However, this slows things down quite a bit (but the process is VERY manageable).
Currently I keep teh route within JasperETL as short as possible in teh following steps

- load XML
- remove columns I don't need
- cleanup some data using a javarow 
- load into a staging table

From there I execute a stored procedure to normalize the data (create additional records where needed in
other foreign tables) and push data into production. Triggers on my production table
do some additional work for tsearch2 and some other small stuff.

In my case, on a 32Bit system loading a 200Mb XML file consumes 1Gb real memory.
On 64Bit systems with Java 1.6 you can push this further down the road though.


if you can export to CSV and import using copy into PG, I would go for that route
if you can control the format of the CSV well and reliable, because detecting errors
in a CSV is much harder then in XML (missing column in CSV can mess up data)

Ries