Обсуждение: COPY FROM wish list

Поиск
Список
Период
Сортировка

COPY FROM wish list

От
"Marc Mamin"
Дата:

Hello,

Looking at the TODO List, I feel that only some aspects of the COPY FROM command are adressed.
Could a discussion trigger some activity on this topic  :o)  ?

Best regards,

Marc Mamin

Here my wish list:

COPY tablename [ ( column [, ...] ) ]
    FROM { 'filename' | STDIN }
    [ [ WITH ]
   
       [ CHECKONLY (DATA_TYPES,CONSTRAINTS) ] # do not import, just check the content
   
       [ SKIPLINES (n) ]
   
       [ HEADERLINE (n) ]    # line conlaining the column names, must be within the line to skip.
                             # must match the column list when both are given
       [ DEFERCONSTRAINTS ]
   
       [ SKIPINVALID [FILE file ] SKIPLIMIT (n)]      # when set, invalid lines are skipped and possibly stored in file
                                                      # an exception is triggered when more than SKIPLIMIT lines are found

      
             
       [ NULLREPLACEMENT (column =value[, ...] ) ]  # allow a per column handling of null values.
                                                    # cannot be set allong with NULL [ AS ]
                                                    # not usable for columns that are missing in the input file  ?
                          
       [ FEEDBACK  (n) ]     # display progress every n rows .# for ORACLE users...
   
       [ COMMIT (n) ]        # just an idea: commit each n rows....
                          
       [
          [ BINARY ]
          [ OIDS ]
          [ DELIMITER [ AS ] 'delimiter' ]
          [ NULL [ AS ] 'null string' ]
          [ CSV [ HEADER ]
                [ QUOTE [ AS ] 'quote' ]
                [ ESCAPE [ AS ] 'escape' ]
                [ FORCE NOT NULL column [, ...] ]
   ]

Re: COPY FROM wish list

От
Greg Smith
Дата:
Marc Mamin wrote:
>
> Looking at the TODO List, I feel that only some aspects of the COPY
> FROM command are adressed.
> Could a discussion trigger some activity on this topic  :o)  ?
>

(Sounds only of crickets chirping)...guess not.  I would love to have
FEEDBACK added.

The TODO list doesn't have every COPY extension idea around on it
though.  http://wiki.postgresql.org/wiki/COPY for example is an implicit
wishlist:  "fix all these things documented as troublesome".  Your
suggestions might make a good addition to that, perhaps in a section
specifically addressing "Missing Features Common to Other Database Loaders".

If you don't mind a speed loss in the process, I've found pgloader to be
a nicer interface for dealing with slightly odd data imports that don't
match the built-in COPY restrictions, it does some of the things you're
looking for:  http://pgfoundry.org/projects/pgloader/

And pg_bulkload aims to handle some of the high-performance features:
http://pgbulkload.projects.postgresql.org/

The problem with working on the COPY code, from the perspective of
finding people to pay for the job, is that bulk-loading is a one-time
operation for many people.  Easier to just suck it up and write a set of
one-off data massage tools than to try and fix the core to add these
capabilties.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: COPY FROM wish list

От
Scott Bailey
Дата:
Marc Mamin wrote:
> Hello,
>
> Looking at the TODO List, I feel that only some aspects of the COPY FROM
> command are adressed.
> Could a discussion trigger some activity on this topic  :o)  ?
>
> Best regards,
>
> Marc Mamin
>
>
> Here my wish list:
>
> COPY tablename [ ( column [, ...] ) ]
>     FROM { 'filename' | STDIN }
>     [ [ WITH ]
>
>        [ CHECKONLY (DATA_TYPES,CONSTRAINTS) ] # do not import, just
> check the content
>
>        [ SKIPLINES (n) ]
>
>        [ HEADERLINE (n) ]    # line conlaining the column names, must be
> within the line to skip.
>                              # must match the column list when both are
> given
>        [ DEFERCONSTRAINTS ]
>
>        [ SKIPINVALID [FILE file ] SKIPLIMIT (n)]      # when set,
> invalid lines are skipped and possibly stored in file
>                                                       # an exception is
> triggered when more than SKIPLIMIT lines are found
>
>
>
>        [ NULLREPLACEMENT (column =value[, ...] ) ]  # allow a per column
> handling of null values.
>                                                     # cannot be set
> allong with NULL [ AS ]
>                                                     # not usable for
> columns that are missing in the input file  ?
>
>        [ FEEDBACK  (n) ]     # display progress every n rows .# for
> ORACLE users...
>
>        [ COMMIT (n) ]        # just an idea: commit each n rows....
>
>        [
>           [ BINARY ]
>           [ OIDS ]
>           [ DELIMITER [ AS ] 'delimiter' ]
>           [ NULL [ AS ] 'null string' ]
>           [ CSV [ HEADER ]
>                 [ QUOTE [ AS ] 'quote' ]
>                 [ ESCAPE [ AS ] 'escape' ]
>                 [ FORCE NOT NULL column [, ...] ]
>    ]
>

Looks like you are requesting sql*loader. My opinion is that we should
keep COPY simple, uncluttered and fast. And instead have a preprocessor
that can do all of the transforms, skipping, checking and logging.

PgFoundry has http://pgfoundry.org/projects/pgloader/
It is a step in the right direction but definitely not as powerful as
sql*loader.

I've been writing a Postgres equivalent that does Oracle SQL*Loader/DB2
Load style import, and external tables using CSV, fixed with and XML
data sources. But its not production ready. I'm hoping SQL/MED makes it
in to the next release of Postgres so I can throw it all away :)

Scott

Re: COPY FROM wish list

От
Dimitri Fontaine
Дата:
Scott Bailey <artacus@comcast.net> writes:
> PgFoundry has http://pgfoundry.org/projects/pgloader/
> It is a step in the right direction but definitely not as powerful as
> sql*loader.

Yeah, it's only offering what I needed and what I've been requested to
add. So far there's support for INFORMIX UNLOAD files, mysqldump -t
files, fixed with files, ragged files, and some more.

And it also support python threading for some parallel workload, either
loading several files at once or several chunks of the same file, and
then 2 modes are possible.

If you need more, try asking, you never know. I'm still on the hook to
maintaining it, though I've not received any bug report in a long
while. I guess it's not much used anymore.

I've been proposed to replace the pgloader.conf INI file with a custom
COPY command parser exposing all the options, and will consider that
sometime in the future.

> I've been writing a Postgres equivalent that does Oracle SQL*Loader/DB2 Load
> style import, and external tables using CSV, fixed with and XML data
> sources. But its not production ready. I'm hoping SQL/MED makes it in to the
> next release of Postgres so I can throw it all away :)

XML support in pgloader will certainly take the form of applying user
given XSLT filter that outputs CSV from the XML. That's the option
requiring the less code in pgloader I guess :)

I'd be happy to see pgloader deprecated by PostgreSQL offering its
features.

Meantime, do you want to join the fun on the pgloader front?
--
dim