importing a messy text file

Поиск
Список
Период
Сортировка
От Willy-Bas Loos
Тема importing a messy text file
Дата
Msg-id CAHnozTiOp45ur4=6kqUuPBadptFjM=hDgGa1fbZUmF1Z-UZijA@mail.gmail.com
обсуждение исходный текст
Ответы Re: importing a messy text file  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
Re: importing a messy text file  (Alberto Cabello Sánchez <alberto@unex.es>)
Re: importing a messy text file  (bricklen <bricklen@gmail.com>)
Список pgsql-general
Hi,

I have a 56GB textfile that i want to import into postgres.
The file is tab delimited and not quoted.
I deleted the header with the column names (using sed) so that i could use COPY with the non-csv text type (because some of the text values contain quotes).

I had some minor trouble with the file which i managed, but now i have one where i can't think of a solution, even though it seems so simple.

The problem is this:
There is a tab after the last column, in many but not all records.
When i ran into the extra tab i added a dummy column in the destination table but now COPY thows an error because the data for the dummy column is missing (on record ~275K of about 150M).

The file is too big to edit by hand and anyway it would probably not be feasible to manually add tabs for every record that misses one, although i don't know how many it would be.

I realize that there could be other showstoppers in the file, like missing tabs in the middle or extra tabs in the middle, but i would like to try and get this fixed.

Maybe it would be feasible to add every record as 1 value and then splitting those into columns using postgres text processing.
Or maybe there is an (undocumented?) option in copy or \copy to ignore extra columns.
Or maybe there is some no-sql software where i can import this and then structure the data before i pass it to postgres..

Do you have any tips, please?

Cheers,


--
Willy-Bas Loos

В списке pgsql-general по дате отправления:

Предыдущее
От: Hello World
Дата:
Сообщение: Re: Security Issues: Allowing Clients to Execute SQL in the Backend.
Следующее
От: Rory Campbell-Lange
Дата:
Сообщение: Re: Security Issues: Allowing Clients to Execute SQL in the Backend.