Обсуждение: Invalid byte sequence

Поиск
Список
Период
Сортировка

Invalid byte sequence

От
Yambu
Дата:
Hello

How do i get past the error below. I have created a foreign table which points to a csv file and when i try to create another table it gives error below

The csv file has funny characters, how do i bypass those characters.

create foreign table table1
(
id int,
name text
)
SERVER local_file
OPTIONS (filename '/outputfile.csv' ,
format 'csv',
header 'TRUE',
delimiter ';'
);


create table2 as select * from table1;

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
CONTEXT:  COPY table1, line 377617

Re: Invalid byte sequence

От
Laurenz Albe
Дата:
On Thu, 2021-07-01 at 09:29 +0200, Yambu wrote:
> ERROR:  invalid byte sequence for encoding "UTF8": 0x00
> CONTEXT:  COPY table1, line 377617

That means that there is an ASCII 0 byte in a string somewhere in the file.
This character is not allowed in strings in PostgreSQL.
You will have to fix the data before you can load them into PostgreSQL.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Invalid byte sequence

От
Yambu
Дата:
Thanks Laurenz, any suggestions on what tool I can use to clean the data?

On Thu, Jul 1, 2021 at 9:37 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2021-07-01 at 09:29 +0200, Yambu wrote:
> ERROR:  invalid byte sequence for encoding "UTF8": 0x00
> CONTEXT:  COPY table1, line 377617

That means that there is an ASCII 0 byte in a string somewhere in the file.
This character is not allowed in strings in PostgreSQL.
You will have to fix the data before you can load them into PostgreSQL.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: Invalid byte sequence

От
Holger Jakobs
Дата:

How many lines has the file? Maybe line 377617 is the last one.

If so, you can just remove the last line containing this 0x00. The following command can do this for you:

sed -i '$ d' /outputfile.csv

May I remark that having a file in the / directory is a bit awkward? This directory should not be writable by any user (except root, and root doesn't fiddle with files for a database).



Am 01.07.21 um 09:29 schrieb Yambu:
Hello

How do i get past the error below. I have created a foreign table which points to a csv file and when i try to create another table it gives error below

The csv file has funny characters, how do i bypass those characters.

create foreign table table1
(
id int,
name text
)
SERVER local_file
OPTIONS (filename '/outputfile.csv' ,
format 'csv',
header 'TRUE',
delimiter ';'
);


create table2 as select * from table1;

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
CONTEXT:  COPY table1, line 377617
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
Вложения

Re: Invalid byte sequence

От
Yambu
Дата:
Hi Holger

There are 8mil lines, the file is not in /  , i just removed the other parts.

On Thu, Jul 1, 2021 at 9:40 AM Holger Jakobs <holger@jakobs.com> wrote:

How many lines has the file? Maybe line 377617 is the last one.

If so, you can just remove the last line containing this 0x00. The following command can do this for you:

sed -i '$ d' /outputfile.csv

May I remark that having a file in the / directory is a bit awkward? This directory should not be writable by any user (except root, and root doesn't fiddle with files for a database).



Am 01.07.21 um 09:29 schrieb Yambu:
Hello

How do i get past the error below. I have created a foreign table which points to a csv file and when i try to create another table it gives error below

The csv file has funny characters, how do i bypass those characters.

create foreign table table1
(
id int,
name text
)
SERVER local_file
OPTIONS (filename '/outputfile.csv' ,
format 'csv',
header 'TRUE',
delimiter ';'
);


create table2 as select * from table1;

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
CONTEXT:  COPY table1, line 377617
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

Re: Invalid byte sequence

От
Holger Jakobs
Дата:

Then have a look at the line 377617 and check whether its content is important. The following command shows the line on the terminal. Loading it into an interactive editor seems not to be an option due to the size of the file

sed -n '377617 p; 377617 q' /outputfile.csv
Then you can decide what to do.

Am 01.07.21 um 09:45 schrieb Yambu:
Hi Holger

There are 8mil lines, the file is not in /  , i just removed the other parts.

On Thu, Jul 1, 2021 at 9:40 AM Holger Jakobs <holger@jakobs.com> wrote:

How many lines has the file? Maybe line 377617 is the last one.

If so, you can just remove the last line containing this 0x00. The following command can do this for you:

sed -i '$ d' /outputfile.csv

May I remark that having a file in the / directory is a bit awkward? This directory should not be writable by any user (except root, and root doesn't fiddle with files for a database).



Am 01.07.21 um 09:29 schrieb Yambu:
Hello

How do i get past the error below. I have created a foreign table which points to a csv file and when i try to create another table it gives error below

The csv file has funny characters, how do i bypass those characters.

create foreign table table1
(
id int,
name text
)
SERVER local_file
OPTIONS (filename '/outputfile.csv' ,
format 'csv',
header 'TRUE',
delimiter ';'
);


create table2 as select * from table1;

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
CONTEXT:  COPY table1, line 377617
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
Вложения

Re: Invalid byte sequence

От
Yambu
Дата:
There are a number of such lines, the problem is in identifying all of them and deleting them.



On Thu, Jul 1, 2021 at 9:49 AM Holger Jakobs <holger@jakobs.com> wrote:

Then have a look at the line 377617 and check whether its content is important. The following command shows the line on the terminal. Loading it into an interactive editor seems not to be an option due to the size of the file

sed -n '377617 p; 377617 q' /outputfile.csv
Then you can decide what to do.

Am 01.07.21 um 09:45 schrieb Yambu:
Hi Holger

There are 8mil lines, the file is not in /  , i just removed the other parts.

On Thu, Jul 1, 2021 at 9:40 AM Holger Jakobs <holger@jakobs.com> wrote:

How many lines has the file? Maybe line 377617 is the last one.

If so, you can just remove the last line containing this 0x00. The following command can do this for you:

sed -i '$ d' /outputfile.csv

May I remark that having a file in the / directory is a bit awkward? This directory should not be writable by any user (except root, and root doesn't fiddle with files for a database).



Am 01.07.21 um 09:29 schrieb Yambu:
Hello

How do i get past the error below. I have created a foreign table which points to a csv file and when i try to create another table it gives error below

The csv file has funny characters, how do i bypass those characters.

create foreign table table1
(
id int,
name text
)
SERVER local_file
OPTIONS (filename '/outputfile.csv' ,
format 'csv',
header 'TRUE',
delimiter ';'
);


create table2 as select * from table1;

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
CONTEXT:  COPY table1, line 377617
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

Re: Invalid byte sequence

От
Holger Jakobs
Дата:

Or course you could delete all these lines, but this may lead to data loss.

Therefore, if it's not just a singe line, I think it's best to solve the problem from its root. So check how the file was generated and correct it there.

But this is beyond PostgreSQL.


Am 01.07.21 um 09:56 schrieb Yambu:
There are a number of such lines, the problem is in identifying all of them and deleting them.



On Thu, Jul 1, 2021 at 9:49 AM Holger Jakobs <holger@jakobs.com> wrote:

Then have a look at the line 377617 and check whether its content is important. The following command shows the line on the terminal. Loading it into an interactive editor seems not to be an option due to the size of the file

sed -n '377617 p; 377617 q' /outputfile.csv
Then you can decide what to do.

Am 01.07.21 um 09:45 schrieb Yambu:
Hi Holger

There are 8mil lines, the file is not in /  , i just removed the other parts.

On Thu, Jul 1, 2021 at 9:40 AM Holger Jakobs <holger@jakobs.com> wrote:

How many lines has the file? Maybe line 377617 is the last one.

If so, you can just remove the last line containing this 0x00. The following command can do this for you:

sed -i '$ d' /outputfile.csv

May I remark that having a file in the / directory is a bit awkward? This directory should not be writable by any user (except root, and root doesn't fiddle with files for a database).



Am 01.07.21 um 09:29 schrieb Yambu:
Hello

How do i get past the error below. I have created a foreign table which points to a csv file and when i try to create another table it gives error below

The csv file has funny characters, how do i bypass those characters.

create foreign table table1
(
id int,
name text
)
SERVER local_file
OPTIONS (filename '/outputfile.csv' ,
format 'csv',
header 'TRUE',
delimiter ';'
);


create table2 as select * from table1;

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
CONTEXT:  COPY table1, line 377617
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
Вложения

Re: Invalid byte sequence

От
Yambu
Дата:
ok, thanks a lot

On Thu, Jul 1, 2021 at 9:58 AM Holger Jakobs <holger@jakobs.com> wrote:

Or course you could delete all these lines, but this may lead to data loss.

Therefore, if it's not just a singe line, I think it's best to solve the problem from its root. So check how the file was generated and correct it there.

But this is beyond PostgreSQL.


Am 01.07.21 um 09:56 schrieb Yambu:
There are a number of such lines, the problem is in identifying all of them and deleting them.



On Thu, Jul 1, 2021 at 9:49 AM Holger Jakobs <holger@jakobs.com> wrote:

Then have a look at the line 377617 and check whether its content is important. The following command shows the line on the terminal. Loading it into an interactive editor seems not to be an option due to the size of the file

sed -n '377617 p; 377617 q' /outputfile.csv
Then you can decide what to do.

Am 01.07.21 um 09:45 schrieb Yambu:
Hi Holger

There are 8mil lines, the file is not in /  , i just removed the other parts.

On Thu, Jul 1, 2021 at 9:40 AM Holger Jakobs <holger@jakobs.com> wrote:

How many lines has the file? Maybe line 377617 is the last one.

If so, you can just remove the last line containing this 0x00. The following command can do this for you:

sed -i '$ d' /outputfile.csv

May I remark that having a file in the / directory is a bit awkward? This directory should not be writable by any user (except root, and root doesn't fiddle with files for a database).



Am 01.07.21 um 09:29 schrieb Yambu:
Hello

How do i get past the error below. I have created a foreign table which points to a csv file and when i try to create another table it gives error below

The csv file has funny characters, how do i bypass those characters.

create foreign table table1
(
id int,
name text
)
SERVER local_file
OPTIONS (filename '/outputfile.csv' ,
format 'csv',
header 'TRUE',
delimiter ';'
);


create table2 as select * from table1;

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
CONTEXT:  COPY table1, line 377617
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012