Discussion: how to continue after error in batch mode with psql
I have a nightly process that pulls data over into PostgreSQL 8.3.10 from a
Progress server running under 8.x. Unfortunately the Progress db does not
enforce record size/length limitations. It does not care if a record of 20
characters contains 21, or 100 for that matter.

We have a simple pipe that takes the Progress data export dump and imports
it into PostgreSQL:

su - postgres -c "cat test.csv | psql -X -q test -c \"COPY t_test FROM
stdin WITH DELIMITER AS ',' NULL AS '?' CSV QUOTE AS '\\\"' ESCAPE AS
'\\\"'\""

If any of the fields are over-length, we do get an error message that tells
us which row needs fixing:

ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 2, column t_comment_c: "'comment 3 that is too long'"

But how do we get psql to run through and continue after an error, import
as much as possible of the rest of the import data, and give us error
messages about all lines with errors?

So I want:

ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 2, column t_comment_c: "'comment 3 that is too long'"
ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 4, column t_comment_c: "'comment 5 that is too long'"
ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 6, column t_comment_c: "'comment 7 that is too long'"
ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 8, column t_comment_c: "'comment 9 that is too long'"

I have tried adding the following to psql, but that did not make any
difference:

-v ON_ERROR_STOP
-v ON_ERROR_STOP=0
--set ON_ERROR_STOP
--set ON_ERROR_STOP=0

Here is my test database definition, followed by my test.csv data:

SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;

--
-- Name: t_test; Type: TABLE; Schema: public; Owner: postgres; Tablespace:
--
CREATE TABLE t_test (
    t_record_i integer DEFAULT 0 NOT NULL,
    t_comment_c character varying(20)
);
ALTER TABLE public.t_test OWNER TO postgres;

--
-- Data for Name: t_test; Type: TABLE DATA; Schema: public; Owner: postgres
--
INSERT INTO t_test VALUES (1, 'comment 1');

--
-- Name: t_table_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace:
--
ALTER TABLE ONLY t_test
    ADD CONSTRAINT t_table_pkey PRIMARY KEY (t_record_i);

--
-- Name: t_test; Type: ACL; Schema: public; Owner: postgres
--
REVOKE ALL ON TABLE t_test FROM PUBLIC;
REVOKE ALL ON TABLE t_test FROM postgres;
GRANT ALL ON TABLE t_test TO postgres;

And my test.csv data:

2,'comment 2'
3,'comment 3 that is too long'
4,'comment 4'
5,'comment 5 that is too long'
6,'comment 6'
7,'comment 7 that is too long'
8,'comment 8'
9,'comment 9 that is too long'
10,'comment 10'

What I am hoping for is an error message about rows 3, 5, 7 and 9 (and, if
I am lucky, all the data up to the defined length in the record), and
definitely all the data for rows 2, 4, 6, 8 and 10 in the t_test table in
the test database.

What does it take to get this working? I am a newbie; am I barking up the
wrong tree with trying to unset ON_ERROR_STOP?

Many thanks for any hints.

Regards,

--
Urs Rau
On 2010-04-28, Urs Rau <urs.rau@uk.om.org> wrote:
> I have a nightly process that pulls data over into PostgreSQL 8.3.10
> from a Progress server running under 8.x. Unfortunately the Progress db
> does not enforce record size/length limitations. It does not care if a
> record of 20 characters contains 21, or 100 for that matter.
>
> We have a simple pipe that takes the Progress data export dump and
> imports it into PostgreSQL:
>
> su - postgres -c "cat test.csv | psql -X -q test -c \"COPY t_test FROM
> stdin WITH DELIMITER AS ',' NULL AS '?' CSV QUOTE AS '\\\"' ESCAPE AS
> '\\\"'\""
>
> If any of the fields are over-length, we do get an error message that
> tells us which row needs fixing:
>
> ERROR:  value too long for type character varying(20)
> CONTEXT:  COPY t_test, line 2, column t_comment_c: "'comment 3 that is
> too long'"
>
> But how do we get psql to run through and continue after an error and
> import as much as possible of the rest of the import data and give us
> error messages about all lines with errors?

Why not just copy the whole lot into a temp table (like the target table
but with text fields instead of fixed-width varchar), and then do a select
on that to find the over-length lines and another select to insert the
good data into the real target table?
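A minimal sketch of this staging-table approach, using the t_test
definition from the original post (the name t_test_stage is invented here).
It also explains why ON_ERROR_STOP made no difference: that variable only
tells psql whether to carry on with the *next* statement after one fails,
and this import is a single COPY statement, so the first bad row aborts the
entire command. With unconstrained text columns the staging COPY accepts
every row, and the length checks become ordinary SQL:

-- staging table shaped like t_test but without length limits,
-- so COPY never rejects a row (t_test_stage is a made-up name)
CREATE TABLE t_test_stage (
    t_record_i  integer,
    t_comment_c text
);

-- fed from the same pipe as before, just pointed at the staging table
COPY t_test_stage FROM stdin WITH DELIMITER AS ',' NULL AS '?' CSV
    QUOTE AS '"' ESCAPE AS '"';

-- report every over-length line in one pass, not just the first
SELECT t_record_i, t_comment_c
FROM t_test_stage
WHERE length(t_comment_c) > 20;

-- move only the rows that fit into the real target table
INSERT INTO t_test (t_record_i, t_comment_c)
SELECT t_record_i, t_comment_c
FROM t_test_stage
WHERE t_comment_c IS NULL OR length(t_comment_c) <= 20;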
hi Urs,

this is perhaps not what you meant but... I'd use a simple script to filter
out the offending lines before they go into the database, like the
following written in Tcl (but you could write this in almost any language):

----------<snip>----------
#!/usr/bin/tclsh

# append rejected lines to badlines.txt for later inspection
set fp [open badlines.txt a]

# read stdin line by line; note this tests the length of the whole
# input line, not just the comment field
while {0 <= [gets stdin line]} {
    set n [string length $line]
    if {$n < 1 || $n > 20} {
        puts $fp $line
    } else {
        puts stdout $line
    }
}

close $fp
exit 0
----------<snip>----------

now your pipeline becomes (with the script above saved as an executable
named 'badlines'):

su - postgres -c "cat test.csv | badlines | psql -X -q test -c \"COPY
t_test FROM stdin WITH DELIMITER AS ',' NULL AS '?' CSV QUOTE AS '\\\"'
ESCAPE AS '\\\"'\""

and you can process the lines in 'badlines.txt' later, at your leisure.

HTH

--
regards, jr. (jr@tailorware.org.uk)
Hi Jasen,

Jasen Betts wrote:
>
> why not just copy the whole lot into a temp table (like the target
> table but with text fields instead of fixed-width varchar), and then do
> a select on that to find the over-length lines and another select to
> insert the good data into the real target table?
>

Thanks, we did that. In fact the PostgreSQL db schema is generated from an
xml file, so we changed the fields where this applies to be text and made
them bigger in the PostgreSQL schema. Now the nightly mirroring runs
through without any field 'over-runs' stopping it.

Regards,

--
Urs Rau
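For reference, the same widening can be done in place on an existing
schema; a one-line sketch against the t_test example from earlier in the
thread, assuming nothing else depends on the varchar(20) limit:

-- removes the 20-character cap on the comment column
ALTER TABLE t_test ALTER COLUMN t_comment_c TYPE text;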
Hi jr,

jr wrote:
> hi Urs,
>
> this is perhaps not what you meant but...
>

Actually, this in-line-filter idea is exactly what I had in mind as the
worst-case laborious workaround when I wrote my email. Laborious?
Laborious because there are 350+ tables that we fetch each night, and I
did not fancy writing and maintaining filters for 350+ tables on each run.
The idea of a temp table was the sort of hint I hoped to get off the list.
And I got it. ;-)

Now, having run this for a few days, we are starting to realise that maybe
we still have to run it through an in-line filter, as we keep getting new
errors or overruns in new places. So we might have to generate 350+
in-line filters and run them each night.

BTW, there is a really neat trick that I stumbled across recently that we
plan to use for logging the actions of the filter. It's a technique that
allows us to have the stdout and stderr of the filter in a log file, but
also get the stderr out separately, so 'cron', which runs the filter
nightly, can email us the errors or failures.

# each "3>&1 1>&2 2>&3" swaps stdout and stderr, so both streams pass
# through a tee (and into the log) while only stderr survives to the end
(((filter-progress-db.sh | tee -a /var/log/progress-db-mirror) \
    3>&1 1>&2 2>&3 | tee -a /var/log/progress-db-mirror) \
    3>&1 1>&2 2>&3) >/dev/null

When the above command is run from cron, it will log everything to the
file, and if anything goes wrong, stderr will go to the console, which
gets e-mailed to an administrator. If desired, you could also log stdout
and stderr to two distinct files. It's taken from
http://www.enterprisenetworkingplanet.com/linux_unix/article.php/3870976/Improve-Your-Unix-Logging-with-Advanced-IO-Redirection.htm

> I'd use a simple script to filter out the offending lines before they
> go into the database, like the following written in Tcl (but you could
> write this in almost any language):
>
> ----------<snip>----------
> #!/usr/bin/tclsh

I guess in my case this would have to be perl. ;-)

> HTH

Thanks, yes it did. I like this list.

Regards,

--
Urs Rau
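Rather than hand-writing 350+ filters, the declared length limits could be
read straight out of the catalog and used to generate the per-table checks;
a sketch, assuming the mirrored tables live in the public schema:

-- every varchar column with its declared limit; output like this could
-- drive a generator for the per-table length checks or filter scripts
SELECT table_name, column_name, character_maximum_length
FROM information_schema.columns
WHERE table_schema = 'public'
  AND data_type = 'character varying'
ORDER BY table_name, ordinal_position;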
Urs,

psql is not a programming or reporting tool. Use Perl or Python instead.

Urs Rau wrote:
> I have a nightly process that pulls data over into PostgreSQL 8.3.10
> from a Progress server running under 8.x. [...]
>
> But how do we get psql to run through and continue after an error and
> import as much as possible of the rest of the import data and give us
> error messages about all lines with errors?
>
> [...]
>
> What does it take to get this working? I am a newbie; am I barking up
> the wrong tree with trying to unset ON_ERROR_STOP?
>
> Many thanks for any hints.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com