On 5/23/05, ogjunk-pgjedan@yahoo.com <ogjunk-pgjedan@yahoo.com> wrote:
> I am restructuring my DB schema and need help migrating data from 1
> column of an existing table to two new tables. I have some Java code
> that can do this for me, but it's very slow, and I am now hoping I can
> migrate this data with some clever SQL instead.
>
> Here are my 3 tables:
>
> user_data (existing, old table)
> ---------
> id (PK),
> user_id (FK)
> keywords VARCHAR(256)
> -- this contains comma separated keywords
> -- e.g. "new york,san francisco, dallas, food"
> -- also "keywords without strings are really just 1 keyword"
> add_date TIMESTAMP
>
>
> So now I'm trying to migrate this "keywords" VARCHAR column to a more
> normalized schema:
>
> user_data_keyword (new lookup table to populate)
> -----------------
> id (PK) -- I may change PK to PK(user_data_id, keyword_id)
> user_data_id (FK)
> keyword_id (FK)
>
>
> keyword (new table to populate)
> -------
> id (PK)
> name VARCHAR(64) NOT NULL UNIQUE
> add_date TIMEZONE
>
>
> I just found
> http://www.postgresql.org/docs/current/static/functions-string.html ,
> but if anyone could lend me a hand by getting me started with writing a
> function for this, I'd really appreciate it.
>
> Thanks,
> Otis
>
I am not aware of any extremely clever SQL to make this ALL happen.
However you can do everything with pl/pgsql. My contribution below
will build the keyword table for you. Once you have this working and
you understand it, you could extend it to build your user_data_keywordtable.
CREATE OR REPLACE FUNCTION split_on_commas(TEXT) RETURNS SETOF TEXT
LANGUAGE 'plpgsql' AS '
DECLARE mystring ALIAS FOR $1; incomma BOOLEAN := FALSE; -- ## Catch the first word endpos INTEGER; --
endcharacter startpos INTEGER := 0; pos INTEGER; outstring TEXT;
BEGIN SELECT INTO endpos CHAR_LENGTH(mystring); FOR pos IN 1 .. endpos LOOP -- There are 2 single quotes, a
space,and 2 single quotes below: IF SUBSTRING(mystring,pos,1) = '','' THEN incomma := TRUE;
outstring:= SUBSTRING(mystring,startpos,pos-startpos); RETURN NEXT outstring; ELSE IF incomma IS
TRUETHEN incomma := FALSE; startpos := pos; END IF; END IF; END LOOP; IF
incommaIS FALSE THEN -- ## Catch the last phrase outstring := SUBSTRING(mystring,startpos,endpos); RETURN
NEXToutstring; END IF;
RETURN;
END;
';
-- try it! You will get a set of keyword all split out
SELECT * FROM split_on_commas('foo1,bar2,foo3');
From here, we need a migration function that will loop through row(s)
in your table. I also noticed that some of your words may have leading
and trailing spaces. The function drop_first_and_last_space should
take care of that. Notice that I use this function below in the
build_keyword_table function.
CREATE OR REPLACE FUNCTION drop_first_and_last_space(TEXT) RETURNS
TEXT LANGUAGE 'plpgsql' AS '
DECLARE in_string ALIAS FOR $1; out_msg TEXT;
BEGIN out_msg := in_string; IF SUBSTRING(out_msg,1,1) = '' '' THEN RAISE NOTICE ''dropping leading space'';
out_msg := substring(out_msg,2,length(out_msg)); END IF; --and the last space IF
SUBSTRING(out_msg,length(out_msg),1)= '' '' THEN RAISE NOTICE ''dropping trailing space''; out_msg :=
substring(out_msg,1,length(out_msg)-1); END IF;
RETURN out_msg;
END
';
-- This function uses the INSERT ((SELECT)EXCEPT(SELECT)); syntax
mentioned in http://www.varlena.com/varlena/GeneralBits/19.htm. Note
that I am unsure why I didn't need to use FOR r in EXECUTE ''SELECT...
syntax. But this did work in my small scale test.
CREATE OR REPLACE FUNCTION build_keyword_table(TEXT) RETURNS TEXT
LANGUAGE 'plpgsql' AS '
DECLARE match ALIAS FOR $1; rec RECORD; r RECORD; out TEXT := ''done'';
BEGIN FOR rec IN SELECT keywords FROM user_data WHERE id SIMILAR TO match ORDER BY id LOOP RAISE
NOTICE'' working on %'',rec.keywords; FOR r IN SELECT drop_first_and_last_space(split_on_commas) AS
kw FROM split_on_commas(rec.keywords) LOOP RAISE NOTICE ''trying to insert - %'',r.kw; INSERT
INTOkeyword (name) ( (SELECT r.kw) EXCEPT (SELECT
name FROM keyword WHERE name=r.kw)); END LOOP; END LOOP;
RETURN out;
END;
';
Then try it all together, like so:
SELECT build_keyword_table('%');
The input goes to a SIMILAR TO on id - and % means all elements. You
can match a single row by using something like SELECT
build_keyword_table('123');
I also used this as my test data... It worked for me!
CREATE TABLE user_data (id SERIAL,user_id INTEGER,keywords VARCHAR(256) NOT NULL,add_date
TIMESTAMP,PRIMARYKEY(id)
);
INSERT INTO user_data (keywords) VALUES ('new york,san francisco,
dallas, food');
INSERT INTO user_data (keywords) VALUES ('phoenix, hot, summer, fun');
CREATE TABLE keyword (name VARCHAR(64) NOT NULL,id SERIAL,add_date TIMESTAMP,PRIMARY KEY(name)
);
-- todo put a UNIQUE INDEX on keyword (id)
I hope this helps.
Tony Wasson