regex - Write partly tab-delimited data to MySQL database -


I have a MySQL database with 7 columns (chr, pos, num, ia, ib, ic, id) and a file that contains 40 million lines, each containing one dataset. Each line has 4 tab-delimited columns: the first 3 columns always contain data, and the fourth column can contain up to 4 different key=value pairs (ia, ib, ic, id) separated by semicolons.

    chr   pos     num   info
    1     10203   3     ia=0.34;ib=nerv;ic=45;id=dskf12586
    1     10203   4     ia=0.44;ic=45;id=dsf12586;ib=nerv
    1     10203   5
    1     10213   1     ib=nerv;ic=49;ia=0.14;id=dskf12586
    1     10213   2     ia=0.34;ib=nerv;id=cap1486
    1     10225   1     id=dscf12586

The key=value pairs in the info column have no specific order. I'm also not sure whether a key can occur twice (I hope not).
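Whether a key ever repeats within a line can be checked before loading anything. The following is only a minimal sketch, assuming the file is tab-delimited with the info column in field 4; the function name `find_duplicate_keys` is made up for illustration. It prints the line number and the key for every repeated occurrence.

```shell
# Hypothetical helper: print "line_number<TAB>key" each time a key shows up
# again within the 4th (info) column of a tab-delimited line.
find_duplicate_keys() {
  awk -F'\t' '
    {
      split("", seen)                 # reset the set of keys seen on this line
      n = split($4, pairs, ";")
      for (i = 1; i <= n; i++) {
        eq = index(pairs[i], "=")
        if (eq == 0) continue         # skip fragments without "="
        key = substr(pairs[i], 1, eq - 1)
        if (key in seen) print NR "\t" key
        seen[key] = 1
      }
    }
  ' "$@"
}
```

Called as `find_duplicate_keys yourfile` (the file name is a placeholder), empty output would mean that no line repeats a key.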

I'd like to write the data to the database. The first 3 columns are no problem, but extracting the values from the info column puzzles me, since the key=value pairs are unordered and not every key has to be present in each line. For a similar dataset (with an ordered info column) I used a Java program in connection with regular expressions, which allowed me to (1) check and (2) extract the data, but now I'm stranded.

How can I resolve this task, preferably with a bash script or directly in MySQL?

You did not mention how you want to write the data. The awk example below shows how you can get each individual key and value in each line. Instead of the printf, you can use your own logic to write the data.

    $ cat test.sh; echo "###########"; awk -f test.sh log
    {
      if (length($4)) {                 # skip lines with an empty info column
        split($4, array, ";");
        print "in " $1, $2, $3;
        for (element in array) {        # note: iteration order is unspecified
          # key keeps its trailing "=", as in the output below
          key = substr(array[element], 1, index(array[element], "="));
          value = substr(array[element], index(array[element], "=") + 1);
          printf("found %s key , %s value %d line %s\n", key, value, NR, array[element]);
        }
      }
    }
    ###########
    in 1 10203 3
    found id= key , dskf12586 value 1 line id=dskf12586
    found ia= key , 0.34 value 1 line ia=0.34
    found ib= key , nerv value 1 line ib=nerv
    found ic= key , 45 value 1 line ic=45
    in 1 10203 4
    found ib= key , nerv value 2 line ib=nerv
    found ia= key , 0.44 value 2 line ia=0.44
    found ic= key , 45 value 2 line ic=45
    found id= key , dsf12586 value 2 line id=dsf12586
    in 1 10213 1
    found id= key , dskf12586 value 4 line id=dskf12586
    found ib= key , nerv value 4 line ib=nerv
    found ic= key , 49 value 4 line ic=49
    found ia= key , 0.14 value 4 line ia=0.14
    in 1 10213 2
    found ia= key , 0.34 value 5 line ia=0.34
    found ib= key , nerv value 5 line ib=nerv
    found id= key , cap1486 value 5 line id=cap1486
    in 1 10225 1
    found id= key , dscf12586 value 6 line id=dscf12586
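One possible way to turn that idea into something loadable is to flatten each line into the seven database columns in a fixed order. This is a sketch under stated assumptions, not the only approach: the input is assumed to be tab-delimited, the function name `info_to_tsv` is made up, an optional header line starting with `chr` is skipped, a missing key is written as `\N`, and if a key repeats the last value wins.

```shell
# Hypothetical helper: turn "chr pos num info" lines into seven tab-separated
# columns (chr, pos, num, ia, ib, ic, id); keys absent from a line become \N.
info_to_tsv() {
  awk -F'\t' -v OFS='\t' '
    BEGIN { nk = split("ia ib ic id", keys, " ") }   # fixed output order
    NR == 1 && $1 == "chr" { next }                   # skip a header line, if any
    {
      split("", val)                                  # forget the previous line
      n = split($4, pairs, ";")
      for (i = 1; i <= n; i++) {
        eq = index(pairs[i], "=")
        if (eq > 0) val[substr(pairs[i], 1, eq - 1)] = substr(pairs[i], eq + 1)
      }
      line = $1 OFS $2 OFS $3
      for (k = 1; k <= nk; k++)
        line = line OFS ((keys[k] in val) ? val[keys[k]] : "\\N")
      print line
    }
  ' "$@"
}
```

For example, `info_to_tsv log > flat.tsv` (where `flat.tsv` is a placeholder name) would write one seven-column line per input line. With MySQL's default field escaping, `LOAD DATA LOCAL INFILE 'flat.tsv' INTO TABLE mytable (chr, pos, num, ia, ib, ic, id)` should read `\N` as NULL. Here `mytable` stands in for your table name, and local infile loading has to be enabled on both client and server.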
