How to split an awk field correctly


I have a file (test.bed) that looks like this (it might not be tab-separated):

chr1    10002   10116   id=1;frame=0;strand=+;  0   +
chr1    10116   10122   id=2;frame=0;strand=+;  0   +
chr1    10122   10128   id=3;frame=0;strand=+;  0   +
chr1    10128   10134   id=4;frame=0;strand=+;  0   +
chr1    10134   10140   id=5;frame=0;strand=+;  0   +
chr1    10140   10146   id=6;frame=0;strand=+;  0   +
chr1    10146   10182   id=7;frame=0;strand=+;  0   +
chr1    10182   10188   id=8;frame=0;strand=+;  0   +
chr1    10188   10194   id=9;frame=0;strand=+;  0   +
chr1    10194   10200   id=10;frame=0;strand=+; 0   +

I want to produce the following output (which should be tab-separated):

chr1    10002   10116   id=1    0   +
chr1    10116   10122   id=2    0   +
chr1    10122   10128   id=3    0   +
chr1    10128   10134   id=4    0   +
chr1    10134   10140   id=5    0   +
chr1    10140   10146   id=6    0   +
chr1    10146   10182   id=7    0   +
chr1    10182   10188   id=8    0   +
chr1    10188   10194   id=9    0   +
chr1    10194   10200   id=10   0   +

I have tried the following code:

awk 'OFS="\t" split ($0, a, ";"){print a[1],$5,$6}' test.bed

but I get:

chr1    10002   10116   id=1    40  4+
chr1    10116   10122   id=2    40  4+
chr1    10122   10128   id=3    40  4+
chr1    10128   10134   id=4    40  4+
chr1    10134   10140   id=5    40  4+
chr1    10140   10146   id=6    40  4+
chr1    10146   10182   id=7    40  4+
chr1    10182   10188   id=8    40  4+
chr1    10188   10194   id=9    40  4+
chr1    10194   10200   id=10   40  4+

What am I doing wrong? Somehow the number '4' is added to the last 2 fields. I thought the number '4' might have something to do with splitting the 4th field; however, I tried producing a similar file in which the 3rd field is split, and I still got the number '4' added to the last 2 fields. I am rather new to 'awk', so I guess the error is in my syntax. Any help would be appreciated.

If you set the field separator to whitespace or semicolons, you won't have to handle the splitting yourself:

$ awk '{print $1,$2,$3,$4,$8,$9}' FS='[[:space:]]+|;' OFS='\t' file
chr1    10002   10116   id=1    0   +
chr1    10116   10122   id=2    0   +
chr1    10122   10128   id=3    0   +
chr1    10128   10134   id=4    0   +
chr1    10134   10140   id=5    0   +
chr1    10140   10146   id=6    0   +
chr1    10146   10182   id=7    0   +
chr1    10182   10188   id=8    0   +
chr1    10188   10194   id=9    0   +
chr1    10194   10200   id=10   0   +
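To see why the command above prints $8 and $9 rather than $7 and $8, it can help to list every field awk produces with that separator. The sketch below is only an illustration: it feeds one row from the question through printf instead of reading test.bed, and it assumes `[ \t]+|;` is an acceptable stand-in for `[[:space:]]+|;` on this data (some older awk builds do not support POSIX character classes). The variable names are made up for the example.

```shell
# One sample row copied from the question, so the example needs no input file.
line='chr1    10002   10116   id=1;frame=0;strand=+;  0   +'

# Split on runs of blanks/tabs OR a single semicolon, then print each
# field with its index so empty fields become visible.
fields=$(printf '%s\n' "$line" |
  awk -F'[ \t]+|;' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

echo "$fields"
```

The last semicolon is immediately followed by spaces, so two separators sit next to each other and field 7 comes out empty. That pushes the score and strand columns to fields 8 and 9.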

As for what you are doing wrong in:

awk 'OFS="\t" split ($0, a, ";"){print a[1],$5,$6}'
  • The syntax of awk is condition{block}. Setting the value of OFS and splitting are not conditional, so those statements should be inside the block.
  • However, you don't need to set the value of OFS on every line; it should be initialized once. You can do this using the -v option, in the BEGIN block, or after the script.
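The stray '4' itself can be traced with a small experiment. In awk, two expressions placed next to each other are concatenated, so `OFS="\t" split($0, a, ";")` assigns OFS a tab followed by the return value of split(), which is the number of pieces (4 for these rows). The sketch below reproduces that on one row from the question; the variable names are only for illustration.

```shell
# One sample row copied from the question.
line='chr1    10002   10116   id=1;frame=0;strand=+;  0   +'

# split() returns how many pieces it produced when cutting on ";".
count=$(printf '%s\n' "$line" | awk '{ print split($0, a, ";") }')

# The original command: the pattern sets OFS to "\t" concatenated with
# that count, so every output separator is a tab followed by "4".
broken=$(printf '%s\n' "$line" |
  awk 'OFS="\t" split($0, a, ";") {print a[1],$5,$6}')

echo "$count"
echo "$broken"
```

Here the separator is printed between a[1], $5 and $6, which is why the '4' shows up glued to the front of the last two fields regardless of which column contains the semicolons.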

Valid alternatives:

$ awk -v OFS='\t' '{split($0,a,";");print a[1],$5,$6}' file
$ awk 'BEGIN{OFS="\t"}{split($0,a,";");print a[1],$5,$6}' file
$ awk '{split ($0,a,";");print a[1],$5,$6}' OFS='\t' file
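One thing to keep in mind with these alternatives: a[1] still carries whatever whitespace the input had between the first four columns, so only the separators before $5 and $6 are guaranteed to be tabs. If every column has to be tab-separated, a possible variation (not one of the commands above, just a sketch under the assumption that the input columns are separated by spaces or tabs) is to split only the 4th field and assign the result back to $4, which makes awk rebuild the whole record with OFS.

```shell
# One sample row copied from the question.
line='chr1    10002   10116   id=1;frame=0;strand=+;  0   +'

# Keep only the part of field 4 before the first ";". Assigning to $4
# forces awk to rejoin all fields using OFS (a tab here).
fixed=$(printf '%s\n' "$line" |
  awk -v OFS='\t' '{ split($4, a, ";"); $4 = a[1]; print }')

printf '%s\n' "$fixed"
```

With a real file, the same script would be run as `awk -v OFS='\t' '{ split($4, a, ";"); $4 = a[1]; print }' test.bed`.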
