awk to parse multiple lines

What is the correct syntax to have awk parse the next line as well? The {next} at the end of the command below is where I think it should go, but I wanted to ask the experts since I am a beginner. The file to be parsed is attached as well. Thank you :).

 awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4])-1);print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]} {next}' OFS="\t" out_position.txt > out_parse.txt

A 'next' there would do nothing. It tells awk to stop processing the current line and move on to the next one, but since {next} is the last rule in the program, awk was going to do that anyway: it processes every line in turn.
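For contrast, 'next' only changes anything when other rules come after it, because it skips those rules for the current line. A common idiom (an illustration, not something from this thread) is to put NR==1 {next} first to skip a header. The sample() function is hypothetical data: it assumes the second column holds strings like NC_000013.10:g.20763642C>G, reconstructed from the split pattern and the desired output rather than taken from the attached file.

```shell
#!/bin/sh
# Hypothetical input: a header line followed by two data lines.
sample() {
  printf 'Sample\tHGVS\n'
  printf 's1\tNC_000013.10:g.20763642C>G\n'
  printf 's2\tNC_000013.10:g.20763438C>G\n'
}

# NR==1 {next} ends processing of line 1 early, so {print $1} never sees the header.
result=$(sample | awk 'NR==1 {next} {print $1}')
printf '%s\n' "$result"   # prints s1 and s2, one per line
```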

Are you asking how to retrieve the next line early, so the same code can process it? Use getline for that.
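A minimal sketch of getline pulling the following line into $0 early, using the same hypothetical sample data as above. Plain getline also advances NR, and it must be parenthesized before comparing, because getline > 0 would be read as "get a line from a file named 0".

```shell
#!/bin/sh
# Hypothetical input: a header line followed by two data lines.
sample() {
  printf 'Sample\tHGVS\n'
  printf 's1\tNC_000013.10:g.20763642C>G\n'
  printf 's2\tNC_000013.10:g.20763438C>G\n'
}

# On record 2, print it, then read record 3 into $0 and print that too.
# getline returns 1 on success, 0 at end of input, -1 on error.
result=$(sample | awk 'NR==2 { print NR, $1; if ((getline) > 0) print NR, $1 }')
printf '%s\n' "$result"   # prints "2 s1" then "3 s2"
```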

The output is empty with the command below. I am trying to use the same code to parse every line in the file except the header. Currently it parses only the first data line and nothing after it. Thank you :).

 awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4])-1);print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]} {getline}' OFS="\t" out_position.txt > out_parse.txt
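The empty output follows from how {getline} interacts with NR: on line 1 the NR==2 test fails, then getline silently consumes line 2, so the next time the first rule runs NR is already 3. The sketch below traces which NR values the first rule actually sees; the numbered input lines are made up for the demonstration.

```shell
#!/bin/sh
# The first rule reports NR; the second rule swallows the following line.
trace=$(printf 'line%s\n' 1 2 3 4 5 | awk '{print "first rule sees NR=" NR} {getline}')
printf '%s\n' "$trace"   # NR=1, NR=3, NR=5: the even records are eaten by getline

# Consequently NR==2 is never true when that rule is checked.
empty=$(printf 'line%s\n' 1 2 3 | awk 'NR==2 {print "matched"} {getline}')
printf 'NR==2 output: [%s]\n' "$empty"   # prints "NR==2 output: []"
```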

Please show the input you have and the output you want. I have no idea what you're trying to do without an example.

Wait a minute... Is this in Windows again?
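If "Windows again" refers to CRLF line endings, each line of a file saved on Windows ends in a carriage return that awk typically keeps as part of the last field. A hedged sketch of the usual guard is to strip a trailing \r from every line before the other rules run. The input line here is hypothetical.

```shell
#!/bin/sh
# Hypothetical data line with a Windows-style CRLF ending.
crlf_sample() { printf 's1\tNC_000013.10:g.20763642C>G\r\n'; }

# The first rule removes a trailing carriage return from $0 (fields are re-split),
# so the last piece of the split is just the base letter.
fixed=$(crlf_sample | awk '{ sub(/\r$/, "") } { split($2, a, "[_.>]"); print a[5] }')
printf '[%s]\n' "$fixed"   # prints "[G]"
```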

The code below parses the second line of the input from post 1, but not the third. Thank you :).

 awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4])-1);print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" out_position.txt > out_parse.txt
Desired output:
 13 20763642 20763642 C G
 13 20763438 20763438 C G
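To see what the split/substr chain does, this sketch dumps the pieces for one string. NC_000013.10:g.20763642C>G is a hypothetical value chosen so that it yields the first desired output line; the attached file may look different. It also shows that length(a[4]-1) and length(a[4])-1 are different expressions: the first converts a[4] to a number, subtracts 1, and counts the digits of the result, so it only gives the intended length when subtracting 1 keeps the digit count the same.

```shell
#!/bin/sh
result=$(awk 'BEGIN {
  # Split on underscore, dot or ">"; n is the number of pieces.
  n = split("NC_000013.10:g.20763642C>G", a, "[_.>]")
  for (i = 1; i <= n; i++) print i ": " a[i]
  # Drop the trailing base letter: 9 characters minus 1.
  print "position: " substr(a[4], 1, length(a[4]) - 1)
  # "10000000C" - 1 is the number 9999999, which has only 7 digits.
  print "buggy on 10000000C: " substr("10000000C", 1, length("10000000C" - 1))
  print "fixed on 10000000C: " substr("10000000C", 1, length("10000000C") - 1)
}')
printf '%s\n' "$result"
```

The first five lines show that a[2] is 000013 (hence a[2]+0 for 13), a[4] is the position plus the reference base, and a[5] is the alternate base.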

I see the problem now.

The pattern NR==2 makes the action run only when NR, the current record (line) number, is exactly 2. To skip just the header line, try NR>1 instead, which is true for every line after line 1.

The {next} still does nothing, so remove it.
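Putting that advice together, a sketch of the command with NR>1 and no {next}, run on the same hypothetical sample as the earlier sketches (the real data comes from the attached file). Because the input arrives through a pipe here, OFS is set with -v so it is in effect before the first line is read.

```shell
#!/bin/sh
# Hypothetical input: a header line followed by two data lines.
sample() {
  printf 'Sample\tHGVS\n'
  printf 's1\tNC_000013.10:g.20763642C>G\n'
  printf 's2\tNC_000013.10:g.20763438C>G\n'
}

# NR>1 matches every record after the header, so each data line is parsed.
result=$(sample | awk -v OFS='\t' 'NR>1 {
  split($2, a, "[_.>]")                  # for s1: a[2]=000013, a[4]=20763642C, a[5]=G
  b = substr(a[4], 1, length(a[4]) - 1)  # position without the trailing base
  print a[2] + 0, b, b, substr(a[4], length(a[4])), a[5]
}')
printf '%s\n' "$result"   # two tab-separated lines matching the desired output
```

On the real file this amounts to the original one-liner with NR==2 changed to NR>1 and {next} dropped, still reading out_position.txt and redirecting to out_parse.txt.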


Thank you :).
