awk to split file using multiple delimiters

I am trying to use awk to split an input file on multiple delimiters (:, - and |). The input file is just one field and the output is 6 tab-delimited fields.

The awk below runs and works as expected until I add the third delimiter |, which gives the current output below. I am not sure what is wrong. Thank you :).

input

chr1:1013574-1013576|ISG15	
chr1:1013984-1014478|ISG15
chr1:1020163-1020383|AGRN

awk -F'[:-|]' '{print $1 "\t" $2 "\t" $3 "\t" $1 ":" $2 "-" $3 "\t" "." "\t" $4}' input

desired output

chr1	1013574	1013576	chr1:1013574-1013576	.	ISG15	
chr1	1013984	1014478	chr1:1013984-1014478	.	ISG15
chr1	1020163	1020383	chr1:1020163-1020383	.	AGRN

current output

	
	                .	1
			.	1
			.	1

Hello cmccabe,

Could you please try the following and let me know if it helps? It should work if your Input_file looks like the sample shown.

awk -F"[:|-]" '{print $1 "\t" $2 OFS $3 OFS $1":"$2"-"$3 "\t" "." "\t" $NF}'  Input_file

Thanks,
R. Singh
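
A note on the command above: it mixes literal "\t" with OFS, and OFS defaults to a single space unless it is set, so the pieces joined by OFS come out space-separated rather than tab-separated. A minimal sketch of one way to get tabs throughout, assuming the input looks like the sample in the first post (the `out` variable is only for illustration):

```shell
# One sample line from the first post
line='chr1:1013574-1013576|ISG15'

# Set OFS once and separate the print arguments with commas,
# so every field boundary in the output is a tab
out=$(printf '%s\n' "$line" |
  awk -F'[:|-]' -v OFS='\t' '{ print $1, $2, $3, $1 ":" $2 "-" $3, ".", $NF }')

printf '%s\n' "$out"
```

Using commas with OFS also keeps the print statement shorter than repeating "\t" between every pair of fields.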

Hi,

awk -F"[|:-]" ' { print $1 "\t" $2 "\t" $3 "\t" $1 ":" $2 "-" $3 "\t" "." "\t" $4}' file

Gives the desired output.
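
For anyone wondering why the original -F'[:-|]' misbehaves: inside a bracket expression, a hyphen between two characters denotes a range, so :-| means "every character from : to |" rather than three separate delimiters. Putting the hyphen first or last makes it literal. A small sketch comparing field counts for the two forms (variable names are illustrative, and LC_ALL=C is set on the assumption that the range then follows ASCII order):

```shell
# One sample line from the first post
line='chr1:1013574-1013576|ISG15'

# Hyphen in the middle: a range from ':' to '|', which in ASCII
# also covers the letters, so the line splits in far too many places
bad=$(printf '%s\n' "$line" | LC_ALL=C awk -F'[:-|]' '{ print NF }')

# Hyphen last: three literal delimiters ':', '|' and '-'
good=$(printf '%s\n' "$line" | LC_ALL=C awk -F'[:|-]' '{ print NF }')

echo "range form: $bad fields, literal form: $good fields"
```

In ASCII that range includes the letters, so c, h and r in chr1 are themselves treated as separators, which would explain the mostly empty fields in the current output of the first post.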

I added the output of both commands below. I seem to be having trouble with the | symbol in the input file. I also tried splitting on the | alone and got an empty output. Thank you :).

awk -F"[:|-]" '{print $1 "\t" $2 OFS $3 OFS $1":"$2"-"$3 "\t" "." "\t" $NF}' file > output
chr1    1013574 1013576;ISG15     chr1:1013574-1013576;ISG15        .    1013576;ISG15    
chr1    1013984 1014478;ISG15 chr1:1013984-1014478;ISG15    .    1014478;ISG15
chr1    1020163 1020383;AGRN chr1:1020163-1020383;AGRN    .    1020383;AGRN
awk -F"[|:-]" ' { print $1 "\t" $2 "\t" $3 "\t" $1 ":" $2 "-" $3 "\t" "." "\t" $4}' file > output2
chr1    1013574    1013576;ISG15        chr1:1013574-1013576;ISG15        .    
chr1    1013984    1014478;ISG15    chr1:1013984-1014478;ISG15    .    
chr1    1020163    1020383;AGRN    chr1:1020163-1020383;AGRN    .    

Hello cmccabe,

Sorry, I didn't follow. Is your required output no longer the one in your very first post? Your second post appears to show a different expected output; could you clarify?

Thanks,
R. Singh

The desired output in post 1 is correct; for some reason I am not getting it, though. Maybe the original input file is corrupt. I will check. Thank you very much :).

---------- Post updated at 09:50 AM ---------- Previous update was at 09:44 AM ----------

Both commands work great. Sorry, I had a space in the file that was causing the issue. Thank you :).
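
Since stray whitespace turned out to be the culprit, here is a rough sketch of how one might spot and strip it before splitting. The sample data is hypothetical (a trailing tab on the first line, similar to the posted input), and exactly which whitespace was in the real file is not known:

```shell
# Hypothetical sample: the first line ends with a trailing tab
sample=$(printf 'chr1:1013574-1013576|ISG15\t\nchr1:1013984-1014478|ISG15\n')

# Make invisible characters visible: sed's 'l' command shows a tab
# as \t and marks the end of each line with $
printf '%s\n' "$sample" | sed -n 'l'

# Strip trailing blanks, tabs and carriage returns, then split as before;
# changing $0 with sub() makes awk re-split the fields
clean=$(printf '%s\n' "$sample" |
  awk -F'[:|-]' -v OFS='\t' '{
    sub(/[ \t\r]+$/, "")
    print $1, $2, $3, $1 ":" $2 "-" $3, ".", $4
  }')

printf '%s\n' "$clean"
```

The \r is included on the assumption that files edited on Windows may carry carriage returns, which cause similar symptoms.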

An alternative using sed:

sed -E  's/(\w+):([0-9]+)-([0-9]+)\|(.*)/\1\t\2\t\3\t\1:\2-\3\t.\t\4/' file
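
The \w class and the \t escape in the replacement are GNU sed extensions, so the command above may not behave the same with other sed implementations. A sketch of a variant that avoids both, assuming a sed that supports -E (the `tab` and `out` variables are only for illustration):

```shell
# Hold a literal tab in a shell variable instead of relying on \t
tab=$(printf '\t')

# [^:]+ replaces \w+ for the chromosome name; the tab variable is
# expanded by the shell inside the double-quoted sed script
out=$(printf '%s\n' 'chr1:1020163-1020383|AGRN' |
  sed -E "s/([^:]+):([0-9]+)-([0-9]+)\|(.*)/\1${tab}\2${tab}\3${tab}\1:\2-\3${tab}.${tab}\4/")

printf '%s\n' "$out"
```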