cmccabe
October 25, 2014, 11:44am
1
I'm trying to parse column C ($3) of the attached file (104 rows). The data is in the format below, all in one string. Each string should become a separate row, with the data in column A ($1) and column B ($2) as the header. All the data should end up in separate columns as well. Thank you :).
ACTA 59 A_16_P32713632=chr10:90695750-90695810, A_16_P32713635=chr10:90696573-90696633, A_16_P32713680=chr10:90697419-90697479
ADAMTS10 7 A_16_P41135847=chr19:8647429-8647489, A_16_P03421012=chr19:8659282-8659342
Desired Output:
ACTA 59
A_16_P32713632 chr10 90695750 90695810
A_16_P32713635 chr10 90696573 90696633
A_16_P32713680 chr10 90697419 90697479
ADAMTS10 7
A_16_P41135847 chr19 8647429 8647489
A_16_P03421012 chr19 8659282 8659342
awk '{print $1,$2;for(i=3;i<=NF;i++){ gsub(/[=:-]/,OFS,$i); sub(/,/,"",$i); print $i }}' OFS='\t' file
RudiC
October 25, 2014, 12:07pm
3
Like this:
awk '{
  print $1, $2
  for (i=3; i<=NF; i++) {
    # "-" goes last in the bracket expression so it is literal, not a range
    n = split($i, T, "[=:,-]")
    print T[1], T[2], T[3], T[4]
  }
}' OFS="\t" file
ACTA 59
A_16_P32713632 chr10 90695750 90695810
A_16_P32713635 chr10 90696573 90696633
A_16_P32713680 chr10 90697419 90697479
ADAMTS10 7
A_16_P41135847 chr19 8647429 8647489
A_16_P03421012 chr19 8659282 8659342
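In case it helps to see why T[1] through T[4] hold those four values, here is a minimal sketch of what split() does with a single token from column 3. The function name show_split is made up for illustration and is not part of the thread; the separator is written as [=:,-] with the hyphen last so that it is taken literally.

```shell
# Hypothetical helper (not from the thread): show how split() breaks one token.
show_split() {
    printf '%s\n' "$1" | awk '{
        # Any of = : , - separates the pieces; "-" is last so it is a literal
        n = split($1, T, "[=:,-]")
        for (k = 1; k <= n; k++) printf "T[%d]=%s\n", k, T[k]
    }'
}

show_split 'A_16_P32713632=chr10:90695750-90695810,'
```

The first four elements are the probe ID, chromosome, start and end. The trailing comma leaves an empty extra element at the end, which is why only T[1] to T[4] are printed in the answer above.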
cmccabe
October 25, 2014, 12:11pm
4
Thank you :), works perfectly!
Or something like this
awk '{match($0,/^[^ ]* [^ ]* /);s=substr($0,RLENGTH+1); gsub(/[=:-]/,OFS,s); gsub(/, /,RS,s); $0 = $1 OFS $2 RS s}1' file
cmccabe
October 25, 2014, 12:33pm
6
Can the code be modified to output the same data, just without the header?
Desired Output:
A_16_P32713632 chr10 90695750 90695810
A_16_P32713635 chr10 90696573 90696633
A_16_P32713680 chr10 90697419 90697479
A_16_P41135847 chr19 8647429 8647489
A_16_P03421012 chr19 8659282 8659342
Thank you :).
Remove print $1,$2; from the command:
awk '{for(i=3;i<=NF;i++){ gsub(/[=:-]/,OFS,$i); sub(/,/,"",$i); print $i }}' OFS='\t' file
How is the order of the output determined in the script?
Original data:
ACTA 59 A_16_P32713632=chr10:90695750-90695810, A_16_P32713635=chr10:90696573-90696633, A_16_P32713680=chr10:90697419-90697479
For example, if instead of:
A_16_P32713632 chr10 90695750 90695810
A_16_P32713635 chr10 90696573 90696633
A_16_P32713680 chr10 90697419 90697479
a different output is needed:
chr10 90695750 90695810 A_16_P32713632
chr10 90696573 90696633 A_16_P32713635
chr10 90697419 90697479 A_16_P32713680
Same data just different order.
Thank you :).
Read RudiC's answer (Parse and reformat, post 302922478) and change the array indices from
print T[1],T[2],T[3],T[4]
to
print T[2],T[3],T[4],T[1]
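To make the index change concrete, here is a small sketch of the header-less variant with the coordinates first and the probe ID last. The wrapper name probes_coord_first is only an illustration, and the sample line is a shortened copy of the data posted earlier in the thread.

```shell
# Hypothetical wrapper (the name is not from the thread):
# print chromosome, start, end, then probe ID, tab-separated.
probes_coord_first() {
    awk -v OFS='\t' '{
        for (i = 3; i <= NF; i++) {
            # T[1]=probe ID, T[2]=chromosome, T[3]=start, T[4]=end
            split($i, T, "[=:,-]")
            # The order of the arguments to print is the order of the columns
            print T[2], T[3], T[4], T[1]
        }
    }' "$@"
}

printf '%s\n' 'ACTA 59 A_16_P32713632=chr10:90695750-90695810, A_16_P32713635=chr10:90696573-90696633' |
    probes_coord_first
```

The array T is always filled in the same order by split(); only the print statement decides which piece ends up in which output column.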
cmccabe
October 25, 2014, 1:47pm
10
awk '{
  for (i=3; i<=NF; i++) {
    n = split($i, T, "[=:,-]")
    print T[2], T[3], T[4], T[1]
  }
}' OFS="\t" header_sort.txt > sort.txt
Like this:
sort.txt (no headers):
chr10 90695750 90695810 A_16_P32713632
chr10 90696573 90696633 A_16_P32713635
Thanks :).
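If the column order needs to change again later, one possible approach is to pass it in as a variable instead of editing the print statement each time. This is only a sketch: the function name probes_in_order and the order argument are assumptions for illustration, not something posted in the thread.

```shell
# Hypothetical helper: the first argument is a space-separated list of indices
# into T (1=probe ID, 2=chromosome, 3=start, 4=end); remaining arguments are files.
probes_in_order() {
    order=$1; shift
    awk -v OFS='\t' -v order="$order" '
        BEGIN { m = split(order, idx, " ") }   # idx[1..m] holds the requested order
        {
            for (i = 3; i <= NF; i++) {
                split($i, T, "[=:,-]")
                line = T[idx[1]]
                for (k = 2; k <= m; k++) line = line OFS T[idx[k]]
                print line
            }
        }' "$@"
}

printf '%s\n' 'ADAMTS10 7 A_16_P41135847=chr19:8647429-8647489, A_16_P03421012=chr19:8659282-8659342' |
    probes_in_order '2 3 4 1'
```

Calling it with '1 2 3 4' instead should give the original probe-ID-first layout.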