fat
1
I am trying to sort, do uniq by 1st column and report this 4 columns tab delimiter table , eg
chr10:112174128 rs2255141 2E-10 Cholesterol, total
chr10:112174128 rs2255141 7E-16 LDL
chr10:17218291 rs10904908 3E-11 HDL Cholesterol
chr10:17218291 rs970548 8E-9 TG
chr1:109275684 rs629301 2E-170 Cholesterol, totat
I want an output this this
chr10:112174128 rs2255141 2E-10/7E-16 Cholesterol, total; LDL
chr10:17218291 rs10904908/rs970548 3E-11/8E-9 HDL Cholesterol; TG
chr1:109275684 rs629301 2E-170 Cholesterol, total
Can you help ?
RudiC
2
Any attempts from your side?
fat
3
Yes,
awk 'BEGIN{printf "\t"}{A[$1] = A[$1] ? A[$1] OFS $4 : $4}FNR==1{printf FILENAME OFS}END{printf RS; for(i in A)print i, A}' OFS='\t' myfile.txt | sort
RudiC
4
Try
awk ' {F2[$1]=F2[$1]"/"$2
F3[$1]=F3[$1]"/"$3
F4[$1]=F4[$1]";"$4}
END {for (S1 in F2)
print S1,
substr(F2[S1],2),
substr(F3[S1],2),
substr(F4[S1],2)}
' FS="\t" OFS="\t" file | sort
chr10:112174128 rs2255141/rs2255141 2E-10/7E-16 Cholesterol, total;LDL
chr10:17218291 rs10904908/rs970548 3E-11/8E-9 HDL Cholesterol;TG
chr1:109275684 rs629301 2E-170 Cholesterol, totat
1 Like
If we sort first can save using memory like this:
sort infile | awk '
F1 && F1!=$1 {print F1,F2,F3,F4; F2=F3=F4=x}
{ F1=$1
F2=(F2?F2"/":x)$2
F3=(F3?F3"/":x)$3
F4=(F4?F4";":x)$4 }
END {print F1,F2,F3,F4}' FS='\t' OFS='\t'
chr1:109275684 rs629301 2E-170 Cholesterol, totat
chr10:112174128 rs2255141/rs2255141 2E-10/7E-16 Cholesterol, total;LDL
chr10:17218291 rs10904908/rs970548 3E-11/8E-9 HDL Cholesterol;TG