Hello
I just want to ask how to get the match of information column 2 file 1 and retrieve information from column 2 file 2. The column exon in file 1 and column color code in file 2.
File 1
//NODECOLORCODE "Exon 1" "ENST00000595813"
//NODECOLORCODE "Exon 1" "ENST00000596051"
//NODECOLORCODE "Exon 2" "ENST00000243643"
//NODECOLORCODE "Exon 2" "ENST00000421033"
//NODECOLORCODE "Exon 3" "ENST00000243643"
//NODECOLORCODE "Exon 3" "ENST00000421033"
//NODECOLORCODE "Exon 4" "ENST00000243643"
//NODECOLORCODE "Exon 4" "ENST00000421033"
//NODECOLORCODE "Exon 5" "ENST00000421033"
//NODECOLORCODE "Exon 5" "ENST00000440291"
File 2
"Exon 1" "#FF0000"
"Exon 2" "#FFD400"
"Exon 3" "#FFff00"
"Exon 4" "#2Bff00"
"Exon 5" "#00ff00"
Desired output
//NODECOLORCODE "Exon 1" "ENST00000595813" "#FF0000"
//NODECOLORCODE "Exon 1" "ENST00000596051" "#FF0000"
//NODECOLORCODE "Exon 2" "ENST00000243643" "#FFD400"
//NODECOLORCODE "Exon 2" "ENST00000421033" "#FFD400"
//NODECOLORCODE "Exon 3" "ENST00000243643" "#FFff00"
//NODECOLORCODE "Exon 3" "ENST00000421033" "#FFff00"
//NODECOLORCODE "Exon 4" "ENST00000243643" "#2Bff00"
//NODECOLORCODE "Exon 4" "ENST00000421033" "#2Bff00"
//NODECOLORCODE "Exon 5" "ENST00000421033" "#00ff00"
//NODECOLORCODE "Exon 5" "ENST00000440291" "#00ff00"
I try using awk and it doesn't come out with the desired output I want. Thanks
awk 'FNR==NR {C[$2]=$1;next} FNR==1 {print "NODECLASSCOLOR EXON TRANSCRIPT_ID COLORCODE"; next} $1 in C {print C[$1], $1, $4, $2, $3}'
Are those tab separated? If not, you'll have to pretend they're quote separated and use some tricks to properly grab these fields.
Assuming not tab separated:
$ awk -F\" -v OFS='\t' 'FNR==NR{C[$2]=$4;next}FNR==1{print "NODECLASSCOLOR EXON TRANSCRIPT_ID COLORCODE"; next} 1 {print $1,$2,$4,C[$2]}' file2 file1
NODECLASSCOLOR EXON TRANSCRIPT_ID COLORCODE
//NODECOLORCODE Exon 1 ENST00000596051 #FF0000
//NODECOLORCODE Exon 2 ENST00000243643 #FFD400
//NODECOLORCODE Exon 2 ENST00000421033 #FFD400
//NODECOLORCODE Exon 3 ENST00000243643 #FFff00
//NODECOLORCODE Exon 3 ENST00000421033 #FFff00
//NODECOLORCODE Exon 4 ENST00000243643 #2Bff00
//NODECOLORCODE Exon 4 ENST00000421033 #2Bff00
//NODECOLORCODE Exon 5 ENST00000421033 #00ff00
//NODECOLORCODE Exon 5 ENST00000440291 #00ff00
Hello
I just want to ask how to get the match of information column 2 file 1 and retrieve information from column 2 file 2. The column exon in file 1 and column color code in file 2.
File 1
//NODECOLORCODE "Exon 1" "ENST00000595813"
//NODECOLORCODE "Exon 1" "ENST00000596051"
//NODECOLORCODE "Exon 2" "ENST00000243643"
//NODECOLORCODE "Exon 2" "ENST00000421033"
//NODECOLORCODE "Exon 3" "ENST00000243643"
//NODECOLORCODE "Exon 3" "ENST00000421033"
//NODECOLORCODE "Exon 4" "ENST00000243643"
//NODECOLORCODE "Exon 4" "ENST00000421033"
//NODECOLORCODE "Exon 5" "ENST00000421033"
//NODECOLORCODE "Exon 5" "ENST00000440291"
File 2
"Exon 1" "#FF0000"
"Exon 2" "#FFD400"
"Exon 3" "#FFff00"
"Exon 4" "#2Bff00"
"Exon 5" "#00ff00"
Desired output
//NODECOLORCODE "Exon 1" "ENST00000595813" "#FF0000"
//NODECOLORCODE "Exon 1" "ENST00000596051" "#FF0000"
//NODECOLORCODE "Exon 2" "ENST00000243643" "#FFD400"
//NODECOLORCODE "Exon 2" "ENST00000421033" "#FFD400"
//NODECOLORCODE "Exon 3" "ENST00000243643" "#FFff00"
//NODECOLORCODE "Exon 3" "ENST00000421033" "#FFff00"
//NODECOLORCODE "Exon 4" "ENST00000243643" "#2Bff00"
//NODECOLORCODE "Exon 4" "ENST00000421033" "#2Bff00"
//NODECOLORCODE "Exon 5" "ENST00000421033" "#00ff00"
//NODECOLORCODE "Exon 5" "ENST00000440291" "#00ff00"
I try using awk and it doesn't come out with the desired output I want. Thanks
awk 'FNR==NR {C[$2]=$1;next} FNR==1 {print "NODECLASSCOLOR EXON TRANSCRIPT_ID COLORCODE"; next} $1 in C {print C[$1], $1, $4, $2, $3}'
Hello Wan Fahmi,
Following may also help you in same.
awk 'FNR==NR{B=$NF;$NF="";gsub(/[[:space:]]+$/,X,$0);A[$0]=B;next} ($2" "$3 in A){print $0 OFS A[$2" "$3]}' File2 OFS="\t" File1
Thanks,
R. Singh
---------- Post updated at 11:59 AM ---------- Previous update was at 11:56 AM ----------
neutronscott:
Are those tab separated? If not, you'll have to pretend they're quote separated and use some tricks to properly grab these fields.
Assuming not tab separated:
$ awk -F\" -v OFS='\t' 'FNR==NR{C[$2]=$4;next}FNR==1{print "NODECLASSCOLOR EXON TRANSCRIPT_ID COLORCODE"; next} 1 {print $1,$2,$4,C[$2]}' file2 file1
NODECLASSCOLOR EXON TRANSCRIPT_ID COLORCODE
//NODECOLORCODE Exon 1 ENST00000596051 #FF0000
//NODECOLORCODE Exon 2 ENST00000243643 #FFD400
//NODECOLORCODE Exon 2 ENST00000421033 #FFD400
//NODECOLORCODE Exon 3 ENST00000243643 #FFff00
//NODECOLORCODE Exon 3 ENST00000421033 #FFff00
//NODECOLORCODE Exon 4 ENST00000243643 #2Bff00
//NODECOLORCODE Exon 4 ENST00000421033 #2Bff00
//NODECOLORCODE Exon 5 ENST00000421033 #00ff00
//NODECOLORCODE Exon 5 ENST00000440291 #00ff00
Hello neutronscott,
OP needs to get quotes too in output so I am just adding a few addition to it, hope it will help us all.
awk -F\" -v OFS='\t' 'FNR==NR{C[$2]=$4;next}FNR==1{print "NODECLASSCOLOR EXON TRANSCRIPT_ID COLORCODE"; next} 1 {print $1,FS $2 FS,FS $4 FS ,FS C[$2] FS}' File2 File1
Thanks,
R. Singh
Check this following code:
# cat MergeInfo.sh
#!/bin/ksh
[[ $# -ne 2 ]] && {
echo "Usage: $0 <File1> <File2>" 1>&2
exit 1;
}
[ -e $1 ] || {
echo "$1 does not exists!!!!!." 1>&2
exit 1;
}
[ -e $2 ] || {
echo "$2 does not exists!!!!!." 1>&2
exit 1;
}
awk '{ if (( NF != "4" )) { print FILENAME " - Incorrect file format!!!!!."; exit 1; } }' $1
[[ `echo $?` -ne "0" ]] && {
exit 1;
}
awk '{ if (( NF != "3" )) { print FILENAME " - Incorrect file format!!!!!."; exit 1; } }' $2
[[ `echo $?` -ne "0" ]] && {
exit 1;
}
cat $1 | while read COLUMN1 COLUMN2 COLUMN3 COLUMN4;
do
CODE=`grep -w ${COLUMN2} File2 | grep -w ${COLUMN3} | awk '{print $NF}'`;
echo "${COLUMN1} ${COLUMN2} ${COLUMN3} ${COLUMN4} ${CODE}";
done
RudiC
May 18, 2015, 12:51pm
5
If your awk
allows for multichar field separators, try
awk 'FNR==NR {T[$1]=$2; next} {print $0, T[$2]}' 'FS= +' file2 file1
//NODECOLORCODE "Exon 1" "ENST00000595813" "#FF0000"
//NODECOLORCODE "Exon 1" "ENST00000596051" "#FF0000"
//NODECOLORCODE "Exon 2" "ENST00000243643" "#FFD400"
//NODECOLORCODE "Exon 2" "ENST00000421033" "#FFD400"
//NODECOLORCODE "Exon 3" "ENST00000243643" "#FFff00"
//NODECOLORCODE "Exon 3" "ENST00000421033" "#FFff00"
//NODECOLORCODE "Exon 4" "ENST00000243643" "#2Bff00"
//NODECOLORCODE "Exon 4" "ENST00000421033" "#2Bff00"
//NODECOLORCODE "Exon 5" "ENST00000421033" "#00ff00"
//NODECOLORCODE "Exon 5" "ENST00000440291" "#00ff00"