I think this is the toughest problem I have ever come across, and I am thankful to all of you for helping me get through it.
cat 1.txt
cat 2.txt
OK, now this is what I am looking for:
Output.txt
Here is how my output has been generated.
First, column one of each file has to be matched against column one of the other files: chr1 to chr1, chr2 to chr2 and chr3 to chr3 only. Records with different column-one values are never compared.
Second, if the range given by columns 2 and 3 of a record intersects (overlaps or falls inside) the range given by columns 2 and 3 of a record in the other file, those records have to be eliminated.
Examples from the given input:
chr1 100 200 (1.txt) intersects with chr1 156 199 (2.txt) and chr1 165 230 (2.txt). So, they are eliminated.
chr1 450 700 (1.txt) intersects with chr1 525 600 (2.txt). So, these two are eliminated from the output.
Similarly,
chr2 500 600 (1.txt) intersects with chr2 534 676 (2.txt). So, it is eliminated.
chr2 345 765 (1.txt) intersects with chr2 200 400 (2.txt). So, it is eliminated from the output file.
The same is the case for chr3. My files each have a different number of records, and the records are not sorted. The last column in the output file indicates the file from which the record originates.
If you have any questions or suggestions, please post them in a reply and I will answer ASAP to clear up any doubts; that might give me a chance to finally solve this problem. All your time, patience and attention are highly appreciated.
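To make the intersection rule described above concrete, here is a minimal shell sketch of the test for two ranges. The function name `overlaps` is made up for illustration, and it assumes that both ranges are inclusive and that the start of each range is not greater than its end.

```shell
# overlaps A_START A_END B_START B_END   (hypothetical helper, not from the thread)
# Exit status 0 when the two inclusive ranges share at least one position.
# Assumption: A_START <= A_END and B_START <= B_END.
overlaps() {
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Pairs taken from the examples in the post
overlaps 100 200 156 199 && echo "chr1 100 200 / chr1 156 199: intersect"
overlaps 345 765 200 400 && echo "chr2 345 765 / chr2 200 400: intersect"
```

The chromosome name is left out on purpose: by the first rule, the check would only be run on two records whose column one is identical.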
Another solution, using Perl. I think it should work with any number of input files. Give it a try. It could be more efficient, but I struggled a little to get it working, so if it works, I will be happy with that:
Thanks a lot, Birei. The script works for the two files I mentioned earlier, and I even tried it with 3 files. The 3 files and their output are given below, just for your confirmation and my satisfaction.
Thanks once again
cat 1.txt
cat 2.txt
cat 3.txt
perl newscript.pl 1.txt 2.txt 3.txt
I find everything to be smooth. Let me know if you see anything.
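#!/bin/ksh
# Print the records of 1.txt that do not intersect any record of 2.txt
# carrying the same tag (column 1).
typeset -i mFromA mToA mFromB mToB
mF1='1.txt'
mF2='2.txt'
mTmp="${mF2}.tmp"
mPrevTag=''
#### sort groups equal tags together, which reduces the number of "grep" calls
sort "${mF1}" | while read mTagA mFromA mToA; do
  if [[ "${mTagA}" != "${mPrevTag}" ]]; then
    # Match the whole first field so that chr1 does not also select chr10, chr11, ...
    grep "^[[:space:]]*${mTagA}[[:space:]]" "${mF2}" > "${mTmp}"
  fi
  mFound="N"
  while read mTagB mFromB mToB; do
    # Two inclusive ranges intersect when each starts no later than the other ends
    if [[ ${mToA} -ge ${mFromB} && ${mFromA} -le ${mToB} ]]; then
      mFound="Y"
      break
    fi
  done < "${mTmp}"
  if [[ "${mFound}" = "N" ]]; then
    echo "${mTagA} ${mFromA} ${mToA} ${mF1}"
  fi
  mPrevTag=${mTagA}
done
rm -f "${mTmp}"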
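
The ksh script above prints only the surviving records of 1.txt. Below is a rough sketch of how the same test could be applied in both directions, and to any number of files, using awk from the shell. The function name `filter_overlaps` is hypothetical. The sketch assumes whitespace-separated chr/start/end columns and inclusive ranges, and that records are only compared with records from a different file. It holds all records in memory and compares every pair, so it is meant for small inputs.

```shell
# filter_overlaps FILE...   (hypothetical helper, a sketch rather than a tuned tool)
# Prints every record that intersects no record from a different input file,
# followed by the name of the file it came from.
filter_overlaps() {
    awk '
        # Load every record that has at least three fields
        NF >= 3 { n++; chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; src[n] = FILENAME }
        END {
            for (i = 1; i <= n; i++) {
                hit = 0
                for (j = 1; j <= n && !hit; j++)
                    # Same chromosome, different file, inclusive ranges intersect
                    if (src[i] != src[j] && chr[i] == chr[j] &&
                        s[i] <= e[j] && s[j] <= e[i])
                        hit = 1
                if (!hit) print chr[i], s[i], e[i], src[i]
            }
        }
    ' "$@"
}
```

Under those assumptions, something like `filter_overlaps 1.txt 2.txt 3.txt > Output.txt` would write the remaining records in input order, with the originating file name in the last column.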