Dear All,
I want to compare File1 and File2 (both space-separated) using four fields (columns 1, 2, 4 and 5).
Logic: if columns 1 and 2 of a File1 line and a File2 line match exactly, and columns 4 and 5 of the File2 line both hold the same character, which must be one of the characters in column 4 or 5 of the File1 line, then the File1 line and the File2 line are concatenated and written to the output.
File1:
s2/80 20 . A T 86 N=2 F=5;U=4
s2/20 10 . G T 90 N=2 F=5;U=4
s2/90 60 . C G 30 N=2 F=5;U=4
s2/40 70 . A G 80 N=2 F=5;U=4
File2:
s2/90 60 . G G 97 N=2 F=5;U=4
s2/80 20 . A A 20 N=2 F=5;U=4
s2/15 11 . A A 22 N=2 F=5;U=4
s2/90 21 . C C 82 N=2 F=5;U=4
s2/20 10 . G G 99 N=2 F=5;U=4
s2/40 70 . A G 70 N=2 F=5;U=4
s2/80 10 . T G 11 N=2 F=5;U=4
s2/90 60 . G T 55 N=2 F=5;U=4
Expected Output:
s2/80 20 . A T 86 N=2 F=5;U=4 s2/80 20 . A A 20 N=2 F=5;U=4
s2/20 10 . G T 90 N=2 F=5;U=4 s2/20 10 . G G 99 N=2 F=5;U=4
s2/90 60 . C G 30 N=2 F=5;U=4 s2/90 60 . G G 97 N=2 F=5;U=4
I am new to the field and would appreciate your help.
The s2/40 70 line is left out because columns 4 and 5 are A G in both File1 and File2. If File1 has A G in columns 4 and 5, I want to select only the File2 lines that have A A or G G there. In general: if File1 has "X" in column 4 and "Y" in column 5, I want only the File2 lines that have "X" "X" or "Y" "Y" in columns 4 and 5.
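For anyone who wants to check the rule against the sample data, here is a minimal sketch in Python. It is only one possible reading of the logic above: the function names (read_rows, match_lines) are made up for illustration, and it assumes both files are whitespace-separated, that output should follow the order of File1, and that a File1 line may pair with more than one File2 line.

```python
import io

# Sample data copied from the post.
FILE1 = """s2/80 20 . A T 86 N=2 F=5;U=4
s2/20 10 . G T 90 N=2 F=5;U=4
s2/90 60 . C G 30 N=2 F=5;U=4
s2/40 70 . A G 80 N=2 F=5;U=4
"""
FILE2 = """s2/90 60 . G G 97 N=2 F=5;U=4
s2/80 20 . A A 20 N=2 F=5;U=4
s2/15 11 . A A 22 N=2 F=5;U=4
s2/90 21 . C C 82 N=2 F=5;U=4
s2/20 10 . G G 99 N=2 F=5;U=4
s2/40 70 . A G 70 N=2 F=5;U=4
s2/80 10 . T G 11 N=2 F=5;U=4
s2/90 60 . G T 55 N=2 F=5;U=4
"""


def read_rows(handle):
    """Return (original_line, fields) pairs for lines with at least 5 fields."""
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        fields = line.split()
        if len(fields) >= 5:
            rows.append((line, fields))
    return rows


def match_lines(file1, file2):
    """Join File1 lines with File2 lines that satisfy the X X / Y Y rule."""
    # Index File2 by (column 1, column 2); a key may occur more than once.
    by_key = {}
    for line2, f2 in read_rows(file2):
        by_key.setdefault((f2[0], f2[1]), []).append((line2, f2))

    joined = []
    for line1, f1 in read_rows(file1):
        x, y = f1[3], f1[4]  # columns 4 and 5 of File1
        for line2, f2 in by_key.get((f1[0], f1[1]), []):
            # Keep File2 lines whose columns 4 and 5 are "X X" or "Y Y".
            if f2[3] == f2[4] and f2[3] in (x, y):
                joined.append(line1 + " " + line2)  # single space, no ";"
    return joined


for row in match_lines(io.StringIO(FILE1), io.StringIO(FILE2)):
    print(row)
```

With real files, the two io.StringIO objects can be replaced by open("File1") and open("File2"). The s2/40 70 line is dropped because File2 has A G there, which is neither A A nor G G.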
The output has a semicolon where the two lines are joined. How do I avoid this ";"?
s2/80 20 . A T 86 N=2 F=5;U=4;s2/80 20 . A A 20 N=2 F=5;U=4
s2/20 10 . G T 90 N=2 F=5;U=4;s2/20 10 . G G 99 N=2 F=5;U=4
s2/90 60 . C G 30 N=2 F=5;U=4;s2/90 60 . G G 97 N=2 F=5;U=4
I want the output file to look like this:
s2/80 20 . A T 86 N=2 F=5;U=4 s2/80 20 . A A 20 N=2 F=5;U=4
s2/20 10 . G T 90 N=2 F=5;U=4 s2/20 10 . G G 99 N=2 F=5;U=4
s2/90 60 . C G 30 N=2 F=5;U=4 s2/90 60 . G G 97 N=2 F=5;U=4
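The command that produced the ";" is not shown here, so the cleanest fix (changing the separator where the two lines are joined) cannot be pointed at directly. As a stopgap, the output can be repaired afterwards. The sketch below is in Python and rests on two assumptions: every File1 line has exactly 8 space-separated fields, and the first field of the File2 line contains no ";". The function name fix_separator is made up for illustration. A blanket replacement of ";" would not work, because "F=5;U=4" contains a semicolon that has to stay.

```python
def fix_separator(line, n_fields=8):
    """Turn the ';' that glues two records together into a single space.

    Assumes the first record has exactly n_fields space-separated fields
    and that the first field of the second record contains no ';'.
    """
    fields = line.split(" ")
    # A glued line has one field fewer than two full records.
    if len(fields) != 2 * n_fields - 1:
        return line  # not in the glued form, leave it alone
    glued = fields[n_fields - 1]  # e.g. "F=5;U=4;s2/80"
    left, sep, right = glued.rpartition(";")  # split at the LAST ';'
    if not sep:
        return line
    fields[n_fields - 1:n_fields] = [left, right]
    return " ".join(fields)


bad = "s2/80 20 . A T 86 N=2 F=5;U=4;s2/80 20 . A A 20 N=2 F=5;U=4"
print(fix_separator(bad))
```

Lines that are already joined with a space have 16 fields instead of 15, so they pass through unchanged.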