I have two files with two columns each; together, the two columns of a row make up a pair that I need. Certain pairs are common to both files. I need to create three files: one with the common pairs, and two with the pairs exclusive to each original file.
Note that a common pair may appear in any row of the other file.
I am using bash on Ubuntu 18.04. I have tried Fortran, but it crashes on large files.
Please forgive me if I have not expressed my problem clearly. Thank you in advance.
Using Fortran nowadays is quite unusual. Here is an approach with python3:
#!/usr/bin/python3

def read_pairs(path):
    # convert to int in order to make tuples sortable by number, not by string
    with open(path) as fh:
        return {tuple(map(int, ln.split())) for ln in fh if ln.strip()}

def write_pairs(path, pairs):
    # sets by definition haven't any order, so sort them before writing
    with open(path, 'w') as fh:
        fh.writelines('{} {}\n'.format(*t) for t in sorted(pairs))

f1, f2 = read_pairs('f1'), read_pairs('f2')
write_pairs('f.both', f1 & f2)
write_pairs('f.only-1', f1 - f2)
write_pairs('f.only-2', f2 - f1)
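To see what the three set operations produce, here is a tiny worked example with made-up pairs standing in for the two files (the values are purely illustrative, not taken from the question):

```python
# Hypothetical sample data; in the real script these sets come from the files
f1 = {(1, 2), (3, 4), (5, 6)}
f2 = {(5, 6), (7, 8), (1, 2)}

both = sorted(f1 & f2)     # intersection: pairs present in both sets
only_1 = sorted(f1 - f2)   # difference: pairs only in the first set
only_2 = sorted(f2 - f1)   # difference: pairs only in the second set

print(both)    # [(1, 2), (5, 6)]
print(only_1)  # [(3, 4)]
print(only_2)  # [(7, 8)]
```

Note that (1, 2) is found as a common pair even though it sits in a different position in each set, so the row order in the files does not matter.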
And for verification in bash:
#!/bin/bash
for f in f.both f.only*; do
    echo "${f#*.}:"
    # read pairs from outfiles and search for them in both infiles
    while read -r x y; do
        # \b matches at the edge of a word/number
        # that's only because of the different spaces in the infiles,
        # otherwise "^$x $y$" would be enough, and the sed wouldn't be needed
        # then sort by 1st number & strip multiple spaces
        grep -E "\b$x\b.*\b$y\b" f1 f2 | sort -g -k2 | sed -r 's,[[:space:]]+, ,g'
    done < "$f"
done
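The grep loop rescans both input files once per pair, which may get slow on large files. As a rough alternative, the same sanity check could be sketched in Python by testing that the three output sets split the two inputs cleanly. The helper names `read_pairs` and `is_partition` are made up for this sketch, and the inline sets stand in for the real files:

```python
def read_pairs(path):
    """Read whitespace-separated integer pairs into a set, skipping blank lines."""
    with open(path) as fh:
        return {tuple(map(int, line.split())) for line in fh if line.strip()}


def is_partition(f1, f2, both, only_1, only_2):
    """Check that the three output sets split the two input sets cleanly."""
    return (
        (both | only_1) == f1        # every pair of f1 is accounted for
        and (both | only_2) == f2    # every pair of f2 is accounted for
        and not (both & only_1)      # no pair is listed in two outputs
        and not (both & only_2)
        and not (only_1 & only_2)
    )


# Hypothetical data standing in for f1, f2 and the three output files
f1 = {(1, 2), (3, 4), (5, 6)}
f2 = {(5, 6), (7, 8), (1, 2)}
print(is_partition(f1, f2, {(1, 2), (5, 6)}, {(3, 4)}, {(7, 8)}))  # True
print(is_partition(f1, f2, {(1, 2)}, {(3, 4)}, {(7, 8)}))          # False
```

With the real files, each argument would be loaded via `read_pairs`, for example `read_pairs('f1')` and `read_pairs('f.both')`, assuming the file names used above.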
There are of course other and faster methods, especially with awk, or (maybe) with tools like cmp, join, paste etc. I only suggested this because it's a typical use case for sets in Python.