Hi,
How can I ignore case when comparing two files in Unix with the comm command?
The two input files are:
-bash-4.1$ more x2.txt
HELLO
hi
HI
raj
-bash-4.1$ more x3.txt
hello
hi
raj
comm command:
-bash-4.1$ comm x2.txt x3.txt
hello
HELLO
hi
HI
raj
Here hello/HELLO should appear in the third column, since both words are the same apart from case. How can I achieve this? The help text lists no option for ignoring case.
Have you tried using tr to create copies of both of your input files with all uppercase characters converted to lowercase, and then using comm on the converted files?
For comm to work correctly, the files have to be sorted. Capital letters and lowercase letters sort differently, just so you know. UNIX does not ignore case differences by default, except in commands that provide an option to ignore case.
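To see how case affects sort order, here is a small sketch. The helper name sort_bytes is made up for illustration, and the example forces the C locale so that lines are ordered by byte value; in other locales the relative order of uppercase and lowercase can differ.

```shell
# sort_bytes is a hypothetical helper name, not a standard command.
# It reads lines on standard input. LC_ALL=C makes sort compare by
# byte value, so every uppercase ASCII letter sorts before every
# lowercase one.
sort_bytes() {
    LC_ALL=C sort
}

# Example: sort a few mixed-case words.
printf '%s\n' hi HELLO raj HI | sort_bytes
```

In the C locale this lists HELLO and HI before hi and raj. Whatever locale you use, run sort and comm under the same one so they agree on the ordering.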
It occurred to me that Don's good suggestion might be beyond what you know how to do. So here is one way to use the tr command to feed the comm command in bash:
comm <( tr -s '[:upper:]' '[:lower:]' < x2.txt) <(tr -s '[:upper:]' '[:lower:]' < x3.txt)
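The <( ) construct is called process substitution in bash.
The syntax is:
<( command string goes here, it must produce output )
There are two forms, >( ) and <( ), each a combination of an angle bracket and parentheses. With <( ), the outer command reads the inner command's output as if it were a file; with >( ), the outer command writes to the inner command's input.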
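As a rough illustration of the <( ) form, the sketch below pastes a file next to a lowercased copy of itself. It needs bash (or another shell with process substitution), and the function name lower_pairs is made up for this example.

```shell
#!/usr/bin/env bash
# lower_pairs is a hypothetical name used only for this sketch.
# Each <( ... ) is replaced by a file name that paste can open and
# read, so paste sees two "files": the original lines and the
# lowercased lines.
lower_pairs() {
    paste <(cat "$1") <(tr '[:upper:]' '[:lower:]' < "$1")
}
```

Calling lower_pairs x2.txt prints each original line, a tab, and then the same line in lowercase.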
As Jim said, comm needs both input files to be sorted. After case-shifting both of your sample input files to lowercase, they happen to be in sorted order. If that is not the case with your real data files, you will also need to sort them after shifting to lowercase.
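If your shell has process substitution, one way to add the sort step might look like the sketch below. The function name comm_nocase is hypothetical, and tr is used without -s so that repeated letters are kept.

```shell
#!/usr/bin/env bash
# comm_nocase is a hypothetical wrapper, not a standard command.
# Usage: comm_nocase file1 file2
# Each input is lowercased and then sorted before comm reads it.
comm_nocase() {
    comm <(tr '[:upper:]' '[:lower:]' < "$1" | sort) \
         <(tr '[:upper:]' '[:lower:]' < "$2" | sort)
}
```

With the sample files, comm_nocase x2.txt x3.txt puts hello, hi, and raj in the third column. One hi remains in the first column because x2.txt contains both hi and HI.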
I don't think you want the tr -s option. That squeezes repeated adjacent occurrences of the same character in the output. (For example:
echo HELLO | tr -s '[:upper:]' '[:lower:]'
would produce the output:
helo
Note the LL in the input and the single l in the output.)
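For comparison, a minimal sketch of the same conversion without -s. The function name to_lower is just an illustrative choice.

```shell
# to_lower is a hypothetical helper: it lowercases standard input
# and leaves repeated characters untouched (no -s option).
to_lower() {
    tr '[:upper:]' '[:lower:]'
}

echo HELLO | to_lower
```

This prints hello, with both l characters preserved.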
The process substitution feature Jim suggested is available in bash, some recent versions of ksh, and a few other shells; but it is not in the standards and is not available in many other shells. If you're using a shell that just supports POSIX standard features (and your input might need to be sorted), you could try something more like:
tr '[:upper:]' '[:lower:]' < x2.txt | sort > $$.2.txt
tr '[:upper:]' '[:lower:]' < x3.txt | sort > $$.3.txt
comm $$.[23].txt
rm -f $$.[23].txt
which creates copies of your input files that have been converted to lowercase and sorted, runs comm on the copies, and then removes the copies.
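If you need this more than once, the same steps could be wrapped in a function along these lines. This is only a sketch: the name comm_nocase_posix and the temporary file names are made up, and it assumes ${TMPDIR:-/tmp} is writable.

```shell
# comm_nocase_posix is a hypothetical wrapper using only POSIX shell
# features (no process substitution).
# Usage: comm_nocase_posix file1 file2
comm_nocase_posix() {
    # Temporary file names built from the shell's process ID.
    f1="${TMPDIR:-/tmp}/comm_nocase.$$.1"
    f2="${TMPDIR:-/tmp}/comm_nocase.$$.2"

    # Lowercase and sort each input into its temporary copy.
    tr '[:upper:]' '[:lower:]' < "$1" | sort > "$f1"
    tr '[:upper:]' '[:lower:]' < "$2" | sort > "$f2"

    # Compare the copies, remember comm's exit status, then clean up.
    comm "$f1" "$f2"
    status=$?
    rm -f "$f1" "$f2"
    return "$status"
}
```

Quoting the file names keeps the function working when a path contains spaces.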