How to ignore case with the comm command?

raju2016 · January 23, 2017, 2:10am

Hi,

How can I ignore case when comparing two files in Unix using the comm command?
The two input files are:

-bash-4.1$ more x2.txt
HELLO
hi
HI
raj

-bash-4.1$ more x3.txt
hello
hi
raj

comm command:

-bash-4.1$ comm x2.txt x3.txt
        hello
HELLO
                hi
HI
                raj

Here hello/HELLO should appear in the 3rd column, since both words are the same. How can I achieve this? The help text shows no option for ignoring case.

Don_Cragun · January 23, 2017, 3:07am

Have you tried using tr to create copies of both of your input files with all uppercase characters converted to lowercase, and then using comm on the converted files?

jim_mcnamara · January 23, 2017, 11:06am

For comm to work correctly the files have to be sorted. Capital letters and lowercase letters are different when sorted - just so you know. UNIX does not ignore case differences by default, except for some commands that allow an option to ignore case.

It occurred to me that Don's good suggestion might be beyond what you know how to do. So here is one way to use the tr command to feed the comm command in bash:

comm <( tr -s '[:upper:]' '[:lower:]' < x2.txt)  <(tr -s '[:upper:]' '[:lower:]' < x3.txt)

The <( ) thing is called process substitution in bash.

Syntax is

<( command string goes here, it must produce output )

You can have a >( ) or a <( ) combination of angle brackets and parentheses.

Don_Cragun · January 23, 2017, 4:43pm

As Jim said, comm needs both input files to be sorted. After case-shifting both of your sample input files to lowercase, they happen to be in sorted order. If that is not the case with your real data files, you will also need to sort them after shifting to lowercase.
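To see why the extra sort can matter, here is a small sketch. It assumes the C locale, where every uppercase letter sorts before every lowercase one, and uses a made-up two-line input (B, a) rather than your files. sort -c only checks whether its input is already sorted.

```shell
# In the C locale, "B" sorts before "a", so this input counts as sorted.
if printf 'B\na\n' | LC_ALL=C sort -c 2>/dev/null; then
    before=sorted
else
    before=unsorted
fi

# After lowercasing, the lines are "b" then "a", which is out of order.
if printf 'B\na\n' | tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -c 2>/dev/null; then
    after=sorted
else
    after=unsorted
fi

echo "before tr: $before, after tr: $after"
```

A file that was sorted before the case shift may no longer be sorted afterwards, and comm can then give misleading results.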

I don't think you want the tr -s option. That suppresses repeated adjacent occurrences of the same character in the output. (For example:

echo HELLO | tr -s '[:upper:]' '[:lower:]'

would produce the output:

helo

note the LL in the input and the single l in the output.)
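For comparison, a quick sketch running the same input with and without -s (the variable names are just for illustration):

```shell
# Same input, with and without the -s (squeeze repeats) option.
squeezed=$(echo HELLO | tr -s '[:upper:]' '[:lower:]')
plain=$(echo HELLO | tr '[:upper:]' '[:lower:]')

echo "with -s:    $squeezed"
echo "without -s: $plain"
```

Without -s, the doubled l survives the conversion, which is what you want when comparing whole lines.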

The process substitution feature Jim suggested is available in bash, some recent versions of ksh, and a few other shells; but it is not in the standards and is not available in many other shells. If you're using a shell that just supports POSIX standard features (and your input might need to be sorted), you could try something more like:

tr '[:upper:]' '[:lower:]' < x2.txt | sort > $$.2.txt
tr '[:upper:]' '[:lower:]' < x3.txt | sort > $$.3.txt
comm $$.[23].txt
rm -f $$.[23].txt

which creates copies of your input files that have been converted to lowercase and sorted, runs comm on the copies, and then removes the copies.
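If you need this more than once, the same steps could be wrapped in a shell function. This is only a sketch: comm_nocase is a made-up name, not a standard command, and the scratch directory under ${TMPDIR:-/tmp} is an assumption about where temporary files may be written. The demo at the end recreates your two sample files so it can be run on its own.

```shell
# comm_nocase FILE1 FILE2
# Hypothetical helper: lowercase and sort both files, then run comm on the copies.
comm_nocase() {
    tmp1=${TMPDIR:-/tmp}/comm_nocase.$$.1
    tmp2=${TMPDIR:-/tmp}/comm_nocase.$$.2
    tr '[:upper:]' '[:lower:]' < "$1" | sort > "$tmp1"
    tr '[:upper:]' '[:lower:]' < "$2" | sort > "$tmp2"
    comm "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return $status
}

# Demo: recreate the sample files from the question in a scratch directory.
demo_dir=${TMPDIR:-/tmp}/comm_demo.$$
mkdir -p "$demo_dir"
printf 'HELLO\nhi\nHI\nraj\n' > "$demo_dir/x2.txt"
printf 'hello\nhi\nraj\n' > "$demo_dir/x3.txt"

out=$(comm_nocase "$demo_dir/x2.txt" "$demo_dir/x3.txt")
printf '%s\n' "$out"

rm -rf "$demo_dir"
```

With the sample data, hello, one hi, and raj should land in the 3rd column. The second hi (from HI) stays in the 1st column, because x2.txt has two matching lines and x3.txt only one.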