comm command help with unicode chars in file

ashwin3086 · July 6, 2010, 6:15am

Hi,

I have a master file (file.txt) with good and bad records (the bad records contain Unicode characters). I have a second file with only the bad records (bad.txt).

I want the records in file.txt that are not present in bad.txt, i.e. only the good records.

I tried

 comm -23 file.txt bad.txt

It outputs every record in file.txt.
Any help?

rdcwayx · July 6, 2010, 7:42am

You need to paste a sample of the master file and of the bad records file.

ashwin3086 · July 6, 2010, 7:56am

Take this as an example: a1 is the master file and a2 is the bad records file. My output should be
9
8
6
4
2

$ cat a1
9
8
7
6
5
4
2
3
$ cat a2
1
3
5
7
$ comm -23 a1 a2
9
8
7
6
5
4
2
3

drl · July 6, 2010, 8:01am

Hi.

Best wishes ... cheers, drl

ashwin3086 · July 6, 2010, 8:04am

Is there any way I can do it without sorting the file? The file is big.

rdcwayx · July 6, 2010, 8:22am

Find the records in a1 that are not in a2:

$ grep -v -f a2 a1
9
8
6
4
2

Find the records that are in both files:

$ grep -f a2 a1
7
5
3

ashwin3086 · July 6, 2010, 8:44am

I get the following error with grep -v -f:

$ grep -v -f a2 a1
grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . . .
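
Where grep does not accept -f, as in the error above, one possible alternative that also avoids sorting is awk. A rough sketch, assuming an awk that supports the "in" operator (on some older systems that may be nawk rather than awk) and a bad records file small enough to hold in memory; the scratch directory and the names bad and good are illustrative:

```shell
# Recreate the sample files from the thread in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' 9 8 7 6 5 4 2 3 > a1
printf '%s\n' 1 3 5 7 > a2

# First file (a2, while NR == FNR): remember every bad record.
# Second file (a1): print only the records that were not remembered.
# Caveat: if a2 is empty, NR == FNR also holds for a1 and nothing is printed.
good=$(awk 'NR == FNR { bad[$0] = 1; next } !($0 in bad)' a2 a1)
printf '%s\n' "$good"
```

Only the bad records are kept in memory; the master file is read once, in its original order.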

dazdseg · July 6, 2010, 8:55am

grep -v a2 a1

this shud get u going..

u can use ur master and bad sector file and get the good records

lets say
master file = m, bad record=b
cat m > result
cat b >> result
uniq -u result

this will only give u good results.

-u Displays only lines not duplicated (uniq lines).

methyl · July 6, 2010, 9:56am

Assuming each record is unique, every record in bad.txt also appears in file.txt, and the output order does not matter:

cat file.txt bad.txt | sort | uniq -u

Or sort each file before the "comm".

sort file.txt >file.sor
sort bad.txt >bad.sor
comm -23 file.sor bad.sor
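
A sketch of the sort-then-comm approach applied to the a1 and a2 sample from earlier in the thread. Setting LC_ALL=C is an assumption here, not something the thread confirms is needed: it makes sort and comm use the same byte-wise ordering, which may matter for records containing Unicode characters. The scratch directory and the variable name good are illustrative.

```shell
# Recreate the sample files from the thread in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' 9 8 7 6 5 4 2 3 > a1
printf '%s\n' 1 3 5 7 > a2

# Use one locale for every command so sort and comm agree on the ordering.
LC_ALL=C sort a1 > a1.sor
LC_ALL=C sort a2 > a2.sor

# -2 drops lines found only in a2.sor, -3 drops lines common to both,
# leaving the lines that appear only in a1.sor.
good=$(LC_ALL=C comm -23 a1.sor a2.sor)
printf '%s\n' "$good"
```

The result comes out in sorted order (2, 4, 6, 8, 9 for this sample), not in the original order of a1.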

Corona688 · July 6, 2010, 10:15am

There, now you know the real words.

ashwin3086 · July 6, 2010, 10:21am

@corona :Thats what dazdseg told rite.. Did you add anything to it...

dazdseg · July 6, 2010, 10:22am

thanks for letting me know.

jim_mcnamara · July 6, 2010, 10:25am

Please don't use 'l33t sp33k'. This isn't Twitter; use plain English. If English is a problem for you, we have translation services. That is what Corona is telling you.

drl · July 6, 2010, 10:26am

Hi, dazdseg.

dazdseg:

grep -v a2 a1 
this shud get u going..

u can use ur master and bad sector file and get the good records
lets say
master file = m, bad record=b
cat m > result
cat b >> result
uniq -u result
this will only give u good results.
-u Displays only lines not duplicated (uniq lines). 

However:

cheers, drl

ashwin3086 · July 6, 2010, 10:27am

I will remember that as well, since I am new to the forum.