Compare files using awk

Mary_James · June 4, 2012, 7:18am

Please help me compare two files and remove the items listed in file2 from file1.

file1 is delimited using a pipe (|).

file1

00012|Description - 1|||||AA12345|1|AB12345|2|2012/06/03
AB123|Description - 2|||||AA12345|3|ZA11111|4|2012/06/04
11111|Description - 3|||||AP00012|1|AB12345|2|2012/06/03
ABCDE|Description,description - 4|||||PA12345|10|AB12345|20|2012/06/03

file2

Expected output

AB123|Description - 2|||||AA12345|3|ZA11111|4|2012/06/04
11111|Description - 3|||||AP00012|1|AB12345|2|2012/06/03
ABCDE|Description,description - 4|||||PA12345|10|AB12345|20|2012/06/03

Here the output does not contain the first row of file1 (the row starting with 00012), since file2 contains 00012. The comparison should be between the first field in file1 and the items in file2. So even though 00012 also appears in the 3rd row, that row is not removed.

Also, is there any way to direct the removed rows/items to another output file?

vivek_d_r · June 4, 2012, 8:24am

But how come this line in file1 is not removed: "11111"? That number is present in both files, so it should be removed, right?

sdf · June 4, 2012, 8:47am

It depends on what you want to select. If you want only those rows of file1 whose field 1 is in file2, use:

awk 'BEGIN{FS="|"}NR==FNR{a[$1]=$1;next}a[$1]' file2 file1 >selected_lines

If you want to drop the rows of file1 whose field 1 is in file2 (keeping only those whose field 1 is not in file2), use:

awk 'BEGIN{FS="|"}NR==FNR{a[$1]=$1;next}!a[$1]' file2 file1 >selected_lines
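A minimal sketch that also covers the question about sending the removed rows to another file. It assumes the pipe-delimited file1 from the thread (second version of the sample) and the file2 contents from vivek_d_r's listing below; the output names kept_lines and removed_lines are only illustrative. It tests membership with `($1 in seen)` rather than the truthiness of `a[$1]`: with `a[$1]=$1`, common awks (gawk, for example) treat a stored key such as 0 or 00000 as the number zero, so such a row would never be recognised as being in file2.

```shell
#!/bin/sh
# Sample data from the thread (file2 as listed by vivek_d_r).
printf '%s\n' \
  '00012|Description - 1|||||AA12345|1|AB12345|2|2012/06/03' \
  'AB123|Description - 2|||||AA12345|3|ZA11111|4|2012/06/04' \
  '11111|Description - 3|||||AP00012|1|AB12345|2|2012/06/03' \
  'ABCDE|Description,description - 4|||||PA12345|10|AB00012|20|2012/06/03' > file1
printf '%s\n' 11111 o1234 00012 > file2

# First pass (NR==FNR, i.e. while reading file2): store each item as a key.
# Second pass (file1): rows whose first field is a key go to removed_lines,
# every other row goes to kept_lines.
awk -F'|' '
  NR == FNR    { seen[$1]; next }
  ($1 in seen) { print > "removed_lines"; next }
               { print > "kept_lines" }
' file2 file1
```

One caveat of the NR==FNR idiom: if file2 is empty, NR also equals FNR while file1 is read, so every row of file1 is swallowed by the first block and neither output file is written.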

vivek_d_r · June 4, 2012, 8:50am

[root@ jun4]# cat file1
00012|Description - 1|||||AA12345|1|AB12345|2|2012/06/03
AB123|Description - 2|||||AA12345|3|ZA11111|4|2012/06/04
11111|Description - 3|||||AP00012|1|AB12345|2|2012/06/03
ABCDE|Description,description - 4|||||PA12345|10|AB12345|20|2012/06/03
[root@ jun4]# cat file2
11111
o1234
00012
[root@jun4]# ./com.sh
[root@jun4]# cat file1
AB123|Description - 2|||||AA12345|3|ZA11111|4|2012/06/04
ABCDE|Description,description - 4|||||PA12345|10|AB12345|20|2012/06/03

my primitive code :-p

cp file1 file1.orig            # read from a copy: file1 is rewritten below
: > file1                      # file1 will receive the rows that are kept
: > file3                      # file3 collects the removed rows
while IFS= read -r line1
do
        first=$( printf '%s\n' "$line1" | cut -d'|' -f1 )
        found=0
        while IFS= read -r line2
        do
                if [[ "$first" == "$line2" ]];then
                        found=1
                        break
                fi
        done<file2
        if [[ $found -eq 1 ]];then
                printf '%s\n' "$line1" >>file3   # redirect deleted line to file3
        else
                printf '%s\n' "$line1" >>file1   # keep the line in file1
        fi
done<file1.orig
rm file1.orig

Mary_James · June 5, 2012, 1:13am

Thanks Vivek, it's my mistake. 11111 should be removed from the output, as it is present in file2.

file1

00012|Description - 1|||||AA12345|1|AB12345|2|2012/06/03
AB123|Description - 2|||||AA12345|3|ZA11111|4|2012/06/04
11111|Description - 3|||||AP00012|1|AB12345|2|2012/06/03
ABCDE|Description,description - 4|||||PA12345|10|AB00012|20|2012/06/03

file2

Expected output

AB123|Description - 2|||||AA12345|3|ZA11111|4|2012/06/04
ABCDE|Description,description - 4|||||PA12345|10|AB00012|20|2012/06/03

vivek_d_r · June 5, 2012, 1:35am

oh okay..

Mary_James · June 9, 2012, 8:25am

I used the code below:

awk 'BEGIN{FS="|"}NR==FNR{a[$1]=$1;next}!a[$1]'

This gives me the error:

a[$1]': Event not found

I changed the code by adding a '\' before the '!':

nawk 'BEGIN{FS="|"}NR==FNR{a[$1]=$1;next}\!a[$1]'

It worked. I hope this is correct. Thanks to sdf and Vivek.
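The "Event not found" error comes from csh/tcsh history expansion, which acts on `!` even inside single quotes (bash only does this outside single quotes), so escaping it as `\!` is the usual csh workaround. A sketch that avoids `!` entirely, so the same command can be typed unchanged in csh, sh or bash: matching rows are skipped with `next` and the always-true pattern `1` prints the rest. It assumes the same file1/file2 layout as above and uses only POSIX awk features, so nawk should accept it as well.

```shell
#!/bin/sh
printf '%s\n' \
  '00012|Description - 1|||||AA12345|1|AB12345|2|2012/06/03' \
  'AB123|Description - 2|||||AA12345|3|ZA11111|4|2012/06/04' > file1
printf '%s\n' 00012 > file2

# No "!" anywhere, so csh history expansion has nothing to act on.
# Rows of file1 whose first field is listed in file2 are skipped with
# "next"; the pattern 1 (always true) prints every remaining row.
awk -F'|' 'NR==FNR{a[$1];next} ($1 in a){next} 1' file2 file1 > selected_lines
```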

Mary_James · June 15, 2012, 12:14am

How can I modify the awk program if I need to compare file2 with the second field in file1 (Description here)?

Please suggest


Hi
Please suggest how I can modify the awk script if I want to compare file f2 with the second field in file1, that is, the items in f2 against the description in f1.

vivek_d_r · June 15, 2012, 12:37am

Use this:

awk 'BEGIN{FS="|"}NR==FNR{a[$1]=$1;next}!a[$2]' file2 file1

The original code was:

awk 'BEGIN{FS="|"}NR==FNR{a[$1]=$1;next}!a[$1]' file2 file1

In the original, the lookup !a[$1] checks the first column of file1. To check the second column instead, change it to !a[$2]; that's your fix. The a[$1]=$1 part reads file2, which has only one column, so it stays $1.
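A slightly more general sketch, assuming you may later want to match on other fields of file1: the field number is passed in with `-v` (the variable name `col` is just illustrative), so the same program works for `$1`, `$2` and so on. The description stored in file2 below is made up for the example. Run it from sh or bash; in csh the `!` would need escaping as in the earlier post.

```shell
#!/bin/sh
printf '%s\n' \
  '00012|Description - 1|||||AA12345|1|AB12345|2|2012/06/03' \
  'AB123|Description - 2|||||AA12345|3|ZA11111|4|2012/06/04' \
  '11111|Description - 3|||||AP00012|1|AB12345|2|2012/06/03' > file1
# Hypothetical file2: one description per line.
printf '%s\n' 'Description - 2' > file2

# col selects which |-separated field of file1 is compared with file2.
# file2 has a single column, so each whole line ($0) is stored as a key.
# Rows of file1 whose field number col is NOT a key are printed.
awk -F'|' -v col=2 '
  NR == FNR { a[$0]; next }
  !($col in a)
' file2 file1 > selected_lines
```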