Compare values in two different files

Hello,

I have output in one file that looks like:

AA 3
BB 1
CC 3
DD 6
EE 2
FF 6

And output in another file that looks like:

1 EE
3 CC
2 AA

I basically want to match the counts in each file against the corresponding initials (and then, obviously, base a command on whether the result is true or false). However, not all initials will always appear in both files, and those that do appear will not necessarily be in the same order.

Would some sort of awk command be the way to do this?

Hello Nik44,

Since you haven't shown the expected output, this is based on my assumptions; could you please try the following?
Let's say these are the input files.

cat Input_file1
AA 3
BB 1
CC 3
DD 6
EE 2
FF 6

cat Input_file2
1 EE
3 CC
2 AA
12 AF

awk 'BEGIN{print "ids which are present in file1 and file2 both:"} FNR==NR{A[$1]=$0;next} ($2 in A){print;delete A[$2];next} !($2 in A){Q=Q?Q ORS $0:$0}  END{print "ids which are present in file2 and not in file1" ORS Q;print "ids which  are present in file1 and not in file2.";for(i in A){print A[i]}}' Input_file1  Input_file2

Output will be as follows.

ids which are present in file1 and file2 both:
1 EE
3 CC
2 AA
ids which are present in file2 and not in file1
12 AF
ids which  are present in file1 and not in file2.
BB 1
DD 6
FF 6
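
For anyone unfamiliar with the idiom used above: FNR==NR is true only while awk is reading the first file named on the command line (as long as that file is not empty), so that block loads file1 into an array and the remaining rules run only against file2. A stripped-down sketch of just that part, using the same sample data, might look like this. The function name common_ids and the temporary file locations are only illustrative and are not part of the command above.

```shell
#!/usr/bin/env bash
# Sketch only: list the ids that appear in both files, whatever their counts.
# common_ids and the temporary file locations are made up for this illustration.

common_ids() {
    # $1 = file with "ID COUNT" lines, $2 = file with "COUNT ID" lines
    awk '
    FNR == NR { seen[$1]; next }   # first file: remember every id
    ($2 in seen) { print $2 }      # second file: print ids already seen
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'AA 3' 'BB 1' 'CC 3' 'DD 6' 'EE 2' 'FF 6' > "$tmp/Input_file1"
printf '%s\n' '1 EE' '3 CC' '2 AA' '12 AF' > "$tmp/Input_file2"

common_ids "$tmp/Input_file1" "$tmp/Input_file2"
```

The ids come out in the order they appear in the second file, because that is the file being scanned when the print happens.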

Thanks,
R. Singh

Apologies, probably didn't explain as well as I could have.

There is effectively no output. What I want is this: where the count for a pair of letters is the same in both files, that should trigger a command.

So with the example provided, only CC would match, as both files have a count of 3 for it (and it would therefore trigger the command). The remaining letter pairs wouldn't trigger my command, as their counts in the two files do not match.

Does this clarify?

That is a better start, but it still leaves holes...

What is the name of this command?

Are any arguments supposed to be passed to this command (and, if so, what arguments)?

What shell are you using?

What operating system are you using?

This is for a Unix bash script.

When there is a match between a pair of letters AND corresponding counts between the two files (if "CC" is 3 in file1 and "CC" is also 3 in file2) then I would like this to then move particular files from one directory into another directory (it's not so much this part of the syntax I'm concerned with though, more how to do the actual comparison).

Hello Nik44,

We are still not sure about your requirement, as you didn't tell us which commands you want to execute. As an example, you could try the following and let us know if it helps; otherwise, please state your complete requirements.

awk 'FNR==NR{A[$1 OFS $2]=$0;next} (($2 OFS $1) in A){print;system("date");delete A[$2 OFS $1];next}' Input_file1  Input_file2

Output will be as follows.

3 CC
Wed Aug 10 12:36:29 GMT 2016
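
If the real goal is to run a shell command when a given id has the same count in both files, another option is to let awk's exit status drive a shell if. The following is a minimal sketch, assuming the two file layouts shown above; the function name counts_match is hypothetical and the echo lines stand in for whatever command should actually run.

```shell
#!/usr/bin/env bash
# Sketch only: succeed (exit status 0) when the given id has the same count
# in both files. counts_match is a made-up name for this illustration.

counts_match() {
    # $1 = id, $2 = file with "ID COUNT" lines, $3 = file with "COUNT ID" lines
    awk -v id="$1" '
    FNR == NR { if ($1 == id) want = $2; next }          # count from first file
    $2 == id && want != "" && $1 == want { found = 1 }   # same count in second
    END { exit !found }
    ' "$2" "$3"
}

tmp=$(mktemp -d)
printf '%s\n' 'AA 3' 'BB 1' 'CC 3' 'DD 6' 'EE 2' 'FF 6' > "$tmp/Input_file1"
printf '%s\n' '1 EE' '3 CC' '2 AA' '12 AF' > "$tmp/Input_file2"

for id in AA CC; do
    if counts_match "$id" "$tmp/Input_file1" "$tmp/Input_file2"; then
        echo "$id: counts match, run your command here"
    else
        echo "$id: no match"
    fi
done
```

This keeps the decision in the shell, which may be easier to extend than calling system() from inside awk.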
 

Thanks,
R. Singh

Hi.

Using commands to follow the decomposition of the problem into steps:

#!/usr/bin/env bash

# @(#) s1       Demonstrate filter one file with contents from another, grep

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f "$C" ] && "$C" awk tee grep

FILE=${1-data1}

pl " Input data file $FILE:"
cat "$FILE"

pl " Input file data2:"
cat data2

pl " Amended input data file data3:"
awk '{ print $2,$1}' data2 |
tee data3

rm -f data4
pl " Results, filter data1, selecting data3 contents:"
grep -f data3 "$FILE" |
tee data4
# rm -f data4   # test when grep does not create a file
pe
if [ -s data4 ] # true if file exists and size > 0
then
  while IFS= read -r line       # do some command for each filtered line
  do
    pe " Do your command here with line \"$line\""
  done < data4
else
  pe " No lines selected."
fi

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.4 (jessie) 
bash GNU bash 4.3.30
awk GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2-p3, GNU MP 6.0.0)
tee (GNU coreutils) 8.23
grep (GNU grep) 2.20

-----
 Input data file data1:
AA 3
BB 1
CC 3
DD 6
EE 2
FF 6

-----
 Input file data2:
1 EE
3 CC
2 AA
12 AF

-----
 Amended input data file data3:
EE 1
CC 3
AA 2
AF 12

-----
 Results, filter data1, selecting data3 contents:
CC 3

 Do your command here with line "CC 3"
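
One caveat with grep -f as used in the script above: each line of data3 is treated as a regular expression that may match anywhere in a line, so a pattern such as "AA 2" would also select a line like "AA 25". Adding -F (fixed strings) and -x (whole-line match) should avoid that. Here is a small sketch of the difference; the "AA 25" line and the temporary file locations are invented for the illustration and are not part of the data above.

```shell
#!/usr/bin/env bash
# Sketch only: compare plain "grep -f" with "grep -Fx -f" on sample data.
# The "AA 25" line is invented to show a false positive.

tmp=$(mktemp -d)
printf '%s\n' 'AA 25' 'CC 3' 'EE 2' > "$tmp/data1"
printf '%s\n' 'EE 1' 'CC 3' 'AA 2'  > "$tmp/data3"

loose=$(grep -f "$tmp/data3" "$tmp/data1")        # patterns match as substrings
strict=$(grep -Fx -f "$tmp/data3" "$tmp/data1")   # fixed strings, whole lines

printf 'grep -f:\n%s\n' "$loose"
printf 'grep -Fx -f:\n%s\n' "$strict"
```

Note that -x requires the lines to be identical, so stray trailing blanks in either file would stop a line from matching.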

Best wishes ... cheers, drl

PS: I thought I had posted this earlier; apologies if it becomes a duplicate post.

Being cryptic about what you're really trying to do makes it harder to give you any real help. What Ravinder or drl suggested might do exactly what you want.

You could also try this alternative:

#!/bin/bash
awk '
FNR == NR {
	A[$1, $2]
	next
}
($2, $1) in A {
	printf("echo \"string %s found with numbers %s\"\n", $2, $1)
	printf("echo mv \"somefile1\" \"somefile2\" \"somefile3\" \"/some/other/directory\"\n")
	delete A[$2, $1]
}' file1  file2 | bash

which produces the output:

string CC found with numbers 3
mv somefile1 somefile2 somefile3 /some/other/directory

where the echo placed in front of mv makes bash print the mv command instead of executing it. (But, of course, you have to actually have awk create the list of files to be moved and build the mv command for bash to execute.)

If the names of the files to be moved depend on the values from the matched lines, something more like:

#!/bin/bash
awk '
FNR == NR {
	A[$1, $2]
	next
}
($2, $1) in A {
	print $2, $1
	delete A[$2, $1]
}' file1  file2 | while read -r string number
do	printf 'string %s found with numbers %d\n' "$string" "$number"
	echo mv "FileContaining${string}and$number".* "/some/other/directory"
done

might be more appropriate. In a directory that contains the files:

-rw-r--r--  1 dwc  staff    0 Aug 17 09:28 FileContainingCCand3.1.txt
-rw-r--r--  1 dwc  staff    0 Aug 17 09:28 FileContainingCCand3.4.txt
-rw-r--r--  1 dwc  staff    0 Aug 17 09:28 FileContainingCCand3.7.txt
-rw-r--r--  1 dwc  staff   30 Aug 17 09:04 file1
-rw-r--r--  1 dwc  staff   15 Aug 17 09:04 file2
-rw-r--r--  1 dwc  staff  682 Aug 17 09:05 problem
-rwxr-xr-x  1 doc  staff  532 Aug 17 09:28 tester

it produces the output:

string CC found with numbers 3
mv FileContainingCCand3.1.txt FileContainingCCand3.4.txt FileContainingCCand3.7.txt /some/other/directory

as long as you keep the echo in front of mv. If you remove that echo, it would actually attempt to move those files.
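
If you would rather stay in bash and skip awk altogether, the comparison can also be sketched with an associative array, which needs bash version 4 or later. This is an alternative, not what was posted above; the function name matching_ids is hypothetical, and the echo in the loop only marks the place where the mv would go.

```shell
#!/usr/bin/env bash
# Sketch only, needs bash 4+ (associative arrays). matching_ids is a made-up name.

matching_ids() {
    # $1 = file with "ID COUNT" lines, $2 = file with "COUNT ID" lines
    local -A count=()
    local id n
    while read -r id n; do
        [ -n "$id" ] && count[$id]=$n      # remember the count for each id
    done < "$1"
    while read -r n id; do
        # print the id only when the first file had it with the same count
        [ -n "$id" ] && [ "${count[$id]-}" = "$n" ] && printf '%s\n' "$id"
    done < "$2"
    return 0
}

tmp=$(mktemp -d)
printf '%s\n' 'AA 3' 'BB 1' 'CC 3' 'DD 6' 'EE 2' 'FF 6' > "$tmp/file1"
printf '%s\n' '1 EE' '3 CC' '2 AA' > "$tmp/file2"

matching_ids "$tmp/file1" "$tmp/file2" | while read -r id; do
    echo "counts match for $id: run the mv here"
done
```

The counts are compared as strings here, so "3" and "03" would not be treated as equal; awk's numeric comparison is more forgiving in that respect.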