Input file needs to match a column and print the entire line

lewk · June 15, 2016, 3:00pm

I have a file of Class C network prefixes (the first three components of an IP address) that I need to match against one column of another file, printing every line of that file that matches.

I started playing with grep -if file01.out file02.out, but I am stuck on how to restrict the match to one column and print the matching lines:

cat file01.out

10.150.140
192.168.30
192.168.40
192.168.50
192.168.60

cat file02.out

192.168.200.10,192.168.40.44,22
192.168.100.11,192.168.249.255,23
192.168.118.12,192.168.30.200,22
10.67.160.295,192.168.248.31,53
10.68.132.20,192.168.60.33,443

The result needs to look like this;

192.168.200.10,192.168.40.44,22
192.168.118.12,192.168.30.200,22
10.68.132.20,192.168.60.33,443

Please explain what your command or code does; I am learning.

Don_Cragun · June 15, 2016, 3:58pm

In your sample file02.out, the 3rd line is:

192.168.118.12,192.192.168.30.200,22

Is that correct, or did you intend to have:

192.168.118.12,192.168.30.200,22

Are you looking for a match on any substring of the 2nd field in file02.out , or are you looking for an exact match on the first three components of the IP address in the 2nd field in file02.out ?

What operating system and shell are you using?

lewk · June 15, 2016, 4:06pm

Thanks for spotting that error; I have corrected it. I intended the line to read as in your example. I am looking for an exact match on the first three components of the second field.

I am using Ubuntu but right now I am on Cygwin :o

Don_Cragun · June 15, 2016, 5:05pm

A fairly simple way to do this with awk is:

awk -F, ' 		# Invoke awk to process your files with the field
			# separator set to ",".
FNR == NR {		# For lines in the 1st input file (the # of records in
			# the current file is the same as the # of records in
			# all files read so far)...
	ip[$1 "."]	# Add an element to the array ip[] with the index being
			# the text found in the 1st field from the first file
			# with a "." added to the end of that string.
	next		# Stop processing this input record and read the next
			# input line.
}
{	for(i in ip)	# For lines read from any following input files, loop
			# through all of the index values that have been used in
			# the array ip[], with "i" set to a different index each
			# time through the loop...
		if(index($2, i) == 1) {	# if the index for this time through the
					# loop identically matches a substring
					# starting with the 1st character of the
					# 2nd field in this file...
			print	# print the current input line...
			next	# and stop processing this line and read the
				# next input line
		}
}' file0[12].out	# End the script and name the two input files.

If someone else wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

Note that I used index() rather than an ERE match because the period is a special character in an ERE: it matches any character, not just a period. Building the subscripts of the ip[] array for use as EREs would therefore have been more difficult. The statement:

	ip[$1 "."]

would need to be something more like:

	gsub(/[.]|$/, "[.]", $1)
	ip["^" $1]

but the match in the loop:

		if(index($2, i) == 1) {

would have been simpler (but would probably also run slower):

		if($2 ~ i) {
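
For anyone who wants to try the ERE variant end to end, here is a minimal sketch that combines the two fragments above into one complete script. It assumes the sample data from the first post. The scratch directory and the shell variable names dir and ere_out are only for illustration and are not part of the original script.

```shell
# Recreate the sample files from the first post in a scratch directory.
dir=$(mktemp -d) || exit 1
printf '%s\n' 10.150.140 192.168.30 192.168.40 192.168.50 192.168.60 \
	> "$dir/file01.out"
printf '%s\n' \
	'192.168.200.10,192.168.40.44,22' \
	'192.168.100.11,192.168.249.255,23' \
	'192.168.118.12,192.168.30.200,22' \
	'10.67.160.295,192.168.248.31,53' \
	'10.68.132.20,192.168.60.33,443' > "$dir/file02.out"

# ERE variant: every array subscript is an anchored regular expression.
ere_out=$(awk -F, '
FNR == NR {
	gsub(/[.]|$/, "[.]", $1)	# "192.168.30" becomes "192[.]168[.]30[.]"
	ip["^" $1]			# anchor the ERE at the start of the field
	next
}
{	for(i in ip)
		if($2 ~ i) {		# dynamic ERE match against the 2nd field
			print
			next
		}
}' "$dir/file01.out" "$dir/file02.out")

printf '%s\n' "$ere_out"
```

With the sample data this should select the same three lines as the index() version.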

lewk · June 16, 2016, 4:46pm

brilliant, thank you for the detailed explanation too!

---------- Post updated at 10:46 PM ---------- Previous update was at 05:25 PM ----------

How would I modify it if, for example, I want to show only the lines where no match was found in column 2?

cat file01.out

10.150.140
192.168.30
192.168.40
192.168.50
192.168.60

cat file02.out

192.168.200.10,192.168.40.44,22
192.168.100.11,192.168.249.255,23
192.168.118.12,192.168.30.200,22
10.67.160.295,192.168.248.31,53
10.68.132.20,192.168.60.33,443

The result needs to look like this:

192.168.100.11,192.168.249.255,23
10.67.160.295,192.168.248.31,53

Don_Cragun · June 16, 2016, 5:11pm

Move the print statement as shown below. Current script:

awk -F, '
FNR == NR {
	ip[$1 "."]
	next
}
{	for(i in ip)
		if(index($2, i) == 1) {
			print
			next
		}
}' file0[12].out

Script to print lines with no IP matches instead of lines with a matching IP:

awk -F, '
FNR == NR {
	ip[$1 "."]
	next
}
{	for(i in ip)
		if(index($2, i) == 1) {
			next
		}
	print
}' file0[12].out
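
Since the two scripts differ only in where print sits, they could also be folded into one script that is switched by a variable. The following is just a sketch of that idea, not something required by the task: the shell function name filter, the awk variables invert and found, and the scratch directory are all hypothetical names chosen for illustration.

```shell
# Recreate the sample files from the first post in a scratch directory.
dir=$(mktemp -d) || exit 1
printf '%s\n' 10.150.140 192.168.30 192.168.40 192.168.50 192.168.60 \
	> "$dir/file01.out"
printf '%s\n' \
	'192.168.200.10,192.168.40.44,22' \
	'192.168.100.11,192.168.249.255,23' \
	'192.168.118.12,192.168.30.200,22' \
	'10.67.160.295,192.168.248.31,53' \
	'10.68.132.20,192.168.60.33,443' > "$dir/file02.out"

# filter 0 prints lines whose 2nd field matches a prefix,
# filter 1 prints lines whose 2nd field matches no prefix.
filter() {
	awk -F, -v invert="$1" '
	FNR == NR {
		ip[$1 "."]	# same prefix table as in the scripts above
		next
	}
	{	found = 0
		for(i in ip)
			if(index($2, i) == 1) {
				found = 1	# remember the match, stop looking
				break
			}
		if(found != invert)	# print matches, or non-matches if inverted
			print
	}' "$dir/file01.out" "$dir/file02.out"
}

matched=$(filter 0)
unmatched=$(filter 1)
printf '%s\n' "$unmatched"
```

Here the loop only records whether a prefix matched, and a single test after the loop decides whether to print, so the same script serves both requests.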