Comparing two columns in two files and printing a third based on a match

Hello all, First post here. I did not notice a previous post to help me down the right path. I am looking to compare a column in a CSV file against another file (which is not a column match one for one) but more or less when a match is made, I would like to append a third column that contains a name to the end of the matching row. For instance:

File 1:

Hostname,Port,Name,Status,VLAN,Duplex,Speed,Type
switch1,gi2/1,trunk,active,1,a-100,full,10g,10GBase-CU 3M
switch1,gi3/1,link to data closet,disabled,2,half,a-1000,No XCVR
etc..

File 2:

VLAN,Name,Status,Ports,Device
1,default,active,gi1/1,switch5
2,voice,active,gi2/1,switch5
etc...

Desired Output File:

Hostname,Port,Name,Status,VLAN,Duplex,Speed,Type,VLAN Name
switch1,gi2/1,trunk,active,1,a-100,full,10g,10GBase-CU 3M,default
switch1,gi3/1,link to data closet,disabled,2,half,a-1000,No XCVR,voice

To condense what Im trying to accomplish, I want to create a third file that will have an additional column that appends the vlan name when a match is made. When a match is not made, NA is fine.

Thanks for any help, I've been struggling with SED and AWK

Welcome.

Please post what you have tried so far.

Most everything that I started with is garbage and I hit a wall when I tried to compare to the second file. My approach was to combine the two files and compare which was a mess and didnt work right. At that point I punted and decided to post.

whare are/is the common key(s) between the 2 files to match on?

I see that Hostname in file1 and Device in file2, refer to 2 diff hosts...
Not sure how you arrived to your desired output, unless I'm missing how to match entries between 2 files (which is probably is true)

The key field to match on is the vlan field. For awk in file1 it is $5 and in file2 it is field 1. In the last file I would append it to the end of all matching rows.

------ Post updated at 05:16 PM ------

awk -F',' 'NR==FNR{vlan[$1]=$1;name[$1]=$2;next}; ($2==vlan[$5]){print $0 "," name[$2]}' <(sort -k1 file2.csv) <(sort -k5 file1.csv) &> file3.csv

This did not work for me,

------ Post updated at 05:16 PM ------

code(

awk -F',' 'NR==FNR{vlan[$1]=$1;name[$1]=$2;next}; ($2==vlan[$5]){print $0 "," name[$2]}' <(sort -k1 file2.csv) <(sort -k5 file1.csv) &> file3.csv

)

------ Post updated at 05:17 PM ------

awk -F',' 'NR==FNR{vlan[$1]=$1;name[$1]=$2;next}; ($2==vlan[$5]){print $0 "," name[$2]}' <(sort -k1 file2.csv) <(sort -k5 file1.csv) &> file3.csv

something along these lines:

awk -F, 'FNR==NR{if (FNR==1){h=$0}f2[$1]=$2;next} FNR==1{print h, "VLAN Name";next} $5 in f2 {print $0,f2[$5]}' OFS=, file2 file1

Nope. Ultimately I am trying to print file1 to a new file with an additional column (VLAN Name), and the matching vlans have the name field appended to the end of the row. If a match is not made, then NA or a blank space is fine. This awk command appended the "vlan name" to the first line but did not return any of the fields

$ awk -F, 'FNR==NR{if (FNR==1){h=$0}f2[$1]=$2;next} FNR==1{print h, "VLAN Name";next} $5 in f2 {print $0,f2[$5]}' OFS=, file2 file1
VLAN,Name,Status,Ports,Device,VLAN Name

strange...
given your sample files:

  awk -F, 'FNR==NR{f2[$1]=$2;next} FNR==1{print $0, "VLAN Name";next} {print $0,($5 in f2)?f2[$5]:"NA"}' OFS=, file2 file1

produces...

Hostname,Port,Name,Status,VLAN,Duplex,Speed,Type,VLAN Name
switch1,gi2/1,trunk,active,1,a-100,full,10g,10GBase-CU 3M,default
switch1,gi3/1,link to data closet,disabled,2,half,a-1000,No XCVR,voice

Update, I have been making a little progress but still dont have it. This will print fields when there is a match but it isnt printing fields from file1. It will print all lines, but none of the matching info. Any help (explanation would be awesome too) is greatly appreciated!!

$  awk -F',' 'FNR==NR{a[$1]=$2;next}{if(a[$5]==""){a[$5]=0}; print $1,$2,$5,a[$1,$2]}' file1 file2

------ Post updated at 12:55 AM ------

Below is a sample of what is being returned

$  awk -F',' 'FNR==NR{a[$1]=$2;next}{if(a[$5]==""){a[$5]=0}; print $1,$2,$5,a[$1,$2]}' file1 file2
Hostname Port Vlan
onshintstat Te1/1           trunk
onshintstat Te1/2           trunk
onshintstat Te1/3           trunk
onshintstat Te1/4           trunk
onshintstat Te1/5           572
onshintstat Te1/6           572

your array a is populated with the single dimension. Why are you trying to print it indexing by 2 indecies?
Have you tried the last proposed solution from post #8?

I pulled the csv into excel and sanitized it, Below are snippets of the actual file:

FILE1 :

Hostname	Port	Name	Status	Vlan	Duplex	Speed	Type
Switch	Te1/1     	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-CU 3M"
Switch	Te1/2     	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-CU 3M"
Switch	Te1/3     	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-CU 3M"
Switch	Te1/4     	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-CU 3M"
Switch	Te1/5     	link description	"	connected    "	"	572        "	"	  full "	"	   10G "	"	10GBase-CU 3M"
Switch	Te1/6     	link description	"	connected    "	"	572        "	"	  full "	"	   10G "	"	10GBase-CU 3M"
Switch	Te1/7     	link description	"	notconnect   "	"	1          "	"	  full "	"	  auto "	"	No XCVR"
Switch	Te1/8     	link description	"	notconnect   "	"	1          "	"	  full "	"	  auto "	"	No XCVR"
Switch	Te1/9     	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-SR"
Switch	Te1/10    	link description	"	notconnect   "	"	1          "	"	  full "	"	  auto "	"	No XCVR"
Switch	Te1/11    	link description	"	notconnect   "	"	1          "	"	  full "	"	  auto "	"	10GBase-SR"
Switch	Te1/12    	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-SR"
Switch	Te2/1     	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-SR"
Switch	Te2/2     	link description	"	notconnect   "	"	1          "	"	  full "	"	  auto "	"	No XCVR"
Switch	Te2/3     	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-SR"
Switch	Te2/4     	link description	"	notconnect   "	"	1          "	"	  full "	"	  auto "	"	No XCVR"
Switch	Te2/5     	link description	"	notconnect   "	"	1          "	"	  full "	"	  auto "	"	No XCVR"
Switch	Te2/6     	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-SR"
Switch	Te2/7     	"	                   "	"	notconnect   "	"	1          "	"	  full "	"	  auto "	"	10GBase-SR"
Switch	Te2/8     	link description	"	notconnect   "	"	1          "	"	  full "	"	  auto "	"	No XCVR"
Switch	Te2/9     	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-SR"
Switch	Te2/10    	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-SR"
Switch	Te2/11    	link description	"	connected    "	"	trunk      "	"	  full "	"	   10G "	"	10GBase-SR"

FILE2 :

VLAN	Name	Status	Ports	Device
1	default	active	Te1-7 Te1-8 Te1-10 Te1-11 Te2-2 Te2-4 Te2-5 Te2-7 Te2-8 Te3-2 Te3-3 Te3-4 Te4-2 Te4-3 Te4-4 Gi5-8 Gi5-9 Gi5-10 Gi5-11 Gi5-14 Gi5-15 Gi5-16 Gi5-18 Gi5-19 Gi5-20 Gi5-22 Gi5-23 Gi5-25 Gi5-26 Gi5-27 Gi5-28 Gi5-29 Gi5-30 Gi5-31 Gi5-32 Gi5-33 Gi5-34 Gi5-35 Gi5-36 Gi5-37 Gi5-38 Gi5-39 Gi5-40 Gi5-41 Gi5-42 Gi5-43 Gi5-44 Gi5-45 Gi5-46 Gi6-3 Gi6-4 Gi6-5 Gi6-6 Gi6-7 Gi6-8 Gi6-9 Gi6-10 Gi6-11 Gi6-12 Gi6-13 Gi6-14 Gi6-15 Gi6-16 Gi6-17 Gi6-18 Gi6-19 Gi6-20 Gi6-26 Gi6-27 Gi6-28 Gi6-30 Gi6-32 Gi6-33 Gi6-34 Gi6-35 Gi6-36 Gi6-38 Gi6-39 Gi6-40 Gi6-47 Gi7-1 Gi7-16 Gi7-17 Gi7-19 Gi7-20 Gi7-21 Gi7-23 Gi7-24 Gi7-28 Gi7-29 Gi7-30 Gi7-40	./Core.logshvlan.txt
10	stuff1	active	Gi5-12 Gi7-8 Gi7-10 Gi7-11 Gi7-12	./Core.logshvlan.txt
11	stuff2	active		./Core.logshvlan.txt
16	stuff3	active		./Core.logshvlan.txt
17	stuff4	active		./Core.logshvlan.txt
19	stuff5	active		./Core.logshvlan.txt
20	stuff6	active		./Core.logshvlan.txt
25	stuff7	active	Gi5-13 Gi5-21 Gi6-29 Gi6-44 Gi6-46 Gi6-48 Gi7-14 Gi7-15	./Core.logshvlan.txt
30	stuff8	active		./Core.logshvlan.txt
32	stuff9	active		./Core.logshvlan.txt
40	stuff10	active		./Core.logshvlan.txt
50	stuff11	active		./Core.logshvlan.txt
51	stuff12	active		./Core.logshvlan.txt
52	stuff13	active		./Core.logshvlan.txt
60	stuff14	active		./Core.logshvlan.txt
100	stuff15	active		./Core.logshvlan.txt
172	stuff16	active		./Core.logshvlan.txt

Output when I run the command:

NAonshintstat	Gi7/28    	                        	    disabled     	1	      auto 	          auto 	        10/100/1000-TX
NAonshintstat	Gi7/29    	                        	    disabled     	1	      auto 	          auto 	        10/100/1000-TX
NAonshintstat	Gi7/30    	                        	    disabled     	1	      auto 	          auto 	        10/100/1000-TX
NAonshintstat	Gi7/31    	Sanitized	    connected    	525	    a-full 	        a-1000 	        10/100/1000-TX
NAonshintstat	Gi7/32    	Sanitized	    notconnect   	525	      auto 	          auto 	        10/100/1000-TX
NAonshintstat	Gi7/33    	Sanitized	    connected    	998	    a-full 	        a-1000 	        10/100/1000-TX
NAonshintstat	Gi7/34    	Sanitized	    connected    	998	    a-full 	        a-1000 	        10/100/1000-TX
NAonshintstat	Gi7/35    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/36    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/37    	Sanitized	    connected    	505	    a-half 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/38    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/39    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/40    	                        	    disabled     	1	      auto 	          auto 	        10/100/1000-TX
NAonshintstat	Gi7/41    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/42    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/43    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/44    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/45    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/46    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/47    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX
NAonshintstat	Gi7/48    	Sanitized	    connected    	505	    a-full 	         a-100 	        10/100/1000-TX

What I am seeing is that it did not make any matches (there are matches even though the snipped of file1 does not reflect it)

------ Post updated at 01:50 PM ------

This was the output using the command that you supplied, it appended NA at the beginning of every line and made 0 matches

Did you check your file for DOS line terminators (<CR> = ^M = 0x0D)?

1 Like

And what command did you use that produced the output you showed us in post #11?

In addition to what RudiC has already said, we also have to assume that you realize that the sanitized output you showed us from excel has lost all field boundaries. So, any awk script that you might try to use with that sanitized output as an input file is hopelessly incapable of determining which text belongs in which field.

1 Like

Please see below: I removed the dos carriage returns / line feeds and ran the command provided again.

TGY9 ~/testfiles/vlanint
$ tr -d '\015' <file2 >file22

TGY9 ~/testfiles/vlanint
$ awk 'FNR==NR{f2[$1]=$5;next} FNR==1{print $0, "VLAN Name";next} {print $0,($2 in f2)?f2[$2]:"NA"}' OFS=, file2 file1
Hostname,Port,Name,Status,Vlan,Duplex,Speed,Type,VLAN Name
onshintstat,Te1/1     ,     Trunk-to-    ,    connected    ,  trunk      ,      full ,           10G ,        10GBase-CU 3M,,NA
onshintstat,Te1/2     ,     Trunk-to-    ,    connected    ,  trunk      ,      full ,           10G ,        10GBase-CU 3M,,NA
onshintstat,Te1/3     ,     Trunk-to-   ,    connected    ,  trunk      ,      full ,           10G ,        10GBase-CU 3M,,NA
onshintstat,Te1/4     ,     Trunk-to-   ,    connected    ,  trunk      ,      full ,           10G ,        10GBase-CU 3M,,NA
onshintstat,Te1/5     ,     nexu  ,    connected    ,  572        ,      full ,           10G ,        10GBase-CU 3M,,NA
onshintstat,Te1/6     ,     nexu  ,    connected    ,  572        ,      full ,           10G ,        10GBase-CU 3M,,NA

Did you do the same to file1?
Did you specified NEW/converted files on the awk's cli?
Looks like you have a trailing , in file1?

Also, please get accustomed to used code tags in your posts!

1 Like

FILE1

Hostname,Port,Name,Status,Vlan,Duplex,Speed,Type
shintstat,Te1/1 , Trunk- , connected , trunk , full , 10G , 10GBase-CU 3M,
shintstat,Te1/2 , Trunk , connected , trunk , full , 10G , 10GBase-CU 3M,
shintstat,Te1/3 , Trunk , connected , trunk , full , 10G , 10GBase-CU 3M,
shintstat,Te1/4 , Trunk , connected , trunk , full , 10G , 10GBase-CU 3M,

FILE2:

VLAN,Name,Status,Ports,Device
1,default,active,Te1-7 Te1-8 Te1-10 Te1-11 Te2-2 Te2-4 Te2-5 Te2-7 Te2-8 Te3-2 Te3-3 Te3-4 Te4-2 Te4-3 Te4-4 Gi5-8 Gi5-9 Gi5-10 Gi5-11 Gi5-14 Gi5-15 Gi5-16 Gi5-18 Gi5-19 Gi5-20 Gi5-22 Gi5-23 Gi5-25 Gi5-26 Gi5-27 Gi5-28 Gi5-29 Gi5-30 Gi5-31 Gi5-32 Gi5-33 Gi5-34 Gi5-35 Gi5-36 Gi5-37 Gi5-38 Gi5-39 Gi5-40 Gi5-41 Gi5-42 Gi5-43 Gi5-44 Gi5-45 Gi5-46 Gi6-3 Gi6-4 Gi6-5 Gi6-6 Gi6-7 Gi6-8 Gi6-9 Gi6-10 Gi6-11 Gi6-12 Gi6-13 Gi6-14 Gi6-15 Gi6-16 Gi6-17 Gi6-18 Gi6-19 Gi6-20 Gi6-26 Gi6-27 Gi6-28 Gi6-30 Gi6-32 Gi6-33 Gi6-34 Gi6-35 Gi6-36 Gi6-38 Gi6-39 Gi6-40 Gi6-47 Gi7-1 Gi7-16 Gi7-17 Gi7-19 Gi7-20 Gi7-21 Gi7-23 Gi7-24 Gi7-28 Gi7-29 Gi7-30 Gi7-40,./CoreTech.logshvlan.txt
10,Data,active,Gi5-12 Gi7-8 Gi7-10 Gi7-11 Gi7-12,./CoreTech.logshvlan.txt
11,VLAN0011,active,,./CoreTech.logshvlan.txt
16,stuff,active,,./CoreTech.logshvlan.txt
17,morestuff,active,,./CoreTech.logshvlan.txt
19,morestuff,active,,

------ Post updated at 05:52 PM ------

Sorry, new to the forum. I will respect the code tags moving forward.

Yes, I formatted both files and used them with the same command shown

PLEASE become way more stringent, disciplined when posting your requests:

  • "FILE1" is NOT the same as "file1" in IT terms (admittedly beyond MS something)

  • stick to the samples once given! your recent samples don't match the ones you gave in the first place. Nobody in here could follow you complaining on output that came from nowhere.

  • USE code tags to enhance readabilty of your code and data; don't rely on the moderators to do it for you! You may receive infractions soon if you continue to resist.

No surprise - there are NO matches! Please revisit your data.

It's hard to see without proper formatting/code tags, but I think your file1 is missing a field. Compare your recent version of file1 to the one you posted originally in post#1