Matching two fields in two csv files, create new file and append match

dis0wned · September 21, 2018, 8:27am

I am trying to parse two csv files and make a match in one column then print the entire file to a new file and append an additional column that gives description from the match to the new file. If a match is not made, I would like to add "NA" to the end of the file

Command that Ive been using but has not returned any matches

awk -F, 'FNR==NR{f2[$1]=$2;next} FNR==1{print $0, "VLAN Name";next} {print $0,($5 in f2)?f2[$5]:"NA"}' OFS=, file2 file1

file1 - (Print entire file to new file, match vlan fields)

Hostname,Port,Name,Status,Vlan,Duplex,Speed,Type
shintstat,Te1/1 ,	Trunk- ,	connected ,	11 ,	full ,	10G ,	10GBase-CU 3M,
shintstat,Te1/2 ,	Trunk ,	connected ,	17 ,	full ,	10G ,	10GBase-CU 3M,
shintstat,Te1/3 ,	Trunk ,	connected ,	trunk ,	full ,	10G ,	10GBase-CU 3M,
shintstat,Te1/4 ,	Trunk ,	connected ,	trunk ,	full ,	10G ,	10GBase-CU 3M,

file2: (Match VLAN field in file2 with Vlan field in file1 then append Name field to new file / file3 with the heading Vlan Name)

VLAN,Name,Status,Ports,Device
1,default,active,Te1-7 Te1-8 Te1-10 Te1-11 Te2-2 Te2-4 Te2-5 Te2-7 Te2-8 Te3-2 Te3-3 Te3-4 Te4-2 Te4-3 Te4-4 Gi5-8 Gi5-9 Gi5-10 Gi5-11 Gi5-14 Gi5-15 Gi5-16 Gi5-18 Gi5-19 Gi5-20 Gi5-22 Gi5-23 Gi5-25 Gi5-26 Gi5-27 Gi5-28 Gi5-29 Gi5-30 Gi5-31 Gi5-32 Gi5-33 Gi5-34 Gi5-35 Gi5-36 Gi5-37 Gi5-38 Gi5-39 Gi5-40 Gi5-41 Gi5-42 Gi5-43 Gi5-44 Gi5-45 Gi5-46 Gi6-3 Gi6-4 Gi6-5 Gi6-6 Gi6-7 Gi6-8 Gi6-9 Gi6-10 Gi6-11 Gi6-12 Gi6-13 Gi6-14 Gi6-15 Gi6-16 Gi6-17 Gi6-18 Gi6-19 Gi6-20 Gi6-26 Gi6-27 Gi6-28 Gi6-30 Gi6-32 Gi6-33 Gi6-34 Gi6-35 Gi6-36 Gi6-38 Gi6-39 Gi6-40 Gi6-47 Gi7-1 Gi7-16 Gi7-17 Gi7-19 Gi7-20 Gi7-21 Gi7-23 Gi7-24 Gi7-28 Gi7-29 Gi7-30 Gi7-40,./CoreTech.logshvlan.txt
10,Data,active,Gi5-12 Gi7-8 Gi7-10 Gi7-11 Gi7-12,./CoreTech.logshvlan.txt
11,VLAN0011,active,,./CoreTech.logshvlan.txt
16,stuff,active,,./CoreTech.logshvlan.txt
17,morestuff,active,,./CoreTech.logshvlan.txt
19,morestuff,active,,

Desired output (file3)

Hostname,Port,Name,Status,Vlan,Duplex,Speed,Type,Vlan Name
shintstat,Te1/1 ,	Trunk- ,	connected ,	11 ,	full ,	10G ,	10GBase-CU 3M,
shintstat,Te1/2 ,	Trunk ,	connected ,	17 ,	full ,	10G ,	10GBase-CU 3M,

Scott · September 21, 2018, 8:43am

Having added code tags to the data as well, I see that file1 has a lot of whitespace (spaces and/or tabs) in the fields. If you're using -F to specify a very specific field separator (,), then the whitespace in those fields will become part of the fields themselves, so an exact match with fields in the other file which do not have the exact same whitespace won't work.

Assuming the logic works, a simple option would be to remove any whitespace directly before, or after a comma in file1.

e.g.

$ sed 's/[[:space:]]*,[[:space:]]*/,/g' file1 > file1.$$ && awk -F, 'FNR==NR{f2[$1]=$2;next} FNR==1{print $0, "VLAN Name";next} {print $0,($5 in f2)?f2[$5]:"NA"}' OFS=, file2 file1.$$ && rm file1.$$
Hostname,Port,Name,Status,Vlan,Duplex,Speed,Type,VLAN Name
shintstat,Te1/1,Trunk-,connected,11,full,10G,10GBase-CU 3M,,VLAN0011
shintstat,Te1/2,Trunk,connected,17,full,10G,10GBase-CU 3M,,morestuff
shintstat,Te1/3,Trunk,connected,trunk,full,10G,10GBase-CU 3M,,NA
shintstat,Te1/4,Trunk,connected,trunk,full,10G,10GBase-CU 3M,,NA

RudiC · September 21, 2018, 8:58am

Don't double post. What's new compared to your other thread https://www.unix.com/shell-programming-and-scripting/279906-comparing-two-columns-two-files-printing-third-based-match.html except the two (new?) input files? Your (new?) desired output doesn't seem to satisfy your specification, does it?

EDIT: adding

{gsub (/[ 	]*,[ 	]*/, ",")}

(note the <TAB> chars!) as the first statement in your awk script might help accomplish your task.

dis0wned · September 21, 2018, 9:26am

RudiC, sorry. The other post was all over the board and I had a chance to look at this again this morning. I though it was just cleaner to start over. I apologize, and thanks again for your help

------ Post updated at 01:26 PM ------

This works perfect however I am unable to pipe the output to a new file. What am I missing? In my limited experience with SED & AWK, I never had an issue with piping ouput

Scott · September 21, 2018, 9:30am

The redirection is a shell thing, not an awk or sed thing. Put it after the awk command, regardless of which solution you use.

e.g.

awk ...... file2 file1 > file3

or

sed .. && awk ... file2 file1 > file3 && rm ...

or, when using multiple commands, you can wrap the whole thing in parenthesis, and redirect that, warts and all. e.g.

(command1 && command2 && command3) > file3

dis0wned · September 21, 2018, 9:35am

Perfect, thank you for explaining!! Can you explain the parts of the command? I want to understand the structure, Ive been struggling with the awk command structure

vgersh99 · September 21, 2018, 9:41am

or all in "trimming" awk:
awk -f dis.awk file2 file1
where dis.awk is:

BEGIN {
  FS=OFS=","
  tab=sprintf("\t")
}
function trim(str)
{
    sub("^([ ]*|" tab "*)", "", str)
    sub("([ ]*|" tab "*)" "$", "", str)
    return str
}

FNR==NR { f2[trim($1)]=trim($2);next }
FNR==1   { print $0, "VLAN Name";next}
{ $5=trim($5);print $0,($5 in f2)?f2[$5]:"NA" }