Awk array doesn't match on substring

nawk -F"," 'FNR==NR{a[$1]=$2 OFS $3;next} a[$2]{print $1,$2,a[$2]}' OFS="," file1 file2

I want cluster3 in file1 to match cluster3int in file2.
Output I am getting:

Output required:

Help is appreciated
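
For context, a minimal sketch of why the lookup a[$2] in the command above cannot find a longer name: awk array subscripts are compared as whole strings. The key and value below are hypothetical, since the real file contents were not shown, and plain awk is assumed to behave like nawk here.

```shell
# Hypothetical key and value; the real file contents were not shown in the thread.
lookup_demo() {
  awk 'BEGIN {
    a["cluster3"] = "x"   # what the first block stores for a file1 row keyed cluster3
    print (("cluster3" in a) ? "found" : "not found")      # exact key
    print (("cluster3int" in a) ? "found" : "not found")   # longer name, different key
  }'
}
lookup_demo
```

The second lookup fails because "cluster3int" is a different subscript, even though it starts with "cluster3".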

Where is 'cluster3int' in file1?

'cluster3int' is not in file1, but I want the code to match on a substring.

cluster3 exists in file1 and is a substring of cluster3int from file2.
I want to match cluster3 from file1 with 'cluster3int', which exists in file2.

Appreciate help

Thanks

Also, 'cluster2' exists in file1. And 'cluster2' is a 'substring' of both 'cluster2' and 'xyz.cluster2' from file2.
What's the algorithm?

Basically, file1 will hold names that are cut short at the trailing end compared with the entries in file2.

So xyz.cluster2 in file2
can match only with

xyz.clust
or
xyz.cluste
or
xyz.clus

---------------------------------------------------------
'xyz.cluster2' is in file2.
cluster2 in file1 has no "xyz" on the leading side, so it is a different entry.

Hope this clarifies the requirement
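
The rule described above (a file1 name counts only if it is a leading part of the file2 name) can be sketched with awk's index() function. This is only an illustration: the function name prefix_rule and the space-separated input pairs are made up for the example, and the names are taken from the posts above.

```shell
# Each input line holds: <short name as in file1> <full name as in file2>
prefix_rule() {
  awk '{
    # index(full, short) == 1 means the short name sits at the very start of the full name
    verdict = (index($2, $1) == 1) ? "match" : "no match"
    print $1, "vs", $2 ": " verdict
  }'
}
printf '%s\n' 'xyz.clus xyz.cluster2' 'cluster2 xyz.cluster2' 'cluster3 cluster3int' | prefix_rule
```

Here cluster2 is rejected against xyz.cluster2 because it first appears at position 5, not position 1.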

nawk -F"," 'FNR==NR{a["^" $1]=$2 OFS $3;next} {for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}' OFS="," file1 file2

Now... could you please explain how it works, step by step, with code comments?

Thanks vgersh99
Here is the explanation

nawk -F"," 'FNR==NR{a["^" $1]=$2 OFS $3;next} {for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}' OFS="," file1 file2
FNR==NR

True only while awk is reading the first file, i.e. file1.

a["^" $1]=$2 OFS $3;next}

Creates an array entry keyed by the first field of file1 with "^" prepended to it,
so the array will look like
a[^cluster1]
a[^cluster2]
a[^cluster3]

next

The remaining code after the next statement is not executed while awk is processing the first file, file1.

{for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}
for (i in a) 

looping through each element of array using awk special for loop

if ($2 ~ i)

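Checks whether the second field of file2 matches the array index, used as a regular expression. The leading "^" anchors the match at the start of the field.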
if found
print $1,$2,a[i];next

print $1,$2,a[i]

prints columns 1 and 2 from file2 and the array content for the matching index

next

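This stops the loop at the first matching key and moves on to the next record. Without it, a record that matches several keys would be printed once per key.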

OFS=","

The output field separator is a comma.
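
To see the steps above working together, here is a small end-to-end run. The sample rows, the temporary directory and the wrapper function match_prefix are all hypothetical, since the thread's real file1 and file2 were not shown; awk is used in place of nawk on the assumption that it behaves the same for this program.

```shell
# Hypothetical sample data; the real file1 and file2 were not shown.
dir=$(mktemp -d)
printf '%s\n' 'cluster1,10,a' 'cluster3,30,c' > "$dir/file1"
printf '%s\n' 'h1,cluster1' 'h2,xyz.cluster3' 'h3,cluster3int' > "$dir/file2"

match_prefix() {
  awk -F"," '
    # file1: store fields 2 and 3 under the first field, turned into an anchored regex
    FNR == NR { a["^" $1] = $2 OFS $3; next }
    # file2: print the row with the value of the first key whose regex matches field 2
    { for (i in a) if ($2 ~ i) { print $1, $2, a[i]; next } }
  ' OFS="," "$1" "$2"
}

match_prefix "$dir/file1" "$dir/file2"
```

With this data the run prints h1,cluster1,10,a and h3,cluster3int,30,c. The row h2,xyz.cluster3 is skipped because no key matches at the start of xyz.cluster3.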

Very well - you seem to understand the solution.
Good luck in the future.
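
One caveat, offered as a sketch rather than as part of the accepted answer: because the keys are used as regular expressions, a "." in a name such as xyz.clus matches any character. A variant that compares the keys as literal strings with index() avoids that. The data and the function name literal_prefix are hypothetical, and awk again stands in for nawk.

```shell
# Hypothetical data: one truncated key containing a dot, two candidate names.
dir=$(mktemp -d)
printf '%s\n' 'xyz.clus,20,b' > "$dir/file1"
printf '%s\n' 'h1,xyz.cluster2' 'h2,xyzQcluster2' > "$dir/file2"

literal_prefix() {
  awk -F"," '
    # file1: keep the key as a plain string, no "^" and no regex
    FNR == NR { a[$1] = $2 OFS $3; next }
    # file2: the key must occur at position 1 of field 2, compared character by character
    { for (k in a) if (index($2, k) == 1) { print $1, $2, a[k]; next } }
  ' OFS="," "$1" "$2"
}

literal_prefix "$dir/file1" "$dir/file2"
```

Only h1,xyz.cluster2,20,b is printed. The regex form ^xyz.clus would also have accepted xyzQcluster2.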

This may work.

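awk -F, 'BEGIN {
  i = 0
  # load every line of file2 into an array
  while ((getline < "file2") > 0) {
    i++
    file2[i] = $0
  }
  imax = i
}
{
  # for each file1 record, find file2 lines whose text from character 5 starts with $1
  for (i in file2)
  {
    if ($1 == substr(file2[i], 5, length($1)))
    {
      file2loc[i] = sprintf("%s,%s", $2, $3)
    }
  }
}
END {
  for (i = 1; i <= imax; i++)
    print file2[i] "," file2loc[i]
}' file1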