Awk array doesn't match on substring
nawk -F"," 'FNR==NR{a[$1]=$2 OFS $3;next} a[$2]{print $1,$2,a[$2]}' OFS="," file1 file2
I want cluster3 in file1 to match with cluster3int in file2
output getting:
Output required:
Help is appreciated
where's 'cluster3int' in file1??
'cluster3int' is not in file1, but I want the code to match on a substring.
cluster3 exists in file2, which is a substring of cluster3int from file1.
I want to match cluster3 from file2 with 'cluster3int', which exists in file1.
Appreciate help
Thanks
Also, 'cluster2' exists in file2. And 'cluster2' is a 'substring' of 'cluster2' and 'xyz.cluster2' from file1.
What's the algorithm?
Basically, file1 will have names that are cut short at the trailing end compared with the entries in file2.
So
xyz.cluster2 in file2
can match only with
xyz.clust
or
xyz.cluste
or
xyz.clus
---------------------------------------------------------
For 'xyz.cluster2' in file2,
cluster2 in file1 has no "xyz" on the leading side, so it's a different entry.
Hope this clarifies the requirement.
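The rule described above amounts to a prefix test: a name from file1 matches a name from file2 only when the file2 name starts with it. Here is a minimal sketch of that test using the names from this post. The helper name is_prefix is made up for illustration, and awk is used in place of nawk on the assumption that either is available. index() compares literally, so the "." in xyz.clust is not treated as a wildcard.

```shell
# is_prefix is a hypothetical helper: true when "full" starts with "pre"
out=$(awk '
function is_prefix(pre, full) { return index(full, pre) == 1 }
BEGIN {
    full = "xyz.cluster2"
    n = split("xyz.clust xyz.cluste xyz.clus cluster2", cand, " ")
    for (k = 1; k <= n; k++)
        print cand[k], (is_prefix(cand[k], full) ? "match" : "no match")
}')
printf '%s\n' "$out"
```

The three truncated names report "match", while cluster2 reports "no match" because it is found at position 5 of xyz.cluster2, not at position 1.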
nawk -F"," 'FNR==NR{a["^" $1]=$2 OFS $3;next} {for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}' OFS="," file1 file2
Now... could you please explain how it works, step by step, with code comments?
Thanks vgersh99
Here is the explanation
nawk -F"," 'FNR==NR{a["^" $1]=$2 OFS $3;next} {for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}' OFS="," file1 file2
FNR==NR
True only while awk is reading the first file, i.e. file1
a["^" $1]=$2 OFS $3;next
Creating an array indexed by the first field of file1 with "^" prepended to it; the value is fields 2 and 3 joined by OFS
so array will be like
a[^cluster1]
a[^cluster2]
a[^cluster3]
next
the remaining code after next statement is not executed when awk is processing first file file1
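{for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}
for (i in a)
looping through each index of the array using awk's for (i in a) loop
if ($2 ~ i)
checking whether the second field of file2 matches the regular expression stored in the array index, i.e. whether it starts with a name from file1.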
if found
print $1,$2,a[i];next
print $1,$2,a[i]
prints columns 1 and 2 from file2, followed by the array value for the matching index
next
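this stops the loop after the first match and moves on to the next line of file2. It makes no difference when only one entry matches, but without it a line that matches several entries would be printed once per match.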
OFS=","
output field separator is comma
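To see the steps above in action, here is a small run of the same one-liner. The sample data from the original post is not shown in this thread, so the file contents below are invented purely for illustration (name,address,rack in file1 and host,name in file2), and awk stands in for nawk.

```shell
dir=$(mktemp -d)
# hypothetical file1: truncated name, then two value columns
printf '%s\n' 'cluster1,10.0.0.1,rack1' 'cluster3,10.0.0.3,rack3' > "$dir/file1"
# hypothetical file2: host, then the full name
printf '%s\n' 'host1,cluster1' 'host3,cluster3int' 'host9,other' > "$dir/file2"

out=$(cd "$dir" && awk -F"," 'FNR==NR{a["^" $1]=$2 OFS $3;next} {for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}' OFS="," file1 file2)
printf '%s\n' "$out"
rm -rf "$dir"
```

With this made-up input, host3 is printed with the cluster3 values because cluster3int starts with cluster3, and host9 is dropped because no file1 name is a prefix of "other".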
Very well - you seem to understand the solution.
Good luck in the future.
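One thing to watch with the regex version: names such as xyz.clus contain a ".", which matches any character when the name is used as a pattern. If that matters for your data, a literal comparison with index() may be safer. This is only a sketch of that variation, not the solution posted above, and the file contents are again invented for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'xyz.clus,10.0.0.2,rack2' > "$dir/file1"
# the second name has "a" where the dot would be
printf '%s\n' 'host2,xyz.cluster2' 'host7,xyzaclus9' > "$dir/file2"

# regex match: "." in the file1 name acts as a wildcard
regex_out=$(cd "$dir" && awk -F"," 'FNR==NR{a["^" $1]=$2 OFS $3;next} {for (i in a) if ($2 ~ i) { print $1,$2,a[i];next}}' OFS="," file1 file2)
# literal match: index() == 1 means $2 starts with the file1 name exactly
literal_out=$(cd "$dir" && awk -F"," 'FNR==NR{a[$1]=$2 OFS $3;next} {for (i in a) if (index($2, i) == 1) { print $1,$2,a[i];next}}' OFS="," file1 file2)

printf 'regex:\n%s\nliteral:\n%s\n' "$regex_out" "$literal_out"
rm -rf "$dir"
```

On this input the regex version also reports host7, while the index() version reports only host2.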
This may work.
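awk -F, 'BEGIN {i=0
    # read all of file2 into the array file2, one line per element
    while ((getline < "file2") > 0)
    {i++
        file2[i]=$0 }
    imax=i
}
{
    # for each line of file1, look for file2 lines whose name starts with $1
    for (i in file2)
    {
        # assumes the name starts at character 5 of the file2 line
        if ($1 == substr(file2[i],5,length($1)))
        {
            file2loc[i]=sprintf("%s,%s", $2, $3)
        }
    }
}
END {
    for (i=1;i<=imax;i++)
        print file2[i] "," file2loc[i]
}' file1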