I'm trying to take each line in file1 and match its full text against the first column of file2. If there is no match, I drop one character at a time from the end until the remaining text matches the first column of file2. For each line I loop through all of file2 and print, to another file, every line whose first column matches the text or any of its shortened forms. The output, file3, will essentially contain the original text from file1 concatenated with the matching lines from file2.
Apart from showing us your attempts at this problem, could you also indicate if the order of records in the output file is important?
The problem description seems to indicate that a direct match record, when available, should be listed first and then other matches should be displayed in the same order as they appear in file1.txt.
However if the order of records in the output file is unimportant, the solution can be simplified a fair bit.
#!/bin/sh
# read from f1, print in this order
while IFS= read -r f1line
do
    # read from f2, find matches => print
    while IFS="," read -r f2col1 f2othercols
    do
        case $f1line in
        ("$f2col1"*)
            printf '%s,%s,%s\n' "$f1line" "$f2col1" "$f2othercols"
            ;;
        esac
    done < file2.txt
done < file1.txt
The same idea in awk (file2 is read into an array variable first):
#!/bin/sh
awk -F"," '
{
    if (NR==FNR) {
        # read from f2 into associative array col1[]
        col1[$1]=($2 FS $3)
    } else {
        # read from f1, find matches => print
        for (c in col1)
            if (c == substr($0,1,length(c)))
                print $0 FS c FS col1[c]
    }
}
' file2.txt file1.txt
In awk, your proposed approach (shortening the line one character at a time) can be implemented without much overhead:
#!/bin/sh
awk -F"," '
{
    if (NR==FNR) {
        # read from f2 into associative array col1[]
        col1[$1]=($2 FS $3)
    } else {
        # read from f1, find matches => print
        for (i=length; i>=1; i--)
            if ((c=substr($0,1,i)) in col1)
                print $0 FS c FS col1[c]
    }
}
' file2.txt file1.txt
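To see the order in which this backward loop prints its matches, here is a small sketch run on made-up data. The file contents and the temporary directory are assumptions for illustration only, not the poster's real files; the awk logic is the same as above, written with a pattern and next instead of if/else.

```shell
#!/bin/sh
# Hypothetical sample data, not taken from the thread.
tmp=$(mktemp -d)
printf '%s\n' 'abcd' 'xyz' > "$tmp/file1.txt"
printf '%s\n' 'ab,1,2' 'abcd,3,4' 'x,5,6' 'q,7,8' > "$tmp/file2.txt"

result=$(awk -F"," '
# first file (file2): remember columns 2 and 3 under the key in column 1
NR == FNR { col1[$1] = ($2 FS $3); next }
# second file (file1): try the whole line, then ever shorter prefixes
{
    for (i = length($0); i >= 1; i--)
        if ((c = substr($0, 1, i)) in col1)
            print $0 FS c FS col1[c]
}' "$tmp/file2.txt" "$tmp/file1.txt")

rm -rf "$tmp"
printf '%s\n' "$result"
```

With this sample input the exact match abcd,abcd,3,4 is printed first, then the shorter prefix match abcd,ab,1,2, then xyz,x,5,6. The longest match always comes first because i counts down from the full line length.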
The test if (t ~ $1) is a regular-expression match, so it is "fuzzy": unless it is anchored, it succeeds when $1 occurs anywhere in t.
It should be if (t ~ ("^" $1)); the ^ anchor means that $1 must match at the beginning of the string t.
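A quick way to check the difference is the sketch below. The helper name match_kind and the sample strings are made up for illustration. It prints three 0/1 flags: the unanchored match, the anchored match, and a literal prefix test with index(). The index() test is an extra alternative, not something proposed in this thread; it may be worth considering if the keys can contain regex metacharacters such as a dot.

```shell
#!/bin/sh
# match_kind TEXT KEY  (hypothetical helper, for illustration only)
# Prints three flags: unanchored RE match, anchored RE match, literal prefix.
match_kind() {
    awk -v t="$1" -v k="$2" 'BEGIN {
        unanchored = (t ~ k)              # k may match anywhere in t
        anchored   = (t ~ ("^" k))        # k must match at the start of t
        literal    = (index(t, k) == 1)   # k is a plain-text prefix of t
        printf "%d %d %d\n", unanchored, anchored, literal
    }'
}

match_kind "xabcd" "abc"   # prints: 1 0 0
match_kind "abcd"  "abc"   # prints: 1 1 1
match_kind "abcd"  "a.c"   # prints: 1 1 0
```

The last call shows that even the anchored form still treats the key as a regular expression: the dot matches any character, so only the index() test rejects it.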