Field separator: `~
The idea is to start with SNA in the 3rd field,
fetch the 4th field, and search for that 4th-field content in the
3rd field of some other line. Save that line's 1st field, get that line's 4th field, and so on until
I reach DNA in the 4th field.
I have already developed part of the script that does this, but I am not sure how to handle the case where the 4th-field content I search for occurs more than once in the 3rd field.
I am getting the below output:
Why don't you post your script that solves part of the problem, and why don't you specify the requirements correctly and in detail? E.g., where does the 32 at the start of each solution come from? SNA, I know, but don't leave us guessing. Also, you should clearly state that ALL four solutions are needed, i.e. duplicates in $3 are equivalent.
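A quick way to see where the chain branches is to list the 3rd-field values that occur on more than one line; each of those values yields several 4th-field values to follow. A minimal sketch, assuming only the `~ separator described above; dup_third_fields is a hypothetical helper name:

```shell
#!/bin/bash
# dup_third_fields FILE (hypothetical helper): print each 3rd-field value that
# occurs on more than one line, followed by how many lines it occurs on.
# These are the points where the chain branches.
dup_third_fields() {
    awk -F'`~' '
        { count[$3]++ }                   # tally lines per 3rd-field value
        END {
            for (k in count)
                if (count[k] > 1)
                    print k, count[k]
        }' "$1"
}
# usage: dup_third_fields datalins1.dat
```

The order of the printed values is unspecified, because awk's for-in loop visits array keys in no particular order.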
complinagetest ()   # function name
{
  : > complins.dat   # create complins.dat, or empty it if it already exists
  i=0
  while read -r line
  do
    if [ "$line" = "SNA" ]; then
      # all $4 values of the lines whose $3 is SNA
      va=`grep -w "$line" datalins1.dat | awk BEGIN'{FS="\`~"}{if ( $3=="'$line'" ) {print $4}}'`
      i=$(($i+1))
      # pick the i-th of those $4 values
      varits=$(echo $va|awk -v varif="$i" '{print $varif}')
      if [ "$varits" = "DNA" ]; then
        grep -w "$varits" datalins1.dat | awk BEGIN'{FS="`~"}{if ( $3=="'$line'" ) {print $1}}'|sed 's/$/~>/' >> complins.dat
      else
        grep -w "$varits" datalins1.dat | awk BEGIN'{FS="`~"}{if ( $3=="'$line'" ) {print $1}}'| tr '\n' '~' | sed 's/~/~>/g' >> complins.dat
        # follow the chain: look up $varits in $3 and take that line's $4
        while [ "$varits" != "DNA" ]
        do
          while read -r line
          do
            if [ "$line" = "$varits" ] ; then
              varits=`grep -w "$line" datalins1.dat | awk BEGIN'{FS="\`~"}{if ( $3=="'$line'" ) {print $4}}'`
              # how many $4 values were found for this $3
              check=`echo $varits|awk '{print NF}'`
              if [ "$check" = "1" ];then
                echo "inside if"
                grep -w "$varits" datalins1.dat | awk BEGIN'{FS="`~"}{if ( $3=="'$line'" ) {print $1}}'| tr '\n' '~'| sed 's/~/~>/' >> complins.dat
              else
                echo "problem occurs here in else case when $check > 1 (i.e. \$4 from datalins1.dat has two numbers for the same value of \$3)"
              fi
            fi
          done < source.dat
        done
      fi
      echo "" >> complins.dat
    fi
  done < source.dat
}
complinagetest   # calling function
--------------------------------------------------------------------------
Input files for the above function: datalins1.dat and source.dat
Output file for the above function: complins.dat
-----------------------------------------------------------------------------
Expected output for the above example:
-------------------------------------------------------------------
The function works perfectly as long as datalins1.dat has only one $4 value for each $3 value.
Could someone please help me with the logic for this in a Unix shell script?
Yes, the file (datalins1.dat) could be much larger; it is not necessarily only 12 lines (the above example has only 12). But the format of the file will always be the same: four fields separated by the delimiter `~.
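Since the format is fixed (four fields separated by `~), it is cheap to check it before running any of the scripts, so a malformed line does not silently break the chain. A minimal sketch; check_four_fields is a hypothetical helper name, and awk is used only because the thread already relies on it:

```shell
#!/bin/bash
# check_four_fields FILE (hypothetical helper): report every line that does not
# have exactly four `~-separated fields; return non-zero if any are found.
check_four_fields() {
    awk -F'`~' '
        NF != 4 { printf "%s: line %d has %d fields\n", FILENAME, FNR, NF; bad = 1 }
        END     { exit bad }' "$1"
}
# usage: check_four_fields datalins1.dat && complinagetest
```

An empty line has zero fields and is reported too.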
What do you consider "much larger"? A hundred lines? A thousand? A million? A billion? A simple solution may scale from 12 to 1000, but perhaps not to a million and beyond.
It would be a shame for someone to waste their time crafting code that can never be used because it takes forever to complete or because it requires more memory than the system has available. So, please, be more precise than "much larger". Also, if the file can approach the size of your system's memory, you should definitely mention that.
Sorry for the inconvenience. By "much larger" I don't mean it can reach thousands or millions of lines; the code above was written for input of the same kind as datalins1.dat.
Check this link as well as this one
What you are trying to do is related to a tree walk (chained list); have a look at the algorithm used to build the leaf paths.
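awk -F'`~' '
# from(n, pre): print every path that continues from value n;
# pre holds the $1 values collected so far, each prefixed by "~"
function from(n, pre,    i, v, x)
{
  if (n in T) {                        # n occurs in $3 of one or more lines
    v = split(T[n], x, ",")            # x[1..v] = record numbers of those lines
    for (i = 1; i <= v; i++)
      from(F[x[i]], pre "~" L[x[i]])   # follow each branch in turn
  } else print substr(pre, 2)          # chain ended (e.g. at DNA): print it
}
{
  L[NR] = $1;                          # $1 of each line, by record number
  F[NR] = $4;                          # $4 of each line, by record number
  if ($3 in T) T[$3] = T[$3] "," NR;   # T[$3] = comma-separated record numbers
  else T[$3] = NR;
}
END { from("SNA") }' infile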
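To try the tree walk, the awk program can be wrapped in a function and run on a small input. A minimal sketch: walk_paths is a hypothetical name, and the sample lines are made up for illustration (they are not the original poster's data). SNA has two children in this sample, so two complete paths should be printed.

```shell
#!/bin/bash
# walk_paths FILE (hypothetical wrapper): print every chain starting at SNA
# as the $1 values along the way, joined by "~".
walk_paths() {
    awk -F'`~' '
    function from(n, pre,    i, v, x) {
        if (n in T) {                     # n occurs in $3 of one or more lines
            v = split(T[n], x, ",")       # record numbers of those lines
            for (i = 1; i <= v; i++)      # follow every branch, not just the first
                from(F[x[i]], pre "~" L[x[i]])
        } else
            print substr(pre, 2)          # no line continues the chain: print it
    }
    {
        L[NR] = $1                        # value to record for this line
        F[NR] = $4                        # value to look up next in $3
        if ($3 in T) T[$3] = T[$3] "," NR
        else         T[$3] = NR
    }
    END { from("SNA") }' "$1"
}

# Hypothetical sample input: SNA branches two ways.
sample=$(mktemp)
cat > "$sample" <<'EOF'
11`~a`~SNA`~20
12`~b`~SNA`~30
13`~c`~20`~DNA
14`~d`~30`~40
15`~e`~40`~DNA
EOF
walk_paths "$sample"
rm -f "$sample"
```

For this sample the walk prints 11~13 and then 12~14~15, one line per SNA-to-DNA chain. Branches are visited in the order their lines appear in the file.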
This can also be done in bash. However, since associative arrays aren't supported (they were only added in bash 4.0), I used a dummy number (9999) for SNA. Bash also has tighter memory constraints, so this version will fail on smaller files than the awk solution:
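#!/bin/bash
# from VALUE PATH: print every chain that continues from VALUE
function from()
{
  local pre=$2
  if [ ${#T[$1]} -gt 0 ]               # VALUE occurs in $3 of at least one line
  then
    set ${T[$1]}                       # positional params = those line numbers
    while [ $# -gt 0 ]
    do
      from "${F[$1]}" "$pre~${L[$1]}"
      shift
    done
  else
    echo "${pre:1}"                    # chain ended: print it without the leading ~
  fi
}
while IFS= read -r line
do
  set ${line//\`~/ }                   # split on `~ (assumes no field contains spaces)
  ((L[++ln]=$1))                       # arithmetic assignment: fields are assumed numeric;
  ((F[ln]=$4))                         # a word such as DNA evaluates to 0 here
  [ "$3" = "SNA" ] && T[9999]="${T[9999]} $ln" || T[$3]="${T[$3]} $ln"
done < infile
from 9999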
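On bash 4.0 or later, associative arrays (declare -A) remove the need for the dummy 9999 index, because the 3rd-field text itself can be the key. A sketch under that assumption; walk_paths_bash and _from are hypothetical names, and like the other answers it assumes the chains contain no cycles:

```shell
#!/bin/bash
# Requires bash >= 4.0 for associative arrays.

_from() {                               # $1: value to find in $3, $2: path so far
    local pre=$2 n
    if [[ -n $1 && -n ${T[$1]+set} ]]; then
        for n in ${T[$1]}; do           # each line whose $3 equals $1
            _from "${F[n]}" "$pre~${L[n]}"
        done
    else
        printf '%s\n' "${pre:1}"        # chain ended (e.g. at DNA): print it
    fi
}

walk_paths_bash() {                     # $1: input file, fields separated by `~
    local -a L=() F=()                  # $1 and $4 of each line, by line number
    local -A T=()                       # $3 text -> line numbers, space-separated
    local line ln=0 f1 f2 f3 f4 delim='`~' sep=$'\x1f'
    while IFS= read -r line; do
        # turn the two-character separator into one IFS character, then split
        IFS=$sep read -r f1 f2 f3 f4 <<< "${line//"$delim"/$sep}"
        [[ -n $f3 ]] || continue        # an empty key is not allowed in T
        ((++ln))
        L[ln]=$f1
        F[ln]=$f4
        T[$f3]+="$ln "                  # append this line number under its $3
    done < "$1"
    _from SNA ""                        # _from sees L, F, T via dynamic scoping
}
# usage: walk_paths_bash datalins1.dat
```

Unlike the version above, fields are stored as plain strings rather than through arithmetic evaluation, so DNA and other non-numeric values are kept as-is, and fields may contain spaces.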