I have a file with 40000+ lines of different entries, and I need the matching entries filtered out into separate files based on a first-field pattern:
For example:
All server1 entries (field 1) should come together with their paths (field 2).
The output I want should generate a filename for each server:
Say a server1.out file, which would contain "field 1" and "field 2" of every server1 entry.
And so on for every serverX.
#!/bin/sh
pserver=""
while read server path
do
   if [ "$server" != "$pserver" ]; then
      # first field changed: (re)open "$server".out in append mode on fd 3
      exec 3>>"$server".out
      pserver=$server
   fi
   echo "$server $path" >&3
done < datafile.txt
If the input file is sorted first, this becomes more efficient: each output file is opened exactly once, so truncation (>) can replace appending (>>).
#!/bin/sh
pserver=""
sort -k1,1 datafile.txt |
while read server path
do
   if [ "$server" != "$pserver" ]; then
      # sorted input: each file is opened exactly once, so > (truncate) is safe
      exec 3>"$server".out
      pserver=$server
   fi
   echo "$server $path" >&3
done
My second sample
works the same as Scrutinizer's sample:
sort the file on the 1st field and pipe the result to awk or to a while loop. awk loops over each input line automatically, so the code only has to handle the per-line action.
Detailed description follows.
In the shell, the while loop reads line by line; the 1st field goes into the $server variable and the rest of the line into the $path variable.
If $server differs from $pserver (true on line 1), it creates a new file "$server.out" on file descriptor 3, and $server is saved in $pserver. Then "$server $path" is written to descriptor 3.
If $server is unchanged (equal to $pserver), it keeps writing "$server $path" to descriptor 3.
If $server differs from $pserver again, it creates a new file, again on descriptor 3, which automatically closes the old file.
In contrast, the awk code needs an explicit close(), does not expose a file descriptor, and creates each file automatically on the first write.
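The awk version is not quoted in the thread; a minimal sketch of what it could look like (same sorted-input approach, with a hypothetical sample datafile created inline) is:

```shell
# hypothetical sample data, already in "server path" form
printf '%s\n' 'server2 /opt/data' 'server1 /var/log/app1' 'server1 /etc/config' > datafile.txt

sort -k1,1 datafile.txt |
awk '
   $1 != pserver {
      if (pserver != "") close(out)   # close the finished file explicitly
      out = $1 ".out"                 # the file is created on the first write
      pserver = $1
   }
   { print > out }
'

cat server1.out   # server1 /etc/config
                  # server1 /var/log/app1
```

Because the input is sorted, each filename is opened (and truncated by awk's >) exactly once, and close() releases the descriptor before the next group starts.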
Thanks MadeInGermany,
It worked for a small number of lines, but when I executed it against the 44000-line file, it did not do what it was intended to do. With both the awk and the shell version, the output files each had only 1 line. Not sure what happened.
Then I used a shell array and grep to get the desired output files, and it is working.
Below is the code I made with grep:
F=datafile.txt
# unique host names (field 1) from the data file
grep "^[a-z]" $F | awk '{print $1}' | sort -u > hostlist
H=hostlist   # unique hosts filtered out of the file
# Store each unique host name in array AA
typeset -i i=0 k=0 ; for j in `cat $H`; do AA[$i]=$j ; i=`expr $i + 1` ; done
## AC=ArrayCount / iterations needed
AC=${#AA[@]}
...
#AH: ArrayHost name  #AHF: ArrayHostFile, the desired output filename
while [[ ${k} -lt ${AC} ]]
do
   AH=${AA[${k}]} ; AHF=${AH}.out
   # anchor the match to field 1, so a host name inside a path does not match
   grep "^${AH}[[:space:]]" $F > ${AHF} ; ls -l ${AHF}
   k=`expr $k + 1`
done
...
It generated around 200 individual files from the big datafile. Again, many thanks to both of you; I was kind of stuck when I started with this.
Noted, good points, thanks. I thought it would not work in ksh; I had the impression that "expr" and "let" are the only ways to do math operations in ksh. Thanks for the improved code.
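For the record, arithmetic expansion is built into ksh (and every POSIX shell), so `expr` and `let` are not needed; a tiny sketch:

```shell
i=0
i=$((i + 1))      # POSIX arithmetic expansion; no external expr process
echo "$i"         # 1

k=5
k=$((k * 2 + 1))  # full integer arithmetic inside $(( ))
echo "$k"         # 11
```

Replacing each `i=`expr $i + 1`` with `i=$((i + 1))` avoids forking one process per loop iteration, which matters in a loop over 200 hosts.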