Get output of multiple pattern match from first field to a file

Hi All,
Greetings!

I have a file of 40,000+ lines with different entries. I need the matching entries filtered out to their own files, based on a pattern match on the first field:

For example:
All server1 entries (in field1) to come together with its path in 2nd field.

Ideally the output should generate a file for each server:
Say a server1.out file, containing the "first field" and "second field" of every server1 line.
And so on for every serverX.

datafile.txt  : 

server1	/usr/file1
server1 /usr/fileA
server2 /usr1/fileB
server2	/usr2/fileca
server3 /usr/DB/fileA
server3 /usr1/fileA
serverA /usr1/data1
server1 /usr3/data2
server2 /usr2/data2
server2 /path1/data2
serverA /pathb/data3
Desired output, one file per server name:


File: server1.out
server1	/usr/file1
server1 /usr/fileA
server1 /usr3/data2
File: server2.out 

server2 /usr1/fileB
server2 /usr2/fileca
server2 /usr2/data2
File: server3.out 
server3 /usr/DB/fileA
server3 /usr1/fileA
File: serverA.out
serverA /usr1/data1
serverA /pathb/data3

Thanks ..

Try:

sort -k1,1 -k2 datafile.txt | awk '$1!=s{s=$1; close(f); f=s ".out"} {print>f}'
#!/bin/sh
# Works on unsorted input: open with >> (append) so that an earlier
# block of lines for the same server is not overwritten.
pserver=""
while read server path
do
  if [ "$server" != "$pserver" ]; then
    exec 3>>"$server".out   # (re)open the current output file on fd 3
    pserver=$server
  fi
  echo "$server $path" >&3
done < datafile.txt

If the input file is sorted first, it becomes more efficient: each output file is opened exactly once.

#!/bin/sh
pserver=""
sort -k1,1 datafile.txt |
while read server path
do
  if [ "$server" != "$pserver" ]; then
    exec 3>"$server".out   # truncate: each file is opened exactly once
    pserver=$server
  fi
  echo "$server $path" >&3
done

Thank you both,
I checked the code from Scrutinizer; it worked great.
MadeInGermany's also worked great.

Both pieces of code are hard to understand, but they look like they use the same flow, with awk and shell.

I would appreciate it if you both could give some explanation of how the lines are merged based on the first field, if you get a chance.

Great code, thanks again!

My second sample
works the same as Scrutinizer's sample:
sort the file on the 1st field and pipe the result to awk or a while loop. The awk automatically loops over each input line, so the code only handles the per-line action.
Detailed description follows.
In the shell, the while loop reads line by line; the 1st field goes into the $server variable and the rest into the $path variable.
If $server differs from $pserver (true on line 1), it creates a new file "$server.out" on file descriptor 3 and saves $server in $pserver. Then "$server $path" is written to descriptor 3.
If $server is unchanged (equal to $pserver), it keeps writing "$server $path" to descriptor 3.
If $server differs from $pserver again, it creates a new file, again on descriptor 3, which automatically closes the old file.
In contrast, the awk code needs an explicit close(), does not expose a file descriptor, and creates the file automatically on the first write.
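To make the awk side of that comparison concrete, here is a commented expansion of the one-liner, run against a small inline sample (the sample file name and contents are illustrative, not from the thread):

```shell
# Create a small unsorted sample to work on:
cat > sample.txt <<'EOF'
server2 /usr1/fileB
server1 /usr/file1
server1 /usr/fileA
EOF

sort -k1,1 -k2 sample.txt | awk '
  $1 != s {                 # first field changed from the last one seen
    s = $1                  # remember the new server name
    if (f != "") close(f)   # close the previous file (awk limits open files)
    f = s ".out"            # build the new file name, e.g. "server1.out"
  }
  { print > f }             # write the whole line to the current file
'
```

Because the input arrives sorted, each .out file is opened and closed exactly once.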


Thanks MadeInGermany,
It worked on a small sample, but when I executed it against the 44,000 lines it did not do what it was intended to do. Both versions, awk and shell, produced output files with only 1 line each; not sure what happened.

Then I used a shell array and grep to get the desired output files, and it is working.
Below is the code I made with grep:

F=datafile.txt
cat $F | grep "^[a-z]" |awk '{print $1}' | sort | uniq -c| sort -rnk1 > hostlist
H=hostlist  # each line: count and unique host name, filtered out of the file

# Store each unique host name in array AA (field 2; field 1 is the count)
typeset -i i=0 k=0 ; for j in `awk '{print $2}' $H`; do  AA[$i]=$j ;  i=`expr $i + 1`  ; done

     ## AC=ArrayCount / iterations needed
      AC=${#AA[@]}
...


    #AH :ArrayHost Name     #AHF: ArrayHostFile desired output filename.
    while [[ ${k} -lt ${AC} ]]   # indices run from 0 to AC-1
    do
     
      AH=${AA[${k}]} ; AHF=${AH}.out
      grep "^${AH}[[:space:]]" $F > ${AHF} ; ls -l ${AHF}   # anchored: server1 must not match server10
      k=`expr $k + 1`
    done
...

It generated around 200 individual files from the big datafile. Again, many thanks to both; I was kind of stuck when I started with this. Thanks!

Please get rid of the old-style calls of the expr program:

i=`expr $i + 1`
k=`expr $k + 1`

And use the shell builtins instead:

i=$((i+1))
k=$((k+1))

or

((i+=1))
((k+=1))
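As a quick sanity check, both spellings increment the variable (the first is POSIX; the `(( ))` form needs ksh or bash):

```shell
i=0
i=$((i+1))   # POSIX arithmetic expansion
((i+=1))     # ksh/bash arithmetic command
echo "$i"    # prints 2
```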

And please avoid the useless cat; here you can even avoid grep:

awk '/^[a-z]/ {print $1}' $F | sort ...

In fact, the entire long pipe

cat $F | grep "^[a-z]" |awk '{print $1}' | sort | uniq -c| sort -rnk1

could be replaced by

awk '/^[a-z]/ {C[$1]++} END {for (c in C) print C[c], c}' $F | sort -nr
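For example, on a small inline sample (the file name counts_sample.txt is just for illustration), the replacement pipe prints the per-host counts in descending order:

```shell
cat > counts_sample.txt <<'EOF'
server1 /usr/file1
server2 /usr1/fileB
server1 /usr3/data2
EOF

awk '/^[a-z]/ {C[$1]++} END {for (c in C) print C[c], c}' counts_sample.txt | sort -nr
# prints:
# 2 server1
# 1 server2
```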

Noted, good points, thanks. I thought that would not work in ksh; I had the impression that "expr" and "let" were the only ways to do math operations in ksh. Thanks for the improved code.

i=$((i+1))

works.
Thank You, MadeInGermany.

ksh introduced a lot of the more advanced syntax afaik... But many primitive versions of ksh continue to be distributed, which makes it a mixed bag.