Get output of multiple pattern match from first field to a file

Hi All,
Greetings!

I have a file of 40,000+ lines with different entries. I need the matching entries filtered out to their own files, based on a pattern match on the first field:

For example:
All server1 entries (in field1) to come together with its path in 2nd field.

Ideally the output should generate a file for each server:
Say a server1.out file, containing the "first field" and "second field" of every server1 line.
And so on for every serverX.

datafile.txt  : 

server1	/usr/file1
server1 /usr/fileA
server2 /usr1/fileB
server2	/usr2/fileca
server3 /usr/DB/fileA
server3 /usr1/fileA
serverA /usr1/data1
server1 /usr3/data2
server2 /usr2/data2
server2 /path1/data2
serverA /pathb/data3
Desired output, one file per server name:


File: server1.out
server1	/usr/file1
server1 /usr/fileA
server1 /usr3/data2
File: server2.out 

server2 /usr1/fileB
server2 /usr2/fileca
server2 /usr2/data2
File: server3.out 
server3 /usr/DB/fileA
server3 /usr1/fileA
File: serverA.out
serverA /usr1/data1
serverA /pathb/data3

Thanks ..

Try:

sort -k1,1 -k2 datafile.txt | awk '$1!=s{s=$1; close(f); f=s ".out"} {print>f}'
#!/bin/sh
# Works on unsorted input: open with >> (append) so that an earlier
# block of lines for the same server is not overwritten.
pserver=""
while read server path
do
  if [ "$server" != "$pserver" ]; then
    exec 3>>"$server".out   # (re)open the current output file on fd 3
    pserver=$server
  fi
  echo "$server $path" >&3
done < datafile.txt

If the input file is sorted first, it becomes more efficient: each output file is opened exactly once.

#!/bin/sh
pserver=""
sort -k1,1 datafile.txt |
while read server path
do
  if [ "$server" != "$pserver" ]; then
    exec 3>"$server".out   # truncate: each file is opened exactly once
    pserver=$server
  fi
  echo "$server $path" >&3
done

Thank you both,
I checked the code from Scrutinizer; it worked great.
MadeInGermany's also worked great.

Both pieces of code are hard to understand, but they look like they use the same flow, with awk and shell.

I would appreciate it if you both could give some explanation of how the lines are merged based on the first field, if you get a chance.

Great code, thanks again!

My second sample
works the same as Scrutinizer's sample:
sort the file on the 1st field and pipe the result to awk or a while loop. The awk automatically loops over each input line, so the code only handles the per-line action.
Detailed description follows.
In the shell, the while loop reads line by line; the 1st field goes into the $server variable and the rest into the $path variable.
If $server differs from $pserver (true on line 1), it creates a new file "$server.out" on file descriptor 3 and saves $server in $pserver. Then "$server $path" is written to descriptor 3.
If $server is unchanged (equal to $pserver), it keeps writing "$server $path" to descriptor 3.
If $server differs from $pserver again, it creates a new file, again on descriptor 3, which automatically closes the old file.
In contrast, the awk code needs an explicit close(), does not expose a file descriptor, and creates the file automatically on the first write.
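To make the awk side of that comparison concrete, here is a commented expansion of the one-liner, run against a small inline sample (the sample file name and contents are illustrative, not from the thread):

```shell
# Create a small unsorted sample to work on:
cat > sample.txt <<'EOF'
server2 /usr1/fileB
server1 /usr/file1
server1 /usr/fileA
EOF

sort -k1,1 -k2 sample.txt | awk '
  $1 != s {                 # first field changed from the last one seen
    s = $1                  # remember the new server name
    if (f != "") close(f)   # close the previous file (awk limits open files)
    f = s ".out"            # build the new file name, e.g. "server1.out"
  }
  { print > f }             # write the whole line to the current file
'
```

Because the input arrives sorted, each .out file is opened and closed exactly once.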


Thanks MadeInGermany,
It worked on a small sample, but when I executed it against the 44,000 lines it did not do what it was intended to do. Both versions, awk and shell, produced output files with only 1 line each; not sure what happened.

Then I used a shell array and grep to get the desired output files, and it is working.
Below is the code I made with grep:

F=datafile.txt
cat $F | grep "^[a-z]" |awk '{print $1}' | sort | uniq -c| sort -rnk1 > hostlist
H=hostlist  # each line: count and unique host name, filtered out of the file

# Store each unique host name in array AA (field 2; field 1 is the count)
typeset -i i=0 k=0 ; for j in `awk '{print $2}' $H`; do  AA[$i]=$j ;  i=`expr $i + 1`  ; done

     ## AC=ArrayCount / iterations needed
      AC=${#AA[@]}
...


    #AH :ArrayHost Name     #AHF: ArrayHostFile desired output filename.
    while [[ ${k} -lt ${AC} ]]   # indices run from 0 to AC-1
    do
     
      AH=${AA[${k}]} ; AHF=${AH}.out
      grep "^${AH}[[:space:]]" $F > ${AHF} ; ls -l ${AHF}   # anchored: server1 must not match server10
      k=`expr $k + 1`
    done
...

It generated around 200 individual files from the big datafile. Again, many thanks to both; I was kind of stuck when I started with this. Thanks!

Please get rid of the old-style calls of the expr program:

i=`expr $i + 1`
k=`expr $k + 1`

And use the shell builtins instead:

i=$((i+1))
k=$((k+1))

or

((i+=1))
((k+=1))
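As a quick sanity check, both spellings increment the variable (the first is POSIX; the `(( ))` form needs ksh or bash):

```shell
i=0
i=$((i+1))   # POSIX arithmetic expansion
((i+=1))     # ksh/bash arithmetic command
echo "$i"    # prints 2
```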

And please avoid the useless cat; here you can even avoid grep:

awk '/^[a-z]/ {print $1}' $F | sort ...

In fact, the entire long pipe

cat $F | grep "^[a-z]" |awk '{print $1}' | sort | uniq -c| sort -rnk1

could be replaced by

awk '/^[a-z]/ {C[$1]++} END {for (c in C) print C[c], c}' $F | sort -nr
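For example, on a small inline sample (the file name counts_sample.txt is just for illustration), the replacement pipe prints the per-host counts in descending order:

```shell
cat > counts_sample.txt <<'EOF'
server1 /usr/file1
server2 /usr1/fileB
server1 /usr3/data2
EOF

awk '/^[a-z]/ {C[$1]++} END {for (c in C) print C[c], c}' counts_sample.txt | sort -nr
# prints:
# 2 server1
# 1 server2
```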

Noted, good points, thanks. I thought that would not work in ksh; I had the impression that "expr" and "let" were the only ways to do math operations in ksh. Thanks for the improved code.

i=$((i+1))

works.
Thank You, MadeInGermany.

ksh introduced a lot of the more advanced syntax afaik... But many primitive versions of ksh continue to be distributed, which makes it a mixed bag.