Read multiple lines at a time from file

Hello All,

I have a file like the one below:

dn: cn=user1,ou=org,o=org
cn=user1
uid=user1

cn=user2,ou=org,o=org
cn=user2
uid=user2

cn=user3,ou=org,o=org
cn=user3
cn=user33
uid=user3

cn=user4,ou=org,o=org
cn=user4
uid=user4

I want to read lines until the first blank line, put those lines into a separate variable and do something with them; then read from the first blank line to the second blank line, put those lines into a separate variable and do something with them, and so on to the end of the file. Each block could be 3-5 lines. How can I do this? Please assist. Thanks

A tricky question indeed. It depends on what you are then going to go on and do, but if you are on AIX, there is a -p flag on grep to get you a paragraph. I'm not sure how many variants have it available, though.
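If you do have the AIX grep, a paragraph search is a one-liner; a minimal sketch, assuming -p's default separator of a blank line:

# AIX only: -p prints the whole blank-line-separated paragraph
# containing the match, rather than just the matching line
grep -p "uid=user2" file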

If the file is not large, you could:-

#!/bin/ksh
while read -r line
do
   if [ "$line" = "" ]
   then
      <handle end of section>
   else
      <continue processing section>
   fi
done < file
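To tie that straight to the original question, here is a fuller sketch along the same lines. It gathers each record into a variable and hands the finished block to a process_section step at every blank line and again at end-of-file; process_section is a hypothetical stand-in for whatever work you need to do:

#!/bin/ksh
nl='
'                                   # a literal newline, used to join lines

process_section()                   # hypothetical: replace with your real work
{
   printf 'Processing:\n%s' "$1"
}

section=""
while read -r line
do
   if [ "$line" = "" ]
   then
      # a blank line marks the end of the current record
      [ -n "$section" ] && process_section "$section"
      section=""
   else
      section="$section$line$nl"    # still inside a record; keep accumulating
   fi
done < file

# the file might not end with a blank line, so flush the last record
[ -n "$section" ] && process_section "$section"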

It will be slow for larger files, but as an alternative you could split the file into multiple smaller ones based on the blank lines, then process them in turn:-

#!/bin/ksh

mkdir $$                            # work in a scratch directory so only our pieces are in it
cd $$

linecnt=$(grep -c "^$" ../file)     # count the blank-line split points
((linecnt=linecnt+1))               # deliberately over-count; -k below keeps the pieces anyway
csplit -k -n 6 ../file "/^$/" "{$linecnt}"

Then, loop through the files generated. You may need to consider the length of the file name counter (-n 6 in my example), and the double dots in ../file are only needed because the script has changed into the scratch directory, so the input file now sits one level up. You may have to adjust that to suit too.
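With the four-record sample file from the top of the thread, the scratch directory ends up looking roughly like this (csplit's default prefix is xx):

$ ls
xx000000  xx000001  xx000002  xx000003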

You can then (because you are in a new directory with only your data in it) run a loop to process the files as you wish. I would recommend against a for file in * construct if you expect a lot of files, as the glob expansion can become unwieldy with very large numbers of files. You have the counter, so it is probably best to use that:-

typeset -Z6 file_num=0              # zero-padded to match the -n 6 suffix width
while [ -f xx$file_num ]            # csplit names its output xx000000, xx000001, ...
do
   ....whatever.... xx$file_num
   ((file_num=file_num+1))
done

If you are looking for just certain patterns, you could get a list of matching file names with grep -l from this point and work through those too.
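As a sketch of that last idea (uid=user3 is just an example pattern; the xx* pieces are the csplit output from above):

# -l lists only the file names that contain a match
grep -l "uid=user3" xx* | while read -r piece
do
   echo "pattern found in $piece"   # ...whatever processing you need...
done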

I hope that this helps.

Robin
Liverpool/Blackburn
UK

The first version seems the most sensible and efficient, really.

try also:

#!/bin/ksh
set -A arr                          # declare (and clear) the ksh array
c1=0
# RS="" makes awk read blank-line-separated records; $1=$1 forces a
# rebuild, so each whole record is printed out as a single line
awk '$1=$1' RS="" infile | while read -r rec
do
  arr[$c1]="$rec"
  (( c1 = c1 + 1 ))
done
c2=0
while [ $c2 -lt $c1 ]
do
   echo "${arr[$c2]}"
   (( c2 = c2 + 1 ))
done

Note that this relies on ksh running the last part of a pipeline in the current shell; in bash the while loop would run in a subshell, so arr and c1 would be empty once the loop finishes.
$ awk -v RS='\n\n' -v ORS='\n\n' ' { print NR":" $0 } ' file
1:dn: cn=user1,ou=org,o=org
cn=user1
uid=user1

2:cn=user2,ou=org,o=org
cn=user2
uid=user2

3:cn=user3,ou=org,o=org
cn=user3
cn=user33
uid=user3

4:cn=user4,ou=org,o=org
cn=user4
uid=user4

@anbu: you can do this more reliably using RS="" instead of RS='\n\n'. Also, standard awk can only use a single character for RS.
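For example, the paragraph-mode equivalent of the one-liner above would be something like:

$ awk 'BEGIN { RS=""; ORS="\n\n" } { print NR ":" $0 }' file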


Thanks, all, for your replies...
I started putting something together from the first version of the code from "rbatte1" and it worked, except that I had a few issues with the loop and with the code I put in to do the work, but those are all resolved now, and it took about 35 minutes to finish some compares against a 155k-user file. By the way, I'm on RHEL and using a bash script. Anyway, thanks all for your support. Really helped.
