******************************
Class 1A
Students absent are :
1. ABC
2. CDE
3. CPE
******************************
Class 2A
Students absent are :
******************************
Class 3A
Students absent are :
******************************
Class 17ACF
Students absent are :
1. ABCD
2. XYZ
From this file I just need to extract the sections that actually list at least one student under "Students absent are :".
The class name is dynamic, and so is the number of absent students.
E.g. the output should look like:
******************************
Class 1A
Students absent are :
1. ABC
2. CDE
3. CPE
******************************
Class 17ACF
Students absent are :
1. ABCD
2. XYZ
Please help: how could I do this with a simple command or a script?
Hi reldb,
I can quickly give you an algorithm for this. Just convert it into Unix shell code and use the grep command to match the patterns.
create two temporary files file1.txt and file2.txt
scount=0
while read k
do
  if [ line starts with Class ]
  then
    put the line into file1.txt
  elif [ line starts with Students ]
  then
    append the line to file1.txt
  elif [ line starts with a number ]
  then
    scount+=1
    append the line to file1.txt
  elif [ line is empty and scount >= 1 ]
  then
    insert an empty line into file1.txt
    insert ****** into file1.txt
    append file1.txt data to file2.txt
    empty file1.txt
    scount=0
  elif [ line is empty and scount = 0 ]
  then
    empty file1.txt
  fi
done < "Sourcefile.txt"
The above could be used assuming the structure of your source file remains the same as you have provided.
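For reference, here is a minimal runnable sketch of the same buffering idea in POSIX shell. It departs from the outline above in two ways, both of which are assumptions on my part: it keeps the current section in a shell variable instead of file1.txt, and it treats the row of asterisks as the section boundary, because the sample as posted shows no empty lines between sections. The function name filter_absent is made up for illustration.

```shell
#!/bin/sh
# filter_absent (hypothetical name): print only the sections that list
# at least one absent student. Reads the file named in $1.
filter_absent() {
    buf=''      # lines of the section currently being read
    scount=0    # numbered (student) lines seen in that section
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '***'*)
                # A new section starts: emit the previous one if it had students.
                if [ "$scount" -ge 1 ]; then printf '%s' "$buf"; fi
                buf=''
                scount=0
                ;;
            [0-9]*)
                # A line starting with a digit is taken to be a student entry.
                scount=$((scount + 1))
                ;;
        esac
        # Every line, including the separator, is kept in the buffer.
        buf="$buf$line
"
    done < "$1"
    # The input does not end with a separator, so flush the last section here.
    if [ "$scount" -ge 1 ]; then printf '%s' "$buf"; fi
}
```

It could be called as: filter_absent Sourcefile.txt > result.txt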
You should have shown us what your attempts were. Anyhow, try
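awk '/\*\*\*/ {if (CNT>4) for (i=1;i<=CNT;i++) print T[i]; CNT=0}   # separator line: print the buffered section if it has more than 4 lines
{T[++CNT]=$0}                                                       # buffer every line of the current section
END {if (CNT>4) for (i=1;i<=CNT;i++) print T[i]}                    # same check for the last section
' file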
******************************
Class 1A
Students absent are :
1. ABC
2. CDE
3. CPE
******************************
Class 17ACF
Students absent are :
1. ABCD
2. XYZ
EDIT: This was nice but it didn't quite satisfy your spec:
awk '(A=gsub (/\n/, "&"))>4||A==0' RS="*" ORS="*" file
******************************
Class 1A
Students absent are :
1. ABC
2. CDE
3. CPE
****************************************************************************************
Class 17ACF
Students absent are :
1. ABCD
2. XYZ
*
DHeisenberg - Thanks for the suggestion. I wrote a program following a similar pattern in Java and it is working perfectly fine.
RudiC - Thanks for your suggestion. The one below worked fine. (I got some errors with the 2nd suggestion on live data.)
awk '/\*\*\*/ {if (CNT>4) for (i=1;i<=CNT;i++) print T[i]; CNT=0}
{T[++CNT]=$0}
END {if (CNT>4) for (i=1;i<=CNT;i++) print T[i]}
' file
I have a couple of questions, so that I can understand it better and reuse it for future requirements.
/\*\*\*/ extracts each paragraph based on the *** pattern, and the paragraph is then printed depending on its number of lines.
Instead of counting the lines, how can I check whether any line in the paragraph starts with a number followed by a dot, and print the paragraph only if so (a kind of true/false logic)?
I couldn't understand why the print logic appears twice (once before END and once in the END block, with similar logic) even though each section is output only once.
Because it prints only when it reaches a *** line, and your example does not end with one, you need another print in the END block; otherwise your last section is never printed.
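As for the first question (a true/false check instead of a line count), one possible variation is sketched below. It sets a flag whenever a line starts with digits followed by a dot, and prints the buffered section only if the flag is set. The wrapper name absent_sections, the awk function name emit and the variable names are all made up for illustration; treat it as a sketch, not tested against your live data.

```shell
# absent_sections (hypothetical name): print sections containing at least
# one line that starts with a number followed by a dot, e.g. "1. ABC".
absent_sections() {
    awk '
        # Print the buffered section only if a numbered line was seen in it,
        # then reset the buffer and the flag for the next section.
        function emit(   i) {
            if (found) for (i = 1; i <= cnt; i++) print buf[i]
            cnt = 0; found = 0
        }
        /^\*\*\*/    { emit() }           # separator: close the previous section
        /^[0-9]+\./  { found = 1 }        # true/false flag instead of a line count
                     { buf[++cnt] = $0 }  # buffer every line, separator included
        END          { emit() }           # the last section has no trailing separator
    ' "$1"
}
```

Usage would be: absent_sections file

Since the decision no longer depends on how many lines a section has, a section with a single absent student is printed as well, and extra or missing blank lines do not change the result.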