How to get lines starting with a matched string using sed or grep in a loop?

AMBER · June 27, 2009, 11:58pm

I have a huge file and want to separate it into several subsets.
The file looks like:

C1 C2 C3 C4 ... (variable names)
1 ....
2 ....
3 ....
:
22 ....
23 ....

I want to split the huge file on column 1, which holds numbers from 1 to 23 (the number of lines differs for each number).
I tried a loop like this:

@ i = 1
while ($i <= 23)
grep ^$i hugefile.txt > $i.txt
@ i ++
end

But there is a problem: 1.txt will include all lines starting with 1, 10, 11, ..., 19.

I tried to use
grep "\<$i\>" instead, but that doesn't work either, since the other columns may also contain the numbers 1 to 23, and those lines end up in the wrong files.

I am thinking maybe I can use awk to print all the columns if column 1 = $i?

Are there any suggestions for this situation? Thank you in advance.
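
The awk idea raised above can be sketched roughly as follows. This is only an illustration in POSIX sh rather than the csh of the original loop, and the sample data is made up to stand in for hugefile.txt. It compares the whole first field against the loop counter, so 1 matches neither 10 nor a 1 in another column.

```shell
# Hypothetical sample data standing in for hugefile.txt
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'C1 C2 C3' '1 a 10' '10 b 1' '2 d 1' '1 e 2' > hugefile.txt

i=1
while [ "$i" -le 23 ]; do
    # Pass the counter into awk and compare it with the entire first field
    awk -v i="$i" '$1 == i' hugefile.txt > "$i.txt"
    i=$((i + 1))
done
```

Note that this reads the file 23 times and leaves an empty file for any number that has no lines.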

tkleczek · June 28, 2009, 4:37am

while read -r first rest; do
    echo "$first $rest" >> "$first.txt"
done < hugefile.txt

Side effect: it collapses all whitespace between the first and second fields into a single space.
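
If that whitespace matters (tab-separated data, for example), one possible variation is to read each line whole and cut the key off with parameter expansion. This is a minimal sketch on made-up sample data; it assumes every non-empty line starts with the key and has no leading whitespace.

```shell
# Hypothetical tab-separated sample standing in for hugefile.txt
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '1\ta  b\n10\tc\n1\td\n' > hugefile.txt

while IFS= read -r line; do
    [ -n "$line" ] || continue           # skip empty lines
    first=${line%%[[:space:]]*}          # everything before the first whitespace
    printf '%s\n' "$line" >> "$first.txt"  # write the line back unchanged
done < hugefile.txt
```

Because the loop appends with >>, old output files should be removed before running it again.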

kshji · June 28, 2009, 4:48am

The previous example using read is better, but if you would like to use grep, then you must grep for value+space. When you use grep to find lines by some key, always include the field delimiter. This works in ksh, bash, posix-sh, ...

i=1
while ((i <= 23))
do
  grep "^$i " hugefile.txt > $i.txt
  ((i+=1))
done
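
The pattern "^$i " assumes exactly one space follows the key. If the delimiter might be a tab instead, a variation along these lines should cover both cases. It is only a sketch: the sample data is invented, and the loop uses a plain POSIX test instead of ((...)).

```shell
# Hypothetical sample data standing in for hugefile.txt
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'C1 C2 C3' '1 a 10' '10 b 1' '11 c 23' '2 d 1' '1 e 2' > hugefile.txt

i=1
while [ "$i" -le 23 ]; do
    # Anchor at the line start and require whitespace (space or tab) right
    # after the number, so 1 does not match 10 or 11, and numbers in other
    # columns are ignored.
    grep "^$i[[:space:]]" hugefile.txt > "$i.txt"
    i=$((i + 1))
done
```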

Franklin52 · June 28, 2009, 6:17am

Try this:

awk '{print > ($1 ".txt")}' file

Regards

summer_cherry · June 29, 2009, 5:54am

nawk '{file=sprintf("%s.txt",$1);print $0 >> file}' yourfile
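
One thing to watch with the awk one-liners is that the row of variable names (C1 C2 C3 ...) goes into its own C1.txt. If each subset should carry that header instead, something like the following might do. It is a sketch on made-up sample data, and the names header, out and seen are just illustrative.

```shell
# Hypothetical sample data standing in for hugefile.txt
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'C1 C2 C3' '1 a 10' '10 b 1' '2 d 1' '1 e 2' > hugefile.txt

awk '
NR == 1 { header = $0; next }     # remember the variable-name row, do not write it yet
{
    out = $1 ".txt"
    if (!(out in seen)) {         # first line for this key: start its file with the header
        print header > out
        seen[out] = 1
    }
    print > out                   # ">" truncates only when awk first opens the file
}' hugefile.txt
```

With only 23 distinct keys the number of files awk keeps open should not be an issue; for many more keys, calling close(out) and appending with >> may be needed.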

AMBER · June 29, 2009, 5:04pm

Thank you so much! All the posts are really helpful; it works using read, grep, awk and nawk~ The 'grep value+space' trick is really smart, it solved so many problems~ Thank you guys again

pxgupta2k · July 13, 2009, 3:56am

I have a somewhat similar problem.
Hello,

I have a file with junk characters at the beginning. Please advise how I could bypass these junk characters. A sample file is given below. I want to extract the data starting from the CREATE keyword to the end of the file.

SQL08029QDB2/AIX6400AIX 64BIT@MADMS012009-07-13-00.05.04.732394PSQLASC_ANNUALIZEDRATEOFRETURNACTIVITY"SQL080226170053405MADMS01I MADMS01LDID074"SYSIBM","SYSFUN","SYSPROC","LDID074"%P5270293MADMS01SPACE RESERVED FOR FUTURE�PVM 00SQL08029QDB2/AIX64��SQL090513152703000
�� FH��$&G�k��K��CREATE PROCEDURE"MADMS01I"."ASC_ANNUALIZEDRATEOFRETURNACTIVITY"
(IN "GUIDPOLICYGUID" VARCHAR(36),
IN "DPERIODSTARTDATE" TIME
END

panyam · July 13, 2009, 5:03am

awk '/CREATE/,/END/' file.txt

thanhdat · July 13, 2009, 5:26am

Hope this helps:

awk 'BEGIN {FS="CREATE"; RS=""} {print "CREATE" $2}'  yourfile.txt
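
Another possible approach, in case junk sits on the same line as CREATE or the file contains blank lines, is to look for the first occurrence of the keyword and print from that position to the end of the file. This is a rough sketch: the sample bytes and the output name clean.sql are made up, and LC_ALL=C is set on the assumption that the junk is not valid text in the current locale.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
# Hypothetical sample: a junk line, then junk bytes directly before CREATE
printf 'SQL08029 junk header\n\377\376junkCREATE PROCEDURE "P"\n(IN "A" TIME\nEND\n' > file.txt

LC_ALL=C awk '
!found {
    pos = index($0, "CREATE")     # byte position of the first CREATE on this line
    if (pos == 0) next            # keyword not seen yet: drop the line
    $0 = substr($0, pos)          # cut away the junk in front of the keyword
    found = 1
}
{ print }                         # from here on, print every line to end of file
' file.txt > clean.sql
```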