Problem reading a file

Dear all,

I have this file (output of cat file.txt):

archive test  02  sequence 03        02length     52
archive test  02  sequence 04        02length     52
archive test  02  sequence 05        02length     52
teste arquivo 06 sequencia 08        06  length     54
teste arquivo 06 sequencia 09        06  length     54
teste arquivo 08 sequencia 01        08                                    length     88
teste arquivo 09 sequencia 01        09                           length     79

I have this shell script:
----------------------------------

#!/Bin/ksh
dir_work=/aplic/tmp
arquivo=file.txt
cd ${dir_work}
cat $arquivo | while read registro
do
   campo=`echo $registro | cut -c38-39`
   echo $registro >>${campo}_${arquivo} 
done

-----------------------------------
I have a problem when I try to split it into 4 files (types 02, 06, 08 and 09).
I expect these results:

02_file.txt with 3 records
06_file.txt with 2 records
08_file.txt with 1 record
09_file.txt with 1 record.

Help me please.

One problem with your script is that the interpreter path should be /bin/ksh, not /Bin/ksh. You also have to put double quotes around $registro so that the original spacing is kept and cut sees the right columns.

#!/bin/ksh
dir_work=/aplic/tmp
arquivo=file.txt
cd ${dir_work}
cat $arquivo | while read registro
do
   campo=`echo "$registro" | cut -c38-39`
   echo $registro >>${campo}_${arquivo} 
done

If you want to retain the original spacing of the input file in the output files you'd also have to use double quotes:

echo "$registro" >>${campo}_${arquivo} 
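To see why the quotes matter, here is a minimal sketch using one record shaped like the sample (it is rebuilt with printf so that the type code is guaranteed to sit in columns 38-39; the variable names unquoted and quoted are only for illustration). Without quotes the shell splits the value on whitespace and echo rejoins the words with single spaces, so the columns shift left:

```shell
#!/bin/sh
# Rebuild one sample record: the first part is padded to 37 columns,
# so the type code "02" lands in columns 38-39 as in file.txt.
registro=$(printf '%-37s%s' 'archive test  02  sequence 03' '02length     52')

# Unquoted: runs of blanks collapse to one space and the columns shift.
unquoted=$(echo $registro | cut -c38-39)

# Quoted: the record reaches cut with its original spacing.
quoted=$(echo "$registro" | cut -c38-39)

echo "unquoted: [$unquoted]"
echo "quoted:   [$quoted]"
```

Only the quoted form yields the type code, which is why both the cut line and the output line need the quotes.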

You can easily check what your script is doing by temporarily using

#!/bin/ksh -x

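An alternative way to do it, which needs ksh93 or bash for the ${var:offset:length} expansion (the offset is zero-based, so 37 means column 38), would be: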

#!/bin/ksh
dir_work=/aplic/tmp
arquivo=file.txt
cd ${dir_work}
while read registro; do
   campo=${registro:37:2}
   echo $registro >>${campo}_${arquivo}
done<$arquivo

Thanks, that solved part of my problem.
The file also has a header and a trailer.
Header

                                     00                                   02112009

Trailer

                                     99     00000061

The script put the header into the type 02 file (it didn't take the leading spaces into account), and I lost the trailer.
The trailer contains the total number of records.
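A likely cause of both symptoms, sketched below with the trailer rebuilt from the sample above (37 leading blanks are assumed, so that the type code sits in columns 38-39): a plain read strips leading whitespace, so the whole record shifts left and cut -c38-39 no longer finds the type code. For the trailer the result is empty, so the record would end up in a file named _file.txt instead of 99_file.txt.

```shell
#!/bin/sh
# Trailer rebuilt for illustration: 37 blanks, then the type code "99".
trailer=$(printf '%37s%s' '' '99     00000061')

# Plain read: the leading blanks are stripped before cut sees the record.
campo_read=$(printf '%s\n' "$trailer" | { read registro; printf '%s\n' "$registro" | cut -c38-39; })

echo "type code after plain read: [$campo_read]"
echo "output file would be: ${campo_read}_file.txt"
```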

$ cat file.txt |awk '{print $3}'  |sort |uniq -c |sort  -k2 |awk '{print $2"_file.txt with",$1, "records"}'
02_file.txt with 3 records
06_file.txt with 2 records
08_file.txt with 1 records
09_file.txt with 1 records
$ awk '{a[$3]+=1} END {for (i in a) print i"_file.txt with",a[i],"records" }' file.txt  |sort
02_file.txt with 3 records
06_file.txt with 2 records
08_file.txt with 1 records
09_file.txt with 1 records

@cewa67: Try using IFS= read -r instead of read:

while IFS= read -r registro

With that change I get:

$ grep . *_*
00_file.txt:00 02112009
02_file.txt:archive test 02 sequence 03 02length 52
02_file.txt:archive test 02 sequence 04 02length 52
02_file.txt:archive test 02 sequence 05 02length 52
06_file.txt:teste arquivo 06 sequencia 08 06 length 54
06_file.txt:teste arquivo 06 sequencia 09 06 length 54
08_file.txt:teste arquivo 08 sequencia 01 08 length 88
09_file.txt:teste arquivo 09 sequencia 01 09 length 79
99_file.txt:99 00000061

If I also put double quotes around $registro when writing to the output files, to preserve the original spacing, I get:

00_file.txt:                                     00                                   02112009
02_file.txt:archive test  02  sequence 03        02length     52
02_file.txt:archive test  02  sequence 04        02length     52
02_file.txt:archive test  02  sequence 05        02length     52
06_file.txt:teste arquivo 06 sequencia 08        06  length     54
06_file.txt:teste arquivo 06 sequencia 09        06  length     54
08_file.txt:teste arquivo 08 sequencia 01        08                                    length     88
09_file.txt:teste arquivo 09 sequencia 01        09                           length     79
99_file.txt:                                     99     00000061
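Putting the suggestions from this thread together, a minimal sketch of the complete loop might look like this. To stay self-contained it writes a three-line sample (a shortened header, one record and the trailer, rebuilt with printf on the assumption that the type code starts in column 38) into a temporary directory instead of /aplic/tmp; the mktemp directory and the sample are illustrative and not part of the original script.

```shell
#!/bin/sh
# Illustrative work directory and sample input (assumptions, not from the thread).
dir_work=$(mktemp -d)
arquivo=file.txt
cd "$dir_work" || exit 1

{
  printf '%37s%s\n' '' '00'
  printf '%-37s%s\n' 'archive test  02  sequence 03' '02length     52'
  printf '%37s%s\n' '' '99     00000061'
} > "$arquivo"

# IFS= keeps leading and trailing blanks; -r keeps backslashes literal.
while IFS= read -r registro
do
   # Quote the record so cut sees the original columns.
   campo=$(printf '%s\n' "$registro" | cut -c38-39)
   # Quote again so the output file keeps the original spacing.
   printf '%s\n' "$registro" >> "${campo}_${arquivo}"
done < "$arquivo"

ls
```

Reading from a redirection rather than from cat also saves one process, although the loop still starts a cut for every record.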

I used IFS= read -r instead of read and got the expected results. Thanks.
But I have another problem: performance.
It processed 94,000 records in 3 minutes, but I have 93,000,000 records.
How can I solve this?

Then why didn't you say so? I thought you specifically wanted a shell script. Try this:

awk '{print>substr($0,38,2)"_file.txt"}' file.txt
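For readers less familiar with awk: substr($0,38,2) picks the same two characters as cut -c38-39 (awk counts from 1), and awk keeps every output file open for the rest of the run, so no extra process is started per record. The sketch below shows the same idea with the file name built in a variable first, which sidesteps how some awk implementations parse a concatenation after >. The temporary directory and the three-line sample are assumptions made only to keep the example runnable.

```shell
#!/bin/sh
# Illustrative sample in a temporary directory (assumption, not from the thread).
dir_work=$(mktemp -d)
cd "$dir_work" || exit 1
{
  printf '%-37s%s\n' 'archive test  02  sequence 03' '02length     52'
  printf '%-37s%s\n' 'teste arquivo 06 sequencia 08' '06  length     54'
  printf '%-37s%s\n' 'teste arquivo 06 sequencia 09' '06  length     54'
} > file.txt

# Columns 38-39 select the output file; each file stays open until awk exits.
awk '{ out = substr($0, 38, 2) "_file.txt"; print > out }' file.txt

wc -l ./*_file.txt
```

Unlike >> in the shell loop, > in awk truncates each output file the first time it is opened in a run, so rerunning the command does not duplicate records.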

You can make it even faster if you use mawk.

Great solution, and very fast.
The input files have many different names (city names).
I want each output file name to be derived from the name of its input file.
For example:
1st input file = saopaulo.txt
outputs: 02_saopaulo.txt / 03_saopaulo.txt / etc.

2nd input file = riodejaneiro.txt
outputs: 02_riodejaneiro.txt / 03_riodejaneiro.txt / etc.

Thanks

Good to hear. This should do the trick if you put all those city .txt files in one directory:

awk '{print>substr($0,38,2)"_"FILENAME}' !([0-9]*).txt
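Note that !([0-9]*).txt is a ksh extended pattern (bash accepts it only after shopt -s extglob). It excludes names that start with a digit, so outputs of an earlier run such as 02_saopaulo.txt are not read again as input. A portable sketch of the same idea, using a plain for loop and a case test instead of the extended pattern, could look like this; the temporary directory and the shortened one-line sample files are assumptions for illustration only.

```shell
#!/bin/sh
# Illustrative city files in a temporary directory (assumption, not from the thread).
dir_work=$(mktemp -d)
cd "$dir_work" || exit 1
printf '%-37s%s\n' 'teste arquivo 06 sequencia 08' '06  length     54' > saopaulo.txt
printf '%-37s%s\n' 'teste arquivo 09 sequencia 01' '09  length     79' > riodejaneiro.txt

for f in *.txt
do
   case $f in
      [0-9]*) continue ;;   # skip outputs left over from an earlier run
   esac
   # FILENAME is the input file awk is currently reading.
   awk '{ out = substr($0, 38, 2) "_" FILENAME; print > out }' "$f"
done

ls
```

This variant starts one awk per input file, which should be negligible for a handful of city files; the single awk call above does the same work in one process.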