Split a file

Hi all,

A file reports.txt (see attachment) contains 17 pages of patient reports. Each patient is identified by a prefix (i.e. 11) followed by a 7-digit number. There are six patient reports in total in the file, and one patient report may span multiple pages. The page count for each Lab. No (the seven-digit number) is as follows:

Lab. No:11 1713951 Page count 4
Lab. No:11 1701269 Page count 5
Lab. No:11 1394304 Page count 1
Lab. No:11 1394305 Page count 1
Lab. No:11 1394306 Page count 5
Lab. No:11 1394301 Page count 1
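As a quick sanity check, those page counts should add up to the 17 pages in the file. A minimal sketch, assuming the six lines above are saved verbatim in a file named counts.txt (a hypothetical name), sums the last field of each line:

```shell
# Hypothetical file: the six "Lab. No" lines above, saved verbatim.
cat > counts.txt <<'EOF'
Lab. No:11 1713951 Page count 4
Lab. No:11 1701269 Page count 5
Lab. No:11 1394304 Page count 1
Lab. No:11 1394305 Page count 1
Lab. No:11 1394306 Page count 5
Lab. No:11 1394301 Page count 1
EOF

# The page count is the last field ($NF) of each line; add them up
# and report how many lines (reports) were read.
awk '{ total += $NF } END { print total " pages in " NR " reports" }' counts.txt
```

It should print `17 pages in 6 reports`.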

I am looking for an awk or perl solution to split the file by the 7-digit number. Each output file should be named with the prefix (i.e. 11) followed by the 7-digit number:

111713951.txt (Should contain 4 pages)
111701269.txt (5 pages)
111394304.txt (1 page)
111394305.txt (1 page)
111394306.txt (5 pages)
111394301.txt (1 page)

So the 17 pages as a whole should produce 6 individual files, each named after its 7-digit number.

Could any of you please give me a hand?

Note: the sample file (reports.txt) is attached for your reference.

Regards - Sraj142

What counts as a "page" depends on your paper and font, so I can't tell whether each file ends up with the right number of pages. But this splits the file as you asked.

nawk '{ print > ("11" $3 ".txt") }' < file.txt
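To see what the one-liner does, here is a small sketch run on made-up lines shaped like those in the original post (file.txt here is a hypothetical sample, not the attached reports.txt). It assumes a POSIX awk is available under the name awk. The approach relies on every line carrying the 7-digit number as its third field; a line without one would land in a file named just 11.txt. The parentheses around the file name keep the concatenation unambiguous across awk versions.

```shell
# Hypothetical sample shaped like the lines in the original post.
cat > file.txt <<'EOF'
Lab. No:11 1713951 Page count 4
Lab. No:11 1701269 Page count 5
Lab. No:11 1713951 more text for the first patient
EOF

# $3 is the 7-digit number; each line goes to 11<number>.txt.
# Within one awk run, later prints to the same file are appended.
awk '{ print > ("11" $3 ".txt") }' < file.txt

cat 111713951.txt
```

Here 111713951.txt ends up with both lines that mention 1713951.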

[edit] Okay, your actual data looks nothing like the data you showed in your post. Working on it.

---------- Post updated at 03:08 PM ---------- Previous update was at 02:33 PM ----------

The data was so scrambled it took a while to see any patterns. The script treats each "-*-" marker as a page break, looks for "Lab." on each page and takes the number after it. If no "Lab." is found on a page, it reuses the last number it found.

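awk 'BEGIN { RS="-\\*-" }       # each "-*-" marker ends a page (multi-char RS: gawk/mawk)

{       # scan for the field "Lab."; the loop body is intentionally empty
        for(N=1; (N<=NF)&&($N != "Lab."); N++)
                ;

        # "Lab." found: skip "No:11" and take the 7-digit number
        if(N <= NF)
        {
                N+=2;
                FILE="11" $N ".txt";
        }

        # pages without "Lab." go to the last file found
        if(FILE) print > FILE;       }' < reports.txt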
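Before pointing the page-based approach at reports.txt, it can help to try it on a small made-up sample. The sketch below assumes, as the script does, that pages are separated by a "-*-" marker and that the number sits two fields after "Lab."; sample.txt and its contents are invented for illustration. A multi-character RS is treated as a regular expression by gawk and mawk, while a strictly POSIX awk uses only its first character.

```shell
# Hypothetical sample: three "pages" separated by -*- markers; the third
# page has no "Lab." label, so it should follow the second page.
printf '%s\n' \
        'Lab. No:11 1713951 first page' '-*-' \
        'Lab. No:11 1701269 second page' '-*-' \
        'continued results without a label' > sample.txt

awk 'BEGIN { RS = "-\\*-" }
{
        # Find the field "Lab."; the loop body is empty on purpose.
        for (N = 1; (N <= NF) && ($N != "Lab."); N++)
                ;
        # The 7-digit number is two fields after "Lab.".
        if (N <= NF) FILE = "11" $(N + 2) ".txt"
        # Unlabeled pages go to the most recent file.
        if (FILE) print > FILE
}' < sample.txt

ls 11*.txt
```

Expected result: 111713951.txt holds the first page, and 111701269.txt holds the second page plus the unlabeled third one.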

Hi Corona688,

Thanks a lot for giving me a hand. I copied your code into a file called yy, in the same directory as a copy of reports.txt. When I ran "awk yy", it sat there doing nothing for 15 minutes. Could you please check whether I am using the wrong command?

Regards

This is meant for the command line. If you want to use it as a script, the simplest way is to run it as

sh yy

And to save output to OUTPUTFILE:

sh yy >OUTPUTFILE

You waited 15 minutes? Wow, that's patience. It ought to finish nearly instantly :P

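awk doesn't work that way: with "awk yy", awk takes the word yy itself as its program and then waits for input on the terminal, which is why it seemed to hang. I suggest you type what I posted into an actual shell, or put it in a shell script.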
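To show the difference, here is a minimal sketch with a throwaway file yy that holds a complete shell command line, as in this thread (input.txt, prog.awk and the tiny program are hypothetical). `sh yy` runs that command line; a file containing only awk code is run with `awk -f` instead; and `awk yy` uses the word yy as the program, which prints nothing because yy is an unset variable.

```shell
# Hypothetical demo files.
printf 'hello\n' > input.txt
cat > yy <<'EOF'
awk '{ print "seen:", $0 }' < input.txt
EOF

# The shell runs the command line stored in yy.
sh yy

# A file holding only awk code is run with awk -f.
printf '%s\n' '{ print "seen:", $0 }' > prog.awk
awk -f prog.awk input.txt

# "awk yy" treats yy as a pattern (an unset variable, so false):
# it prints nothing, and without "< input.txt" it waits on the terminal.
awk yy < input.txt
```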

Hi yazu/Corona688,

As you both suggested, I put the same code in a shell script and ran it with sh yy; I also tried it from the command line. This time it finished instantly but produced no output and no error :(

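Well, the solution doesn't work. (And my second suggestion, about saving the output, was wrong, sorry; I wasn't paying attention. The awk script writes straight into the 11<number>.txt files, so nothing goes to standard output.)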
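Since the results land in files rather than on standard output, a quick check is to list what was created. A small sketch, using a hypothetical helper name list_outputs, meant to be run in the directory where the script was executed:

```shell
# Hypothetical helper: report each 11*.txt file the split produced and
# its line count, or say that none exist.
list_outputs() {
        for f in 11*.txt; do
                # With no match the glob stays literal, so the file does not exist.
                if [ ! -e "$f" ]; then
                        echo "no 11*.txt files were created"
                        return
                fi
                printf '%s: %s lines\n' "$f" "$(wc -l < "$f")"
        done
}

list_outputs
```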

Your file was produced by a word processor, not a plain text editor, and it contains a lot of special escape sequences. Is it possible to export the file from your word processor as plain text?
If not, it will be hard to give you a solution: the script would need some binary hacking to find the boundaries between chunks in order to split the file.
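If exporting to plain text is not an option, a first step is to look at what the control bytes actually are. A rough sketch on a made-up sample.txt (standing in for reports.txt, with an invented escape sequence and form feed): `cat -v` makes control bytes visible, and `tr -cd` drops everything except tab, newline, form feed, carriage return and printable ASCII. This is a crude filter, not a real converter: the printable remainder of each escape sequence survives and may need further cleanup once its format is known.

```shell
# Made-up sample with ESC sequences and a form feed.
printf 'Lab. No:11 1713951\033[1m BOLD\033[0m\014next page\n' > sample.txt

# Show control bytes visibly (^[ is ESC, ^L is form feed) to spot
# where pages and reports really begin.
cat -v sample.txt

# Keep tab (\11), newline (\12), form feed (\14), carriage return (\15)
# and printable ASCII (\40-\176); delete every other byte.
tr -cd '\11\12\14\15\40-\176' < sample.txt > sample_plain.txt
```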