Split a file

sraj142 · August 23, 2011, 3:52am

Hi all,

A file reports.txt (see attachment) contains 17 pages of patient reports. Each patient is identified by a prefix (i.e. 11) and a 7-digit number. There are six patient reports in total in the file. One patient report may span multiple pages. The page count for each Lab no. (7-digit number) is as follows.

Lab. No:11 1713951 Page count 4
Lab. No:11 1701269 Page count 5
Lab. No:11 1394304 Page count 1
Lab. No:11 1394305 Page count 1
Lab. No:11 1394306 Page count 5
Lab. No:11 1394301 Page count 1

I am looking for an awk or perl solution to split the file according to the 7-digit number. The expected file name is the prefix (i.e. 11) followed by the 7-digit number.

111713951.txt (Should contain 4 pages)
111701269.txt (5 pages)
111394304.txt (1 page)
111394305.txt (1 page)
111394306.txt (5 pages)
111394301.txt (1 page)

So the whole 17 pages would produce 6 individual files, one per 7-digit number.

Could any one of you please give me a hand?

Note: Sample file (reports.txt) is attached for your reference.

Regards - Sraj142

Corona688 · August 23, 2011, 5:08pm

What a "page" is depends on your paper and font, so I can't tell if I have enough pages. But this splits as you ask.

nawk '{ print > "11" $3 ".txt" }' < file.txt
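
A minimal sketch of what this one-liner does, assuming input lines shaped like the listing in the first post, so that the third whitespace-separated field is the 7-digit number. The sample lines and the scratch directory are made up for illustration, and plain awk is used here in place of nawk.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
cd "$(mktemp -d)"

# Sample input, assumed to look like the listing in the question.
cat > file.txt <<'EOF'
Lab. No:11 1713951 Page count 4
Lab. No:11 1701269 Page count 5
Lab. No:11 1713951 another line for the same number
EOF

# $3 is the 7-digit number; each line goes to the file named after it.
# The parentheses around the file name keep the concatenation unambiguous.
awk '{ print > ("11" $3 ".txt") }' < file.txt

ls 11*.txt
```

Lines that share a number end up in the same file, which is why this only works when every line carries the number in the same field.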

[edit] Okay, your actual data is nothing like the data you showed in your post. Working on it.

---------- Post updated at 03:08 PM ---------- Previous update was at 02:33 PM ----------

The data was so scrambled it took a while to see any patterns. I look for the "Lab." in each page and find the number after it. If no 'Lab.' is found in the page, it uses the last one it found.

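awk 'BEGIN { RS="-\\*-" }      # one record per page; a regex RS needs gawk or mawk

{       # Advance N to the first field equal to "Lab."; the empty loop body (;) is intentional
        for(N=1; (N<=NF)&&($N != "Lab."); N++);
        if($N == "Lab.")
        {
                N+=2;                   # skip the "No:11" field to reach the 7-digit number
                FILE="11" $N ".txt";
        }

        # Pages without "Lab." go to the last file that was found
        if(FILE) print > FILE;       }' < reports.txt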
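
The per-page idea described above can also be tried without a multi-character RS, whose behaviour POSIX leaves unspecified. The following is only a sketch under stated assumptions: the sample data is made up, the "-*-" marker is assumed to sit on a line of its own, and "Lab.", "No:11" and the number are assumed to be separate whitespace-delimited fields. The names flush_page, page and file are illustrative. It buffers lines instead of splitting records, so it is a different technique from the command above.

```shell
cd "$(mktemp -d)"

# Made-up sample: three pages separated by "-*-"; page 2 has no "Lab." line.
cat > sample.txt <<'EOF'
Lab. No:11 1713951
first page of patient A
-*-
second page of patient A
-*-
Lab. No:11 1701269
only page of patient B
EOF

awk '
# Write the buffered page to the current output file, then reset the buffer.
function flush_page() {
    if (file != "" && page != "") printf "%s", page > file
    page = ""
}
# A line consisting only of -*- ends the current page.
$0 == "-*-" { flush_page(); next }
{
    # Look for the field "Lab." and take the field two positions later.
    for (i = 1; i <= NF - 2; i++)
        if ($i == "Lab.") { file = "11" $(i + 2) ".txt"; break }
    page = page $0 "\n"
}
END { flush_page() }
' sample.txt

ls 11*.txt
```

Because a page is written only once it is complete, a "Lab." line anywhere in the page decides where the whole page goes, and a page without one follows the previous page.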

sraj142 · August 24, 2011, 1:01am

Hi Corona688,

Thanks a lot for giving me a hand. So far I have copied your code to a file called yy in the same directory as a copy of reports.txt. When I ran "awk yy", it did not do anything for the last 15 minutes. Could you please check whether I am using the wrong command?

Regards

yazu · August 24, 2011, 1:22am

This is for the command line. If you want to use it as a script, the simplest way is to run it as

sh yy

And to save output to OUTPUTFILE:

sh yy >OUTPUTFILE

Corona688 · August 24, 2011, 9:34am

You waited 15 minutes? Wow, that's patience. It ought to finish nearly instantly.

awk doesn't work that way. I suggest you type what I posted into an actual shell, or put it in a shell script.
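
For reference, "awk yy" treats the word yy as the program text rather than as a file name, and with no input file it waits on standard input, which is why it seems to hang. A small sketch of two ways to run a saved command follows. The one-line awk program, the sample reports.txt and the name prog.awk are stand-ins for illustration, not the actual solution.

```shell
cd "$(mktemp -d)"
printf 'Lab. No:11 1713951\n' > reports.txt

# Option 1: yy holds the complete shell command, so a shell must run it.
cat > yy <<'EOF'
awk '{ print "read: " $0 }' < reports.txt
EOF
sh yy

# Option 2: prog.awk (a hypothetical name) holds only the awk program text,
# without the surrounding single quotes, and is passed to awk with -f.
cat > prog.awk <<'EOF'
{ print "read: " $0 }
EOF
awk -f prog.awk reports.txt
```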

sraj142 · August 25, 2011, 2:37am

Hi yazu/corona688,

As both of you suggested, I have put the same code in a shell script and run it with sh yy. I have even tried it from the command line too. This time it finished instantly but did not produce anything, not even an error :(

yazu · August 25, 2011, 3:30am

Well, the solution doesn't work. (And my second suggestion, about output, is wrong. I was inattentive, sorry.)

Your file was produced by some text processor, not a text editor. It has a lot of special escape sequences. Is it possible to convert your file to plain text in your text processor?
If not, it would be hard to give you a solution. It would take some binary hacking to work out where each chunk begins and ends in order to split the file.
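
One way to see what is in such a file is sketched below. The sample bytes are made up, since the real escape sequences in reports.txt are not shown in this thread, and reports_plain.txt is a hypothetical output name.

```shell
cd "$(mktemp -d)"

# Made-up stand-in for reports.txt: ESC bytes (octal 033) around the text.
printf '\033[1mLab. No:11 1713951\033[0m\n' > reports.txt

# Show control characters in visible form (ESC appears as 033).
od -c reports.txt | head

# Keep only tab, newline, form feed, carriage return and printable ASCII.
tr -cd '\11\12\14\15\40-\176' < reports.txt > reports_plain.txt
cat reports_plain.txt
```

Deleting the control bytes leaves the printable remainder of each sequence behind (here "[1m" and "[0m"), so this is only a first look at the data and not a conversion to plain text. Form feed is kept on the assumption that it may mark page breaks.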