Text Splitter

Hi,

I need to split a file into several files based on its text:

BEGIN DSJOB
   Identifier "LA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB
........
........
BEGIN DSJOB
   Identifier "NA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB
........
........
........
..........
 

My output should be:
LA.txt

BEGIN DSJOB
   Identifier "LA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB

NA.txt

BEGIN DSJOB
   Identifier "NA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB

and so on, one file per block delimited by BEGIN DSJOB and END DSJOB.

Thanks in advance

awk '($0 ~ /BEGIN DSJOB/){x=$0;getline;y=$2;gsub("\"","",y);f=1;print x > (y".txt");}(f==1){if($0 ~ /END DSJOB/){f=0;print > (y".txt")}else{print > (y".txt")}}' file_name
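The one-liner above can be spread over several lines with comments to show what each step does. This is a sketch, not the poster's exact code: the sample input is an abbreviated, hypothetical version of the data in the question, and the file name is written as `(y ".txt")` because parenthesizing a concatenated redirection target avoids parsing differences between awk implementations.

```shell
#!/bin/sh
# Run in a scratch directory so the generated files do not clash with anything.
cd "$(mktemp -d)" || exit 1

# Abbreviated, hypothetical sample input, saved under the name used in the post.
cat > file_name <<'EOF'
BEGIN DSJOB
   Identifier "LA"
   DateModified "2011-10-28"
END DSJOB
........
BEGIN DSJOB
   Identifier "NA"
   DateModified "2011-10-28"
END DSJOB
........
EOF

awk '
/BEGIN DSJOB/ {
    x = $0                 # remember the BEGIN DSJOB line
    getline                # advance to the Identifier line
    y = $2                 # second field is the quoted name, e.g. "LA"
    gsub("\"", "", y)      # strip the quotes, leaving LA
    f = 1                  # we are now inside a job block
    print x > (y ".txt")   # write the remembered BEGIN DSJOB line first
}
f == 1 {
    print > (y ".txt")     # copy every line of the block, Identifier included
    if ($0 ~ /END DSJOB/)
        f = 0              # block finished: ignore lines until the next BEGIN
}
' file_name
```

Because the flag `f` is only set between BEGIN DSJOB and END DSJOB, the `........` lines between blocks are never written to any file.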

Another approach:

awk -F\" '/BEGIN DSJOB/{s=$0;getline;f=$(NF-1) ".txt";print s > f}{print > f} /END DSJOB/{close(f)}' file

It will work if you change {print > f} to {print >> f}, so that it appends instead of overwriting.

Nope. An excerpt from the GNU Awk User's Guide:

When the statement print > f is run, the file referred to by the expression f will be clobbered the first time, and the file will remain open until the end of the awk program or until it is explicitly closed with close(). All statements writing to this file during that period will append to it.

print >> f is similar but the file will be opened in append mode.
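The passage above can be checked directly with a minimal sketch (the demo*.txt file names are made up for the demo). As long as a file stays open, repeated `print > f` appends; but after `close(f)`, the next `print > f` opens the file again and truncates it, while `>>` reopens it in append mode. That is one plausible explanation for the differing results with Franklin52's one-liner, where lines between END DSJOB and the next BEGIN DSJOB still run `{print > f}` after the file has been closed.

```shell
#!/bin/sh
# Run in a scratch directory so the demo files do not clash with anything.
cd "$(mktemp -d)" || exit 1

# 1) f stays open: the first "print > f" truncates, the second appends.
awk 'BEGIN { f = "demo1.txt"; print "one" > f; print "two" > f }'
cat demo1.txt    # prints: one, then two

# 2) close(f) in between: the next "print > f" reopens and truncates again.
awk 'BEGIN { f = "demo2.txt"; print "one" > f; close(f); print "two" > f }'
cat demo2.txt    # prints: two

# 3) Same as 2) but with ">>": reopening in append mode keeps "one".
awk 'BEGIN { f = "demo3.txt"; print "one" >> f; close(f); print "two" >> f }'
cat demo3.txt    # prints: one, then two
```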

We need to use $(NF-1):

awk -F\" '/BEGIN DSJOB/{s=$0;getline;f=$(NF-1)".txt";print s > f}{print > f} /END DSJOB/{close(f)}' test_temp

Right, thanks!

Strangely enough, I can confirm what the awk man page says. But when I run Franklin52's construct (with pamu's change), LA.txt contains 2 empty lines, while NA.txt is written correctly. Using >> instead, both files are OK.
awk version: mawk 1.3.3

Thanks all...

Strange, I've never had any problem with it.

Hi Rudic,

After using >>, both files get created, but they also contain some unwanted data, such as the ........ lines below.
The solution provided by raj_saini20 in post 2 works perfectly here.

$ awk -F\" '/BEGIN DSJOB/{s=$0;getline;f=$(NF-1)".txt";print s > f}{print >> f} /END DSJOB/{close(f)}' test_temp
$ cat LA.txt
BEGIN DSJOB
   Identifier "LA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB
........
........
$ cat NA.txt
BEGIN DSJOB
   Identifier "NA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB
........
........
........
..........
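The unwanted `........` lines come from the unconditional `{print >> f}`: it also runs for lines after END DSJOB, when `f` still holds the last file name. One possible fix, sketched here and not taken from any post in this thread, is to clear `f` when a block ends and print only while it is set. The input is an abbreviated, hypothetical stand-in for test_temp.

```shell
#!/bin/sh
# Run in a scratch directory so the generated files do not clash with anything.
cd "$(mktemp -d)" || exit 1

# Abbreviated, hypothetical test_temp with junk lines between the blocks.
cat > test_temp <<'EOF'
BEGIN DSJOB
   Identifier "LA"
   DateModified "2011-10-28"
END DSJOB
........
........
BEGIN DSJOB
   Identifier "NA"
   DateModified "2011-10-28"
END DSJOB
........
EOF

awk -F'"' '
/BEGIN DSJOB/ { s = $0; getline; f = $(NF-1) ".txt"; print s > f }
f != ""       { print > f }              # only write while inside a block
/END DSJOB/   { close(f); f = "" }       # block done: stop writing
' test_temp
```

Since `f` is empty outside a block, nothing reopens a closed file, so plain `>` is enough here.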

I guess I'm late to the show, but just as food for thought:

awk 'BEGIN{RS="END DSJOB";} {x=substr($4,2,2)".txt";print $0 >> x}' file_name
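To see why `$4` holds the identifier in this approach, it helps to print how awk splits the records. A minimal sketch, assuming gawk or mawk (a multi-character RS is an extension, not POSIX awk) and an abbreviated, hypothetical test2: with the default FS, newlines separate fields too, so `BEGIN`, `DSJOB`, `Identifier` and the quoted name become `$1` to `$4`, and any extra line in front of BEGIN DSJOB would shift them. The text matched by RS is not part of any record, which is why END DSJOB disappears unless ORS puts it back, and the newline left after the last END DSJOB can form one more record with no fields.

```shell
#!/bin/sh
# Run in a scratch directory so the generated files do not clash with anything.
cd "$(mktemp -d)" || exit 1

# Abbreviated, hypothetical test2: two blocks, nothing in between.
cat > test2 <<'EOF'
BEGIN DSJOB
   Identifier "LA"
END DSJOB
BEGIN DSJOB
   Identifier "NA"
END DSJOB
EOF

# Multi-character RS needs gawk or mawk; it is not POSIX awk.
# Show the record number, the fourth field and the field count per record.
awk 'BEGIN { RS = "END DSJOB" }
     { printf "record %d: $4=%s NF=%d\n", NR, $4, NF }' test2 | tee fields.txt
```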

Only one file gets created with this, and the last line "END DSJOB" is missing.

Sorry, Pamu.
1> For the missing "END DSJOB" line: I had forgotten to set ORS.
2> For me, both files are generated.

Input file: test2

BEGIN DSJOB
   Identifier "LA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB
BEGIN DSJOB
   Identifier "NA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB

The revised code:

awk 'BEGIN{RS="END DSJOB\n";ORS=RS} NF{x=substr($4,2,2)".txt";print $0 >> x}' test2

With your input file it works perfectly.

Try it with the input provided at the start:

Input file:

BEGIN DSJOB
   Identifier "LA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB
........
dfdfds
........
BEGIN DSJOB
   Identifier "NA"
   DateModified "2011-10-28"
   TimeModified "11.10.02"
   BEGIN DSRECORD
      Identifier "ROOT"
      BEGIN DSSUBRECORD
         Owner "APT"
         Name "RecordJobPerformanceData"
         Value "0"
      END DSSUBRECORD
   END DSRECORD
END DSJOB
........
........
sdfds
........
..........

@Pamu: you are correct. I overlooked that, thinking it was just a repetition of the DSJOB section. Thank you!
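One way to make the RS approach tolerate the junk lines, sketched here under stated assumptions (gawk or mawk for the multi-character RS; an abbreviated, hypothetical copy of pamu's input), is to stop relying on `$4`: skip everything in a record before `BEGIN DSJOB` and pull the name out of the `Identifier "..."` line with `match()`. `RSTART + 12` skips the 12 characters of `Identifier "`, and `RLENGTH - 13` drops those plus the closing quote.

```shell
#!/bin/sh
# Run in a scratch directory so the generated files do not clash with anything.
cd "$(mktemp -d)" || exit 1

# Abbreviated, hypothetical copy of the input with junk between the blocks.
cat > input <<'EOF'
BEGIN DSJOB
   Identifier "LA"
   DateModified "2011-10-28"
END DSJOB
........
dfdfds
........
BEGIN DSJOB
   Identifier "NA"
   DateModified "2011-10-28"
END DSJOB
........
sdfds
EOF

# Multi-character RS needs gawk or mawk; it is not POSIX awk.
awk 'BEGIN { RS = "END DSJOB\n"; ORS = RS }   # ORS puts the separator back
{
    start = index($0, "BEGIN DSJOB")          # junk before the block is skipped
    if (start == 0) next                      # no block in this record
    blk = substr($0, start)
    if (!match(blk, /Identifier "[^"]*"/)) next
    id = substr(blk, RSTART + 12, RLENGTH - 13)   # text between the quotes
    print blk > (id ".txt")
}' input
```

Records that contain no BEGIN DSJOB, such as the junk after the last block, are skipped, so no stray file is created.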