Split file into multiple files using awk

amitdaf · December 7, 2016, 6:49am

I have the following file:

FHEAD0000000001RTLG20161205110959201612055019
THEAD......
TCUST.....
TITEM....
TTEND...
TTAIL...
THEAD......
TCUST.....
TITEM....
TITEM.....
TTEND...
TTAIL...
FTAIL<number of lines in file, 10 digits, zero-padded><number of lines in file minus 2, 10 digits, zero-padded> E.g.: FTAIL00000006420000000640

I need to split the file into multiple files such that

Each file should start with the FHEAD record (fixed).
Each file should contain one block of records, from THEAD to TTAIL.
Each file should end with an FTAIL record carrying the line counts described above: <number of lines in file, 10 digits, zero-padded><number of lines in file minus 2, 10 digits, zero-padded>. For example: FTAIL00000006420000000640
The output file name should be configurable.

Expected Output

File 1

FHEAD0000000001RTLG20161205110959201612055019
THEAD......
TCUST.....
TITEM....
TTEND...
TTAIL...
FTAIL00000000070000000005

File 2

FHEAD0000000001RTLG20161205110959201612055019
THEAD......
TCUST.....
TITEM....
TITEM....
TTEND...
TTAIL...
FTAIL00000000080000000006

RudiC · December 7, 2016, 6:59am

Welcome to the forum.

This is a widespread problem. Any attempts/ideas/thoughts from your side? Did you search these forums and/or look into the related threads at the bottom of this page, trying to adapt the solutions given?

What are "number of lines in file-" and "number of lines in file-2"?

rbatte1 · December 7, 2016, 7:03am

Welcome amitdaf

Thanks for the question.

I have a few questions to pose in response first:

Is this homework/assignment? There are specific forums for these.
What have you tried so far?
What output/errors do you get?
What OS and version are you using?
What are your preferred tools? (C, shell, perl, awk, etc.)
What logical process have you considered? (to help steer us to follow what you are trying to achieve)

Most importantly, what have you tried so far?

There are probably many ways to achieve most tasks, so giving us an idea of your style and thoughts will help us guide you to an answer most suitable to you so you can adjust it to suit your needs in future.

We're all here to learn and getting the relevant information will help us all.

You could probably use csplit to split on the string THEAD, but you would then need to insert the FHEAD record at the top of each file you create, then calculate and append the FTAIL record.
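A rough sketch of that csplit route, not a finished solution. It assumes GNU csplit (the '{*}' repeat count and the -z and -n options are GNU extensions), and the names input.dat, piece_ and RTLOG_test are placeholders chosen for the example. The sample data is created inline so the snippet can be run as is.

```shell
#!/bin/sh
# Sketch only: split on THEAD with csplit, then wrap each piece
# in the FHEAD record and a computed FTAIL record.
in=input.dat            # placeholder input name
prefix=RTLOG_test       # placeholder output prefix

# Small sample in the layout posted above.
cat > "$in" <<'EOF'
FHEAD0000000001RTLG20161205110959201612055019
THEAD......
TCUST.....
TITEM....
TTEND...
TTAIL...
THEAD......
TCUST.....
TITEM....
TITEM.....
TTEND...
TTAIL...
FTAIL00000000140000000012
EOF

fhead=$(grep '^FHEAD' "$in")

# Drop FHEAD/FTAIL, then start a new piece at every THEAD line.
# -z discards the empty piece in front of the first THEAD.
grep -v -e '^FHEAD' -e '^FTAIL' "$in" |
    csplit -s -z -n 6 -f piece_ - '/^THEAD/' '{*}'

n=0
for p in piece_*; do
    n=$((n + 1))
    body=$(( $(wc -l < "$p") ))     # THEAD..TTAIL lines in this piece
    {
        printf '%s\n' "$fhead"
        cat "$p"
        # total lines = body + FHEAD + FTAIL; second field = total - 2
        printf 'FTAIL%010d%010d\n' "$((body + 2))" "$body"
    } > "$prefix$n"
    rm -f "$p"
done
```

With the sample above this should leave RTLOG_test1 (7 lines) and RTLOG_test2 (8 lines) in the current directory. A single awk pass avoids the temporary pieces, so this is mainly useful if csplit is already part of your toolbox.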

I hope that this helps, but have a try and show us where you get stuck.

Kind regards,
Robin

amitdaf · December 8, 2016, 4:17am

Hello,
I tried the code below:

awk '/^FHEAD/{h=$0} 
     /^THEAD/{close("RTLOG_test"f);f++}{print h >"RTLOG_test"f; print $0 >> "RTLOG_test"f}' RTLOG_5019_05122016110959.DAT

But it is not giving the desired output.

"Number of lines in file" = number of lines in the split file (wc -l)
"Number of lines in file-2" = that number of lines minus 2 (wc -l minus 2)
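To make the two counts concrete, here is a small sketch that builds the FTAIL record from a split file's total line count. The helper name ftail is made up for this example; the format is the one described in the first post (two 10-digit, zero-padded fields).

```shell
# ftail is a hypothetical helper: $1 is the total number of lines in one
# split file, i.e. what wc -l would report once FHEAD and FTAIL are included.
ftail() {
    awk -v n="$1" 'BEGIN { printf "FTAIL%010d%010d\n", n, n - 2 }'
}

ftail 7      # FHEAD + 5 transaction lines + FTAIL -> FTAIL00000000070000000005
ftail 642    # the example from the first post     -> FTAIL00000006420000000640
```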

RudiC · December 8, 2016, 5:03am

Try

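awk '
/^F/            {HD = $0                        # FHEAD: keep it for every output file
                 next                           # FHEAD and the original FTAIL are not copied
                }

/^THEAD/        {if (FN)        {printf "FTAIL%010d%010d" ORS, LN+2, LN > FN
                                 close (FN)     # previous file is complete
                                 LN = 0
                                }
                 FN = "RTLOG_test" ++f          # next output file name
                 print HD > FN
                }

                {print >> FN                    # THEAD .. TTAIL lines
                 LN++
                }

END             {if (FN) printf "FTAIL%010d%010d" ORS, LN+2, LN > FN
                }
' file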
cf RTLOG_test*
RTLOG_test1:
FHEAD0000000001RTLG20161205110959201612055019
THEAD......
TCUST.....
TITEM....
TTEND...
TTAIL...
FTAIL00000000070000000005
RTLOG_test2:
FHEAD0000000001RTLG20161205110959201612055019
THEAD......
TCUST.....
TITEM....
TITEM.....
TTEND...
TTAIL...
FTAIL00000000080000000006

amitdaf · December 9, 2016, 5:29am

Hello,
Thanks for the code.
In the last file, two FTAIL records appear.

---------- Post updated 12-09-16 at 05:29 AM ---------- Previous update was 12-08-16 at 05:53 AM ----------

There are 2 FTAIL records in the last file.
1) FTAIL of original file
2) FTAIL computed from awk

Can we drop the original file's FTAIL record when writing the output files?

RudiC · December 9, 2016, 6:40am

The original FTAIL record should be ignored UNLESS the input file's structure is different from what you posted. Is it?
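One possible explanation, purely an assumption since the real input was not shown, is that the FTAIL line in the actual file does not start in column 1 (for example, it has leading blanks), so a pattern anchored with ^F never sees it. The sketch below is a variant of the same splitting idea that tolerates leading blanks or tabs before FHEAD and FTAIL and takes the output name from a variable. The names sample.dat, RTLOG_out, prefix and finish are made up for the example.

```shell
#!/bin/sh
# Sample input whose trailer line starts with a blank (the assumed problem case).
cat > sample.dat <<'EOF'
FHEAD0000000001RTLG20161205110959201612055019
THEAD......
TCUST.....
TITEM....
TTEND...
TTAIL...
THEAD......
TCUST.....
TITEM....
TITEM.....
TTEND...
TTAIL...
 FTAIL00000000140000000012
EOF

awk -v prefix="RTLOG_out" '
function finish() {                     # append the computed FTAIL, close the file
    if (fn != "") {
        printf "FTAIL%010d%010d\n", n + 2, n > fn
        close(fn)
    }
}
/^[ \t]*FHEAD/ { head = $0; next }      # remember the header, do not copy it here
/^[ \t]*FTAIL/ { next }                 # drop the original trailer
/^THEAD/       { finish(); f++; fn = prefix f; n = 0; print head > fn }
fn != ""       { print > fn; n++ }      # THEAD .. TTAIL lines
END            { finish() }
' sample.dat
```

With this sample, RTLOG_out1 and RTLOG_out2 should each end in exactly one FTAIL record. To check what the trailer line in the real file looks like, something like grep -n FTAIL on the input piped through od -c would show any stray leading characters.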