I am stuck on a requirement where my file has the format below.
20150812170500846959990854-25383-8.0.0
"ABC Report" hp96880
"4952"
20150812170501846959990854-25383-8.0.0 End of run
20150812060132846959990854-20495-8.0.0
"XYZ Report" vg76452
"1006962188"
20150812060141846959990854-20495-8.0.0
"ZZY Report" fu59172
20150812060147846959990854-20495-8.0.0 End of run
Each complete block follows the pattern below.
Line 1: Start Time
Line 2: Report Name and User
Line 3: Identifier
Line 4: End Time
In the sample above, the 2nd block is missing the End Time and the 3rd block is missing the Identifier.
The requirement is to:
Convert all lines starting with "20" into date format, i.e. YYYY/MM/DD.
Merge each block from Start Time to End Time into one line, with fields separated by commas.
Ignore blocks that don't have an End Time.
Add a blank space in place of the Identifier in blocks that don't contain one.
If possible, separate Report Name and User Name with a comma.
I used a loop with if statements to address the requirements, but the script slows down on large files, so I'm looking for a faster solution using sed or awk.
Can anyone please help me out here?
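For illustration, here is a minimal single-pass sketch of the requirements listed above. It is written in Python rather than the sed or awk asked for, and the same state-machine logic should port to awk. It rests on several assumptions not stated in the post: a timestamp line ending in "End of run" closes a block, any other timestamp line opens a new block (discarding an unfinished one), the date is the first 8 digits of the timestamp, and only the first identifier line per block is kept. The name merge_blocks is hypothetical.

```python
import re

# Assumption: a timestamp line starts with "20" followed by at least 7 more digits.
TS = re.compile(r"^(20\d{2})(\d{2})(\d{2})\d")
# Assumption: a report line is a quoted name followed by a single user token.
REPORT = re.compile(r'^("[^"]*")\s+(\S+)\s*$')


def merge_blocks(lines):
    """Return one comma-separated line per block that has an end time."""
    out = []
    block = None  # currently open block, or None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = TS.match(line)
        if m:
            date = "/".join(m.groups())  # YYYY/MM/DD
            if line.endswith("End of run"):
                # End line: emit the open block, if any, then close it.
                if block is not None:
                    out.append(",".join([
                        block["start"], block["report"], block["user"],
                        block["ident"] or " ",  # blank space if no identifier
                        date,
                    ]))
                block = None
            else:
                # Start line: any unfinished block is dropped here.
                block = {"start": date, "report": "", "user": "", "ident": ""}
            continue
        if block is None:
            continue  # stray line outside any block
        r = REPORT.match(line)
        if r and not block["report"]:
            block["report"], block["user"] = r.groups()
        elif not block["ident"]:
            block["ident"] = line  # keep only the first identifier line
    return out


SAMPLE = [
    '20150812170500846959990854-25383-8.0.0',
    '"ABC Report" hp96880',
    '"4952"',
    '20150812170501846959990854-25383-8.0.0 End of run',
    '20150812060132846959990854-20495-8.0.0',
    '"XYZ Report" vg76452',
    '"1006962188"',
    '20150812060141846959990854-20495-8.0.0',
    '"ZZY Report" fu59172',
    '20150812060147846959990854-20495-8.0.0 End of run',
]

for merged in merge_blocks(SAMPLE):
    print(merged)
```

On the sample data this prints two lines: 2015/08/12,"ABC Report",hp96880,"4952",2015/08/12 and 2015/08/12,"ZZY Report",fu59172, ,2015/08/12. The XYZ block is dropped because it has no end time. To process a real file, pass the open file object to merge_blocks instead of SAMPLE, since the function only iterates over lines.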
It's quite difficult to synchronize on those records when elements are missing and fields consist of several words. So the above is far from elegant and may benefit from some polishing...
That helped to a major extent.
I'm looking into additional cases where there are multiple identifiers in the file and will try tweaking the code myself. I'll ask for your help if I get stuck.