Hello community, I receive log files from a system, and I need to clean the data and store it as txt files for reporting purposes. Since these files are generated on a Unix box, we have to write a shell script to handle the data cleansing.
Here is what the sample file data looks like:
InsertTime:201604070523 DocID:101
#headers
'DocID: 101 MOVEABLE TOOLS: 2 QTY: 0 HELD TOOLS: 0 QTY: 0 BLOCKED TOOLS: 0 QTY: 0'
#columns 'TargetDoc' 'GRank' 'LRank' 'Priority' 'Loc ID'
#widths 12 3 3 12 25
#types 'STRING' 'INTEGER' 'INTEGER' 'STRING' 'STRING'
#rows
'aaaaa' '1' '1' 'Slow' '8gkahinka.01'
'aaaaa' '1' '0' 'Slow' '7nlafnjbaflnbja.01'
#blocked '' ''
#rule 'Rule_Abcd'
#doc '101'
#station_type ' '
#queue_duration '1.09673e-05'
#process_duration '4.61456'
#ISS-DLIS-DIAGS
InsertTime:201604070523 DocID:102
#headers
'DocID: 102 MOVEABLE TOOLS: 2 QTY: 0 HELD TOOLS: 0 QTY: 0 BLOCKED TOOLS: 0 QTY: 0'
#columns 'TargetDoc' 'Rank' 'Check Name' 'Loc ID'
#widths 12 3 3 12 25
#types 'STRING' 'INTEGER' 'INTEGER' 'STRING' 'STRING'
#rows
'aa' '1' 'xyz' '8gkahinka.01'
'aax' '1' 'none' '7nlafnjbaflnbja.01'
#blocked '' ''
#rule 'Rule_Axf'
#doc '102'
#station_type ' '
#queue_duration '1.09673e-05'
#process_duration '4.61456'
#ISS-DLIS-DIAGS
InsertTime:201604070750 DocID:101
#headers
'DocID: 101 MOVEABLE TOOLS: 2 QTY: 0 HELD TOOLS: 0 QTY: 0 BLOCKED TOOLS: 0 QTY: 0'
#columns 'TargetDoc' 'GRank' 'LRank' 'Priority' 'Loc ID'
#widths 12 3 3 12 25
#types 'STRING' 'INTEGER' 'INTEGER' 'STRING' 'STRING'
#rows
'xxxx' '1' '1' 'Slow' 'bjkkacka.01'
'yyyy' '1' '0' 'Slow' 'jiafjklas.001'
#blocked '' ''
#rule 'Rule_Abcd'
#doc '101'
#station_type ' '
#queue_duration '1.09673e-05'
#ISS-DLIS-DIAGS
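The quoted fields cannot simply be split on whitespace, because names such as 'Loc ID' and 'Check Name' contain spaces and #station_type holds a single blank. A minimal sketch, assuming every field is wrapped in single quotes and never contains a quote itself, that pulls the quoted tokens out with POSIX awk (the function name quoted_fields is hypothetical):

```shell
#!/bin/sh
# Print the single-quoted tokens of each input line, joined by "|".
# Tokens may contain spaces (e.g. 'Loc ID') or be blank (e.g. ' ').
quoted_fields() {
    # q holds a single quote, so the awk program itself needs none
    awk -v q="'" '
    {
        re = q "[^" q "]*" q      # quote, any non-quote chars, quote
        s = $0; out = ""; n = 0
        while (match(s, re)) {
            tok = substr(s, RSTART + 1, RLENGTH - 2)  # drop the quotes
            out = (n++ ? out "|" : "") tok
            s = substr(s, RSTART + RLENGTH)           # continue after this token
        }
        print out
    }'
}

echo "#columns 'TargetDoc' 'GRank' 'LRank' 'Priority' 'Loc ID'" | quoted_fields
# TargetDoc|GRank|LRank|Priority|Loc ID
echo "'aaaaa' '1' '0' 'Slow' '7nlafnjbaflnbja.01'" | quoted_fields
# aaaaa|1|0|Slow|7nlafnjbaflnbja.01
```

Passing the quote in with -v avoids having to escape single quotes inside a single-quoted awk program.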
This is the raw data, and I need to write a shell script to cleanse it according to these rules:
- Lines starting with # are comments and should be ignored, except for #columns.
- #columns gives the column names, and the lines after #rows hold the actual data.
- The useful data is the InsertTime and DocID values, the #columns names, and the data rows; everything else is unwanted.
- The header of the output file is always InsertTime and DocID, followed by every column name from all the #columns lines in the data, in order of first appearance.
- Each row's values go under the matching column headers, together with that record's InsertTime and DocID values; columns the record does not have are left empty.
- The output file uses | as the delimiter.
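The header rule can be handled in a first pass over the file. A minimal sketch, assuming the column names are the single-quoted tokens on each #columns line and that a repeated name is kept only at its first appearance (build_header is a hypothetical name):

```shell
#!/bin/sh
# Pass-1 sketch: build the output header from every #columns line.
# Header = InsertTime|DocID + each column name, in order of first appearance.
build_header() {
    awk -v q="'" '
    /^#columns/ {
        re = q "[^" q "]*" q          # one single-quoted token
        s = $0
        while (match(s, re)) {
            name = substr(s, RSTART + 1, RLENGTH - 2)
            if (!(name in seen)) { seen[name] = 1; order[++n] = name }
            s = substr(s, RSTART + RLENGTH)
        }
    }
    END {
        hdr = "InsertTime|DocID"
        for (i = 1; i <= n; i++) hdr = hdr "|" order[i]
        print hdr
    }' "$@"
}

# Usage with the two #columns lines from the sample (reads stdin when no file is given)
build_header <<'EOF'
#columns 'TargetDoc' 'GRank' 'LRank' 'Priority' 'Loc ID'
#columns 'TargetDoc' 'Rank' 'Check Name' 'Loc ID'
EOF
# InsertTime|DocID|TargetDoc|GRank|LRank|Priority|Loc ID|Rank|Check Name
```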
Here is the desired output:
InsertTime|DocID|TargetDoc|GRank|LRank|Priority|Loc ID|Rank|Check Name
201604070523|101|aaaaa|1|1|Slow|8gkahinka.01||
201604070523|101|aaaaa|1|0|Slow|7nlafnjbaflnbja.01||
201604070523|102|aa||||8gkahinka.01|1|xyz
201604070523|102|aax||||7nlafnjbaflnbja.01|1|none
201604070750|101|xxxx|1|1|Slow|bjkkacka.01||
201604070750|101|yyyy|1|0|Slow|jiafjklas.001||
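One possible approach that produces this output is a two-pass awk script: the first pass collects the union of column names, the second tracks the current InsertTime, DocID and #columns and prints each data row under the matching headers. This is a sketch, not a tested production script, and it assumes that DocID is taken from the InsertTime line (not from #doc), that data rows are the non-empty lines between #rows and the next # line, and that the input is a regular file, since it is read twice (clean_log is a hypothetical name):

```shell
#!/bin/sh
# Sketch: convert the log format into a pipe-delimited table.
# Usage: clean_log input.log > output.txt   (input must be a regular file)
clean_log() {
    awk -v q="'" '
    # Collect the single-quoted tokens of s into tok[1..n]; return n.
    function qsplit(s, tok,    n, re) {
        split("", tok)
        re = q "[^" q "]*" q
        n = 0
        while (match(s, re)) {
            tok[++n] = substr(s, RSTART + 1, RLENGTH - 2)
            s = substr(s, RSTART + RLENGTH)
        }
        return n
    }
    # Pass 1: union of all #columns names, in order of first appearance.
    NR == FNR {
        if ($0 ~ /^#columns/) {
            m = qsplit($0, tmp)
            for (j = 1; j <= m; j++)
                if (!(tmp[j] in seen)) { seen[tmp[j]] = 1; order[++n] = tmp[j] }
        }
        next
    }
    # Pass 2 begins: print the header once.
    FNR == 1 {
        hdr = "InsertTime|DocID"
        for (i = 1; i <= n; i++) hdr = hdr "|" order[i]
        print hdr
    }
    /^InsertTime:/ {
        it = $1;  sub(/^InsertTime:/, "", it)
        doc = $2; sub(/^DocID:/, "", doc)
        inrows = 0; next
    }
    /^#columns/ { ncol = qsplit($0, col); inrows = 0; next }
    /^#rows/    { inrows = 1; next }
    /^#/        { inrows = 0; next }   # every other # line is ignored
    inrows && NF {
        nv = qsplit($0, val)
        split("", cell)                # reset the cells for this row
        for (j = 1; j <= nv && j <= ncol; j++) cell[col[j]] = val[j]
        line = it "|" doc
        for (i = 1; i <= n; i++) line = line "|" cell[order[i]]
        print line
    }' "$1" "$1"
}
```

Reading the file twice lets the full header be printed first without buffering all rows in memory; the quoted 'DocID: ...' line after #headers is skipped because it does not fall inside a #rows section.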