Strip header and footer

samrat_dutta · June 14, 2013, 5:27pm

Hi, I have the following requirements for the script below:
(1) I receive 2 pipe-separated files, OUT.psv and DIFF.psv, each with a column header. I concatenate the 2 files to create final.psv. I want to add another header, START_FILE, to final.psv. How can I achieve this?

(2) I have added 3 footers using echo in the script successfully.

(3) The next day I run the script again. It first needs to strip the first header (START_FILE) and the last 3 footers, then concatenate the OUT and DIFF files and sort. How do I remove the first header and the last 3 footers and then continue with my usual script to concatenate and sort?

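#!/bin/ksh
# NOW was never set in the original; assume it should hold the finish time
NOW=$(date)
cat /app/reporting/daily/OUT.psv /app/reporting/daily/temp/DIFF.psv >> /app/reporting/daily/final.psv
# sort only writes to stdout, so save the result and move it back over final.psv
sort -u -t'|' -k1,1r /app/reporting/daily/final.psv > /app/reporting/daily/final.tmp &&
  mv /app/reporting/daily/final.tmp /app/reporting/daily/final.psv
# wc must read the file itself; "$( wc -l) < file" left wc reading the terminal
COUNT=$(wc -l < /app/reporting/daily/final.psv)
echo "END_OF_FILE" >> /app/reporting/daily/final.psv
echo "Time_Finished=$NOW" >> /app/reporting/daily/final.psv
echo "RECORD_COUNT=$COUNT" >> /app/reporting/daily/final.psv
rm /app/reporting/daily/OUT.psv
rm /app/reporting/daily/temp/DIFF.psv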

Output:
-------

START_OF_FILE
time|tourit|nofdays|rbcid|blank|type|value|nill|valuedesc|name
2013-05-16T00:52:31.662-04:00|12|3|15277105-NA-YEN||Common Stick|ESHR||Common Stock|CYRO ABLatest
2013-05-15T00:52:31.672-04:00|1|40|39693766-NA-NA||Common Stick|ESHR||Common Stock|HS AG
2013-05-14T00:52:31.672-04:00|1|45|16111278-TSX-NA||Common Stk|ESQHR||Common Stock|STANDARD REGISTER CO
2013-05-14T00:52:31.662-04:00|1|4|15277105-NA-YEN||Common Stick|ESHR||Common Stock|CYRO AB
2013-05-13T00:52:31.672-04:00|1|44|15277105-NA-NA||Common Stock|DOMESHR||Common Stock|CYRO AB
2013-05-10T00:52:31.672-04:00|1|5|39693766-NYSE-EUR||Common HSAGStick|ESHR||Common Stock|HS AG
END_OF_FILE
Time_Finished=Fri Jun 14 16:41:09 EDT 2013
RECORD_COUNT= 7

spacebar · June 14, 2013, 5:54pm

This is one way: create the output file containing just the header you want to add, then append the other files to it:

echo START_OF_FILE > /app/reporting/daily/final.psv
cat /app/reporting/daily/OUT.psv /app/reporting/daily/temp/DIFF.psv >> /app/reporting/daily/final.psv

You can use sed or grep to retrieve only the 'data' lines from the file for processing:

$ cat t
START_OF_FILE
time|tourit|nofdays|rbcid|blank|type|value|nill|valuedesc|name
2013-05-16T00:52:31.662-04:00|12|3|15277105-NA-YEN||Common Stick|ESHR||Common Stock|CYRO ABLatest
2013-05-15T00:52:31.672-04:00|1|40|39693766-NA-NA||Common Stick|ESHR||Common Stock|HS AG
2013-05-14T00:52:31.672-04:00|1|45|16111278-TSX-NA||Common Stk|ESQHR||Common Stock|STANDARD REGISTER CO
2013-05-14T00:52:31.662-04:00|1|4|15277105-NA-YEN||Common Stick|ESHR||Common Stock|CYRO AB
2013-05-13T00:52:31.672-04:00|1|44|15277105-NA-NA||Common Stock|DOMESHR||Common Stock|CYRO AB
2013-05-10T00:52:31.672-04:00|1|5|39693766-NYSE-EUR||Common HSAGStick|ESHR||Common Stock|HS AG
END_OF_FILE
Time_Finished=Fri Jun 14 16:41:09 EDT 2013
RECORD_COUNT= 7


$ sed -n '/^[0-9]/p' t
2013-05-16T00:52:31.662-04:00|12|3|15277105-NA-YEN||Common Stick|ESHR||Common Stock|CYRO ABLatest
2013-05-15T00:52:31.672-04:00|1|40|39693766-NA-NA||Common Stick|ESHR||Common Stock|HS AG
2013-05-14T00:52:31.672-04:00|1|45|16111278-TSX-NA||Common Stk|ESQHR||Common Stock|STANDARD REGISTER CO
2013-05-14T00:52:31.662-04:00|1|4|15277105-NA-YEN||Common Stick|ESHR||Common Stock|CYRO AB
2013-05-13T00:52:31.672-04:00|1|44|15277105-NA-NA||Common Stock|DOMESHR||Common Stock|CYRO AB
2013-05-10T00:52:31.672-04:00|1|5|39693766-NYSE-EUR||Common HSAGStick|ESHR||Common Stock|HS AG
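Putting the pieces of this thread together, here is a minimal sketch of the day-two run: strip START_OF_FILE and the 3 trailers with the digit filter, merge in the new OUT and DIFF data, sort, and rebuild the file with a fresh header and trailers. It is plain POSIX sh (so it also runs under ksh) and works in a throwaway mktemp directory with a few made-up sample lines instead of /app/reporting/daily. It also assumes the column header is the first line of OUT.psv.

```shell
#!/bin/sh
# DIR is a hypothetical scratch directory standing in for /app/reporting/daily.
DIR=$(mktemp -d)
mkdir -p "$DIR/temp"

# Yesterday's final file: START_OF_FILE, column header, data, 3 trailers.
cat > "$DIR/final.psv" <<'EOF'
START_OF_FILE
time|tourit|nofdays
2013-05-14T00:52:31.672-04:00|1|45
END_OF_FILE
Time_Finished=Fri Jun 14 16:41:09 EDT 2013
RECORD_COUNT= 1
EOF

# Today's extracts (made-up sample data), each with its own column header.
printf 'time|tourit|nofdays\n2013-05-16T00:52:31.662-04:00|12|3\n' > "$DIR/OUT.psv"
printf 'time|tourit|nofdays\n2013-05-15T00:52:31.672-04:00|1|40\n' > "$DIR/temp/DIFF.psv"

# Assumption: the current column header is line 1 of today's OUT.psv.
HEADER=$(head -n 1 "$DIR/OUT.psv")

# Keep only data lines (first character is a digit) from all three files,
# then sort newest first. With -k1,1, -u drops lines whose timestamps match.
sed -n '/^[0-9]/p' "$DIR/final.psv" "$DIR/OUT.psv" "$DIR/temp/DIFF.psv" |
  sort -u -t'|' -k1,1r > "$DIR/data.tmp"

# Strip the padding some wc implementations put before the number.
COUNT=$(wc -l < "$DIR/data.tmp" | tr -d ' ')

# Rebuild final.psv with ">" so yesterday's header and trailers are replaced.
{
  echo "START_OF_FILE"
  echo "$HEADER"
  cat "$DIR/data.tmp"
  echo "END_OF_FILE"
  echo "Time_Finished=$(date)"
  echo "RECORD_COUNT=$COUNT"
} > "$DIR/final.psv"

rm "$DIR/data.tmp" "$DIR/OUT.psv" "$DIR/temp/DIFF.psv"
cat "$DIR/final.psv"
```

Rebuilding the file with ">" each day, rather than appending to it, is what keeps old headers and trailers from piling up in the middle of the data.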

samrat_dutta · June 15, 2013, 3:26pm

Hi, thanks for the input, but just so you know, more columns may be added in the future. Does your statement below only work for the existing 10 columns?

sed -n '/^[0-9]/p' t

Don_Cragun · June 15, 2013, 7:18pm

samrat dutta:

Hi, thanks for the input, but just so you know, more columns may be added in the future. Does your statement below only work for the existing 10 columns?
sed -n '/^[0-9]/p' t

This sed command does not care how many columns are in its input file; it prints lines from its input if and only if the first character on the line is a digit. Since all of the header and trailer lines in your input file start with an alphabetic character, this sed command just discards headers and trailers from the input file and prints all of the non-header and non-trailer lines.
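If a data line could ever start with something other than a digit, stripping by position is an alternative to the content test. The sketch below (plain POSIX sh and awk; the temp file and the TRAILERS variable are illustrative, not from the thread) runs both on a sample with two extra columns. Note the difference: the sed filter also drops the column header line, while the awk version keeps it, because it only removes line 1 and the last 3 lines.

```shell
#!/bin/sh
# Sample file shaped like the thread's "t", but with two extra columns.
F=$(mktemp)
printf '%s\n' \
  'START_OF_FILE' \
  'time|tourit|nofdays|extra1|extra2' \
  '2013-05-16T00:52:31.662-04:00|12|3|a|b' \
  '2013-05-15T00:52:31.672-04:00|1|40|c|d' \
  'END_OF_FILE' \
  'Time_Finished=Fri Jun 14 16:41:09 EDT 2013' \
  'RECORD_COUNT= 2' > "$F"

# The thread's filter: only the first character matters, not the column count.
DATA=$(sed -n '/^[0-9]/p' "$F")
echo "$DATA"

# Positional alternative (TRAILERS is a hypothetical count of footer lines):
# skip line 1, buffer the rest, then print all but the last TRAILERS lines.
TRAILERS=3
BODY=$(awk -v n="$TRAILERS" '
  NR > 1 { buf[NR] = $0 }
  END    { for (i = 2; i <= NR - n; i++) print buf[i] }' "$F")
echo "$BODY"
rm "$F"
```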

samrat_dutta · June 17, 2013, 11:30am

Thanks for the clarification. Just one more thing: if I write the output of this sed to a temp file, using > or >> gives the same result. Which one should I use?

sed -n '/^[0-9]/p' final.psv > temp.psv

OR

sed -n '/^[0-9]/p' final.psv >> temp.psv

spacebar · June 17, 2013, 1:08pm

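">" overwrites: the shell empties temp.psv (or creates it) before sed writes to it.
">>" appends: sed's output is added after whatever temp.psv already contains.
They only give the same result when temp.psv does not exist yet or is empty. Run the script a second time with ">>" and the data lines appear twice, so use ">" here.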
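A quick way to see the difference is to run the same sed command twice with each operator. This is an illustrative sketch in plain POSIX sh using a throwaway mktemp directory; the file names over.psv and append.psv are made up for the demo.

```shell
#!/bin/sh
DIR=$(mktemp -d)
printf 'START_OF_FILE\n2013-05-16|a\n2013-05-15|b\nEND_OF_FILE\n' > "$DIR/final.psv"

# Simulate running the same step on two days in a row.
for run in 1 2; do
  sed -n '/^[0-9]/p' "$DIR/final.psv" > "$DIR/over.psv"    # truncate, then write
  sed -n '/^[0-9]/p' "$DIR/final.psv" >> "$DIR/append.psv" # add to the end
done

OVER=$(wc -l < "$DIR/over.psv" | tr -d ' ')
APPEND=$(wc -l < "$DIR/append.psv" | tr -d ' ')
echo "> after 2 runs: $OVER lines"    # 2 lines
echo ">> after 2 runs: $APPEND lines" # 4 lines
```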


Additional info: I/O Redirection