ec012
February 12, 2015, 2:13pm
1
Hello:
I am working parsing a large input file which will be broken down into multiples based on the second field in the file, in this case: STORE.
The idea is to create each file with the corresponding store number, for example: Report_$STORENUM_$DATETIMESTAMP , and obtaining the $STORENUM value from the second field. Each store report end with a 0x0C which I am using as and end pattern. Perhaps there is a better way...
Here is my code and sample input file:
awk '$1 ~ /DATE/{f="Report_"++i} f{print > f"_"} /"\0x0C"/ {close (f); f=""}' $INPUT_FILE
INPUT_FILE Example:
DATE STORE *-UPC NUMBER-* TIME TERM TRANS OPERATOR RETAIL RAIN CHECK SAVINGS QTY DESCRIPTION
2015-01-01 1 1234 00.24.03 1 11 $1.00 $1.00 $1.00 1 VARIETY ITEM 6PK
2015-01-01 1 9999019919 00.20.19 1 11 $1.00 $1.00 $1.50 1 WATER SOFT 1PK
DATE STORE *-UPC NUMBER-* TIME TERM TRANS OPERATOR RETAIL RAIN CHECK SAVINGS QTY DESCRIPTION
2015-01-01 2 1234 00.24.03 1 11 $1.00 $1.00 $1.00 1 VARIETY ITEM 6PK
2015-01-01 2 9999019919 00.20.19 1 11 $1.00 $1.00 $1.50 1 WATER SOFT 1PK
RudiC
February 12, 2015, 3:20pm
2
Unless there are several headers per store with "DATE" in it, you can drop the 0x0C
test and close (f)
just before defining a new file name.
Few adjustments.
awk '$1 ~ /DATE/ {if (i++) close (f); f="Report_" i "_"} {print > f}' $INPUT_FILE
Open files are properly closed when awk is finished. But an early close() prevents from running out of file descriptors.
ec012
February 12, 2015, 3:35pm
4
Thanks you! Any way to incorporate the STORE number (field $2) in the file name for each file ?
RudiC
February 12, 2015, 4:54pm
5
This works if the respective headers are identical (which is NOT the case for your sample! I had to edit it):
awk 'NR==1 {HD=$0}
$0==HD {getline; FN="Report_" $2; print HD > FN}
{print $0 > FN}
' file
Report_1:
DATE STORE *-UPC NUMBER-* TIME TERM TRANS OPERATOR RETAIL RAIN CHECK SAVINGS QTY DESCRIPTION
2015-01-01 1 1234 00.24.03 1 11 $1.00 $1.00 $1.00 1 VARIETY ITEM 6PK
2015-01-01 1 9999019919 00.20.19 1 11 $1.00 $1.00 $1.50 1 WATER SOFT 1PK
Report_2:
DATE STORE *-UPC NUMBER-* TIME TERM TRANS OPERATOR RETAIL RAIN CHECK SAVINGS QTY DESCRIPTION
2015-01-01 2 1234 00.24.03 1 11 $1.00 $1.00 $1.00 1 VARIETY ITEM 6PK
2015-01-01 2 9999019919 00.20.19 1 11 $1.00 $1.00 $1.50 1 WATER SOFT 1PK
1 Like
ec012
February 12, 2015, 5:19pm
6
RuiC:
Thanks so much! It certainly works. Now I have something to work with!!!
The following handles different headers and empty sections, and closes files.
awk '($1=="DATE") {hd=$0; next} (length(hd)) {if (f) close (f); f="Report_" $2 "_"; print hd > f; hd=""} {print > f}' $INPUT_FILE
.
1 Like
ec012
February 13, 2015, 9:41am
8
This is exactly what I was struggling to accomplish! Now I can pad field $2 to be 4 decimal places and I'm done. Thanks to MadeInGermany and RudiC for the great responses!