awk Parse And Create Multiple Files Based on Field Value

ec012 · February 12, 2015, 2:13pm

Hello:

I am working parsing a large input file which will be broken down into multiples based on the second field in the file, in this case: STORE.
The idea is to create each file with the corresponding store number, for example: Report_$STORENUM_$DATETIMESTAMP , and obtaining the $STORENUM value from the second field. Each store report end with a 0x0C which I am using as and end pattern. Perhaps there is a better way...

Here is my code and sample input file:

 awk  '$1 ~ /DATE/{f="Report_"++i} f{print > f"_"} /"\0x0C"/ {close (f); f=""}' $INPUT_FILE

INPUT_FILE Example:

      DATE   STORE  *-UPC NUMBER-*  TIME     TERM TRANS OPERATOR      RETAIL  RAIN CHECK   SAVINGS   QTY        DESCRIPTION
 2015-01-01     1           1234  00.24.03  1    11                   $1.00      $1.00      $1.00     1   VARIETY ITEM 6PK
 2015-01-01     1     9999019919  00.20.19  1    11                   $1.00      $1.00      $1.50     1   WATER SOFT 1PK

     DATE   STORE  *-UPC NUMBER-*  TIME     TERM TRANS OPERATOR      RETAIL  RAIN CHECK   SAVINGS   QTY        DESCRIPTION
 2015-01-01     2           1234  00.24.03  1    11                   $1.00      $1.00      $1.00     1   VARIETY ITEM 6PK
 2015-01-01     2     9999019919  00.20.19  1    11                   $1.00      $1.00      $1.50     1   WATER SOFT 1PK

RudiC · February 12, 2015, 3:20pm

Unless there are several headers per store with "DATE" in it, you can drop the 0x0C test and close (f) just before defining a new file name.

MadeInGermany · February 12, 2015, 3:26pm

Few adjustments.

awk '$1 ~ /DATE/ {if (i++) close (f); f="Report_" i "_"} {print > f}' $INPUT_FILE

Open files are properly closed when awk is finished. But an early close() prevents from running out of file descriptors.

ec012 · February 12, 2015, 3:35pm

Thanks you! Any way to incorporate the STORE number (field $2) in the file name for each file ?

RudiC · February 12, 2015, 4:54pm

This works if the respective headers are identical (which is NOT the case for your sample! I had to edit it):

awk     'NR==1  {HD=$0}
         $0==HD {getline; FN="Report_" $2; print HD > FN}
                {print $0 > FN}
        ' file
Report_1:
      DATE   STORE  *-UPC NUMBER-*  TIME     TERM TRANS OPERATOR      RETAIL  RAIN CHECK   SAVINGS   QTY        DESCRIPTION
 2015-01-01     1           1234  00.24.03  1    11                   $1.00      $1.00      $1.00     1   VARIETY ITEM 6PK
 2015-01-01     1     9999019919  00.20.19  1    11                   $1.00      $1.00      $1.50     1   WATER SOFT 1PK

Report_2:
      DATE   STORE  *-UPC NUMBER-*  TIME     TERM TRANS OPERATOR      RETAIL  RAIN CHECK   SAVINGS   QTY        DESCRIPTION
 2015-01-01     2           1234  00.24.03  1    11                   $1.00      $1.00      $1.00     1   VARIETY ITEM 6PK
 2015-01-01     2     9999019919  00.20.19  1    11                   $1.00      $1.00      $1.50     1   WATER SOFT 1PK

ec012 · February 12, 2015, 5:19pm

RuiC:

Thanks so much! It certainly works. Now I have something to work with!!!

MadeInGermany · February 13, 2015, 4:10am

The following handles different headers and empty sections, and closes files.

awk '($1=="DATE") {hd=$0; next} (length(hd)) {if (f) close (f); f="Report_" $2 "_"; print hd > f; hd=""} {print > f}' $INPUT_FILE

.

ec012 · February 13, 2015, 9:41am

This is exactly what I was struggling to accomplish! Now I can pad field $2 to be 4 decimal places and I'm done. Thanks to MadeInGermany and RudiC for the great responses!