awk Parse And Create Multiple Files Based on Field Value

Hello:

I am working parsing a large input file which will be broken down into multiples based on the second field in the file, in this case: STORE.
The idea is to create each file with the corresponding store number, for example: Report_$STORENUM_$DATETIMESTAMP , and obtaining the $STORENUM value from the second field. Each store report end with a 0x0C which I am using as and end pattern. Perhaps there is a better way...

Here is my code and sample input file:

 awk  '$1 ~ /DATE/{f="Report_"++i} f{print > f"_"} /"\0x0C"/ {close (f); f=""}' $INPUT_FILE
 

INPUT_FILE Example:

      DATE   STORE  *-UPC NUMBER-*  TIME     TERM TRANS OPERATOR      RETAIL  RAIN CHECK   SAVINGS   QTY        DESCRIPTION
 2015-01-01     1           1234  00.24.03  1    11                   $1.00      $1.00      $1.00     1   VARIETY ITEM 6PK
 2015-01-01     1     9999019919  00.20.19  1    11                   $1.00      $1.00      $1.50     1   WATER SOFT 1PK

     DATE   STORE  *-UPC NUMBER-*  TIME     TERM TRANS OPERATOR      RETAIL  RAIN CHECK   SAVINGS   QTY        DESCRIPTION
 2015-01-01     2           1234  00.24.03  1    11                   $1.00      $1.00      $1.00     1   VARIETY ITEM 6PK
 2015-01-01     2     9999019919  00.20.19  1    11                   $1.00      $1.00      $1.50     1   WATER SOFT 1PK

Unless there are several headers per store with "DATE" in it, you can drop the 0x0C test and close (f) just before defining a new file name.

Few adjustments.

awk '$1 ~ /DATE/ {if (i++) close (f); f="Report_" i "_"} {print > f}' $INPUT_FILE

Open files are properly closed when awk is finished. But an early close() prevents from running out of file descriptors.

Thanks you! Any way to incorporate the STORE number (field $2) in the file name for each file ?

This works if the respective headers are identical (which is NOT the case for your sample! I had to edit it):

awk     'NR==1  {HD=$0}
         $0==HD {getline; FN="Report_" $2; print HD > FN}
                {print $0 > FN}
        ' file
Report_1:
      DATE   STORE  *-UPC NUMBER-*  TIME     TERM TRANS OPERATOR      RETAIL  RAIN CHECK   SAVINGS   QTY        DESCRIPTION
 2015-01-01     1           1234  00.24.03  1    11                   $1.00      $1.00      $1.00     1   VARIETY ITEM 6PK
 2015-01-01     1     9999019919  00.20.19  1    11                   $1.00      $1.00      $1.50     1   WATER SOFT 1PK

Report_2:
      DATE   STORE  *-UPC NUMBER-*  TIME     TERM TRANS OPERATOR      RETAIL  RAIN CHECK   SAVINGS   QTY        DESCRIPTION
 2015-01-01     2           1234  00.24.03  1    11                   $1.00      $1.00      $1.00     1   VARIETY ITEM 6PK
 2015-01-01     2     9999019919  00.20.19  1    11                   $1.00      $1.00      $1.50     1   WATER SOFT 1PK
1 Like

RuiC:

Thanks so much! It certainly works. Now I have something to work with!!!

The following handles different headers and empty sections, and closes files.

awk '($1=="DATE") {hd=$0; next} (length(hd)) {if (f) close (f); f="Report_" $2 "_"; print hd > f; hd=""} {print > f}' $INPUT_FILE

.

1 Like

This is exactly what I was struggling to accomplish! Now I can pad field $2 to be 4 decimal places and I'm done. Thanks to MadeInGermany and RudiC for the great responses! :b: