Split file using awk

I need to split the incoming source file into multiple files using awk.

The date to split on is at character positions 6 to 13 (8 characters).

  • All records with a date greater than 20170101 and less than or equal to 20181231 should go to a file named after the source file with the suffix _greaterthan_20170101_lessthan_20181231 plus a yyyymmddhhmmss timestamp.
  • All records with a date less than 20170101 should go to a file named after the source file with the suffix _lessthan_20170101 plus a yyyymmddhhmmss timestamp.
  • All records with a date greater than 20190101 should go to a file named after the source file with the suffix _20190101 plus a yyyymmddhhmmss timestamp.

Additionally, instead of hard-coding the condition in the script/command, can we pass it to the script as a variable so that the script stays dynamic?

Source File:

001  20991231
002  20190101
003  20231231
004  20231231
005  20261231
006  20271231
007  20281231
008  20301231
009  20161231
010  20161230
011  20161010
012  19880101
013  20000101
014  20110121
015  20130121
016  20170121
017  19870121
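The requirements above can be sketched directly in awk by comparing the full 8-digit date at positions 6 to 13 and passing the boundaries in as parameters. This is a minimal sketch, not a tested production script. Assumptions that go beyond the question: the function name split_by_date is hypothetical; a date exactly equal to the lower bound goes to the middle file (the spec leaves 20170101 unassigned); everything above the upper bound goes to one file named with a _greaterthan_ suffix built from that bound (instead of the literal _20190101 in the spec); and the output files are written to the current directory.

```shell
#!/bin/sh
# Hypothetical helper: split_by_date LOWER_DATE UPPER_DATE SOURCE_FILE
# Assumes fixed-width records with the yyyymmdd date at positions 6-13.
split_by_date() {
    lo=$1 hi=$2 src=$3
    ts=$(date +%Y%m%d%H%M%S)      # one timestamp shared by all three files
    base=$(basename "$src")       # output files go to the current directory
    awk -v lo="$lo" -v hi="$hi" \
        -v low="${base}_lessthan_${lo}${ts}" \
        -v mid="${base}_greaterthan_${lo}_lessthan_${hi}${ts}" \
        -v high="${base}_greaterthan_${hi}${ts}" '
    {
        d = substr($0, 6, 8) + 0  # the date, forced to a number
        if (d < lo + 0)
            f = low               # before the lower bound
        else if (d <= hi + 0)
            f = mid               # lower bound (inclusive) up to upper bound
        else
            f = high              # after the upper bound
        print > f
    }' "$src"
}
```

For example, split_by_date 20170101 20181231 source.txt would produce three timestamped files next to where you run it; the boundaries come from the command line, so nothing is hard-coded in the script.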

Try something like this and adjust it to your needs:

awk '{y=substr($2,1,4); f=b} y<lt{f=a} y>gt{f=c} {print>f} ' lt=2017 gt=2018 a=y1 b=y2 c=y3 infile

This should split the input file into the files y1, y2 and y3.
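A quick way to check the one-liner is to recreate the sample input from the question and run it unchanged. The file name infile is just the demo name used in the answer.

```shell
# Recreate the sample input from the question
printf '%s\n' \
  '001  20991231' '002  20190101' '003  20231231' '004  20231231' \
  '005  20261231' '006  20271231' '007  20281231' '008  20301231' \
  '009  20161231' '010  20161230' '011  20161010' '012  19880101' \
  '013  20000101' '014  20110121' '015  20130121' '016  20170121' \
  '017  19870121' > infile

# Years < 2017 go to y1, 2017-2018 to y2, > 2018 to y3
awk '{y=substr($2,1,4); f=b} y<lt{f=a} y>gt{f=c} {print>f} ' lt=2017 gt=2018 a=y1 b=y2 c=y3 infile

# Show how many lines landed in each file
wc -l y1 y2 y3
```

With this sample, y1 gets the 8 records dated before 2017, y2 only record 016, and y3 the 8 records from 2019 onward.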


Thanks.

Can you please explain the code in a few lines?

Sure:

awk '
  {
    y=substr($2,1,4)                              # set y to the first 4 characters (the year)
                                                  # of the second field of the current line
    f=b                                           # by default, set the output file name f to the value of b
  }
  y<lt {                                          # if the year is less than the lower threshold
    f=a                                           # set f to the name in variable a
  }
  y>gt {                                          # if the year is greater than the upper threshold
    f=c                                           # set f to the name in variable c
  }
  {
    print>f                                       # print the line to the selected file
  }
' lt=2017 gt=2018 a=y1 b=y2 c=y3 infile           # set variables lt, gt, a, b and c, then name the input file

Thanks.

Is there a way to avoid hard-coding 2017 and 2018 and pass them as parameters instead?

They are already being passed as parameters.

I think s/he means shell variables:

awk '...' lt="${PAR1}" gt="${PAR2}"
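A small sketch of the two common ways to hand shell variables to awk. Operand assignments such as lt="${PAR1}" (the form used in this thread) are applied when awk reaches them in the argument list, i.e. just before it reads infile, so they are not yet set in a BEGIN block. Assignments made with -v are applied before BEGIN runs. The demo values and the output names z1 to z3 are assumptions for illustration only.

```shell
PAR1=2017    # lower year threshold (demo value)
PAR2=2018    # upper year threshold (demo value)
printf '%s\n' '009  20161231' '016  20170121' '001  20991231' > infile

# Operand assignments, as in the thread: set just before awk reads infile
awk '{y=substr($2,1,4); f=b} y<lt{f=a} y>gt{f=c} {print>f}' \
    lt="${PAR1}" gt="${PAR2}" a=y1 b=y2 c=y3 infile

# -v assignments: set before BEGIN, so they can also be used there
awk -v lt="${PAR1}" -v gt="${PAR2}" -v a=z1 -v b=z2 -v c=z3 '
    BEGIN { print "splitting at years", lt, "and", gt }
    {y=substr($2,1,4); f=b} y<lt{f=a} y>gt{f=c} {print>f}' infile
```

Both runs split the records the same way; pick -v if you need the values in BEGIN.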

Correct. How do I run the awk command inside a shell script and pass it parameters?

Should be clear now, no?

Thanks.

I adjusted the script as follows:

#!/usr/bin/ksh
PAR1=$1    # lower year threshold
PAR2=$2    # upper year threshold
PAR3=$3    # input file

awk '{y=substr($2,1,4); f=b} y<lt{f=a} y>gt{f=c} {print>f} ' lt="${PAR1}" gt="${PAR2}" a=y1 b=y2 c=y3 "${PAR3}"
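A sketch of the same script with a basic argument check, saved to a file and run. The name split.sh and the usage message are hypothetical. The input file name deliberately contains a space to show why "$3" needs quotes. It is invoked with sh here in case ksh is not installed; the script body only uses POSIX shell features.

```shell
# Save the script; the quoted 'EOF' keeps $1, $2, $3 from expanding now
cat > split.sh <<'EOF'
#!/usr/bin/ksh
# Usage: split.sh LOWER_YEAR UPPER_YEAR INPUT_FILE
if [ $# -ne 3 ]; then
    echo "usage: $0 LOWER_YEAR UPPER_YEAR INPUT_FILE" >&2
    exit 1
fi
awk '{y=substr($2,1,4); f=b} y<lt{f=a} y>gt{f=c} {print>f}' \
    lt="$1" gt="$2" a=y1 b=y2 c=y3 "$3"
EOF
chmod +x split.sh

printf '%s\n' '009  20161231' '016  20170121' '001  20991231' > "my input.txt"
sh ./split.sh 2017 2018 "my input.txt"
```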

There are a few more adjustments I need to make:

  1. How do I add part of the input file name to the resulting files y1, y2 and y3?

    Example: Input file name = abc_123~xyz

    Desired output file name = abc_y1_123~xyz
  2. At this point I do not know whether it will be column 2 that I have to check. Instead of referring to column 2, can I just go by character position in substr()?

    Example:

    awk '{y=substr($0,1,4); f=b} y<lt{f=a} y>gt{f=c} {print>f} ' lt="${PAR1}" gt="${PAR2}" a=y1 b=y2 c=y3 "${PAR3}"

Please advise on the above. Thanks!

a) Appending or prefixing the actual file name to the output file name would be much easier than inserting the yn string into the middle of it. Try f=a FILENAME, etc. If you are not happy with that, build the f variable with a few substr() calls.
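Both options from (a) can be sketched as follows, using the example name abc_123~xyz from the question. Option 2 splices the tag in after the first "_", which is an assumption about where the tag belongs; the fallback for names without "_" and the helper function outname are hypothetical. FILENAME contains whatever path you pass to awk, so this assumes the input file is given without a directory part.

```shell
printf '%s\n' '009  20161231' '016  20170121' '001  20991231' > 'abc_123~xyz'

# Option 1: prefix the input file name with the yN tag (y1abc_123~xyz, ...)
awk '{y=substr($2,1,4); f=b FILENAME} y<lt{f=a FILENAME} y>gt{f=c FILENAME} {print>f}' \
    lt=2017 gt=2018 a=y1 b=y2 c=y3 'abc_123~xyz'

# Option 2: splice the tag in after the first "_" (abc_y1_123~xyz, ...)
awk '
    function outname(tag,    p) {
        p = index(FILENAME, "_")
        if (p == 0)
            return FILENAME "_" tag        # no "_" in the name: just append the tag
        return substr(FILENAME, 1, p) tag "_" substr(FILENAME, p + 1)
    }
    { y = substr($2, 1, 4); f = outname(b) }
    y < lt { f = outname(a) }
    y > gt { f = outname(c) }
    { print > f }
' lt=2017 gt=2018 a=y1 b=y2 c=y3 'abc_123~xyz'
```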

b) Feel free to adjust the selection criteria to whatever you need, but note that your idea above would not yield identical results, because $2 starts at character position 6 in your sample; the position-based equivalent would be substr($0,6,4).
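A sketch that checks point (b) on part of the sample, assuming the fixed-width layout shown in the question. It runs the field-based and position-based versions side by side (output names f1 to f3 and p1 to p3 are just for the comparison) and prints what substr($0,1,4) actually picks up.

```shell
printf '%s\n' '001  20991231' '009  20161231' '016  20170121' > infile

# Field-based (as in the thread): year = first 4 characters of $2
awk '{y=substr($2,1,4); f=b} y<lt{f=a} y>gt{f=c} {print>f}' lt=2017 gt=2018 a=f1 b=f2 c=f3 infile

# Position-based: the date starts at character 6, so the year is substr($0,6,4)
awk '{y=substr($0,6,4); f=b} y<lt{f=a} y>gt{f=c} {print>f}' lt=2017 gt=2018 a=p1 b=p2 c=p3 infile

# substr($0,1,4) sees the record number plus a space, not the year
awk '{print "[" substr($0,1,4) "]"}' infile
```

The first two runs produce identical files; the last one prints [001 ], [009 ] and [016 ], which is why comparing it against a year does not work.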


Thanks