Splitting the file based on two fields - Fixed length file

saj · May 15, 2019, 11:58am

Hi,
I have a scenario where I need to split a file based on the values of two fields. The file is a fixed-length file.
Example:

AA0998703000000000000190510095350019500010005101980301      
K 0998703000000000000190510095351019500020005101480         
CC0338703368396368396190510114449019600010005101980301      
L 03387033683963683961905101144500196000200051012803        
I O553203000000000000190510120433019700010005101980301

Split on the 4 characters starting at position 4 (9987) and the 6 characters starting at position 22 (190510).

So the example above should produce 3 files:

9987_190510.txt
3387_190510.txt
5532_190510.txt

I tried the command below:

awk '{ F=substr($0,4,4)".txt"; print $0 >> F; close(F) }' filename

But it splits only on the first field. I need to split on the combination of both substrings.

nezabudka · May 15, 2019, 12:09pm

Hi, try this:

awk '
/^.{3}9987.{14}190510/  {print >"file1"}
/^.{3}3387.{14}190510/  {print >"file2"}
/^.{3}5532.{14}190510/  {print >"file3"}
' file

vgersh99 · May 15, 2019, 12:15pm

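awk '{ F=substr($0,4,4) "_" substr($0,22,6) ".txt"; print $0 >> F; close(F) }' filename

(In the sample data, 190510 occupies columns 22-27, and awk's substr() counts from 1, so the second substr() starts at 22.)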
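
To double-check the offsets before writing any files, it can help to print both keys for each record. This is a minimal sketch, assuming a scratch file named sample.txt (a made-up name) that holds three of the records from the question:

```shell
# sample.txt is a hypothetical file name; the records are copied from the question
cat > sample.txt <<'EOF'
AA0998703000000000000190510095350019500010005101980301
CC0338703368396368396190510114449019600010005101980301
I O553203000000000000190510120433019700010005101980301
EOF

# Print key 1 (columns 4-7), key 2 (columns 22-27) and the file name they produce
awk '{ k1 = substr($0,4,4); k2 = substr($0,22,6); print k1, k2, k1 "_" k2 ".txt" }' sample.txt
# 9987 190510 9987_190510.txt
# 3387 190510 3387_190510.txt
# 5532 190510 5532_190510.txt
```

If the second column does not show the expected date, the start position is off and can be adjusted before any file is split.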

nezabudka · May 15, 2019, 12:20pm

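awk '{print > (gensub(/^.{3}(.{4}).{14}(.{6}).*/, "\\1_\\2", 1) ".txt")}' file

(gensub() is a GNU awk extension, so this needs gawk. The parentheses keep the whole concatenation as the redirection target.)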

MadeInGermany · May 15, 2019, 1:25pm

With bash builtins:

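#!/bin/bash
# Split "filename" on two fixed-width fields; bash substring offsets are 0-based.
outfile="" prefile=""
while IFS= read -r line    # -r keeps backslashes in the data intact
do
  outfile="${line:3:4}_${line:21:6}.txt"
  # Re-point stdout only when the target file changes
  if [ "$outfile" != "$prefile" ]
  then
    exec >>"$outfile"
    prefile=$outfile
  fi
  printf '%s\n' "$line"
done < filename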

This reduces the number of open() calls: the output file is re-opened only when the target name changes.
The same is achieved in awk by calling close() only when the file name changes:

awk '{ F=substr($0,4,4) "_" substr($0,22,6) ".txt"; print $0 >> F } F!=PF { close(PF); PF=F }' filename
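
One side effect of >> is that re-running the command appends to files left over from an earlier run. A possible variant, sketched here under assumptions (the function name split_by_key and the file sample.txt are made up for illustration), empties each output file the first time its name is seen and still closes a file only when the key changes:

```shell
# sample.txt is hypothetical test input; the 9987 key appears twice, not adjacent
cat > sample.txt <<'EOF'
AA0998703000000000000190510095350019500010005101980301
CC0338703368396368396190510114449019600010005101980301
K 0998703000000000000190510095351019500020005101480
EOF

# split_by_key is a hypothetical helper name, not part of any tool
split_by_key() {
  awk '
    {
      F = substr($0,4,4) "_" substr($0,22,6) ".txt"
      if (F != PF) {              # key changed: release the previous file
        if (PF != "") close(PF)
        PF = F
      }
      if (!(F in seen)) {         # first time in this run: empty the file
        printf "" > F
        close(F)
        seen[F] = 1
      }
      print >> F                  # append; stays open until the key changes
    }
  ' "$1"
}

split_by_key sample.txt
split_by_key sample.txt           # a second run does not duplicate records
```

At most one output file is open at a time, so this should also cope with inputs that produce a large number of distinct keys.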

If there are not too many output files, do not close (and re-open) them at all:

awk '{ F=substr($0,4,4) "_" substr($0,22,6) ".txt"; print $0 > F }' filename

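Because each file is opened only once (awk opens it automatically at the first write), you can use > instead of >>: the file is truncated on that first open and every later print adds to it, so re-running the command overwrites old output instead of appending to it.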
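
The difference between the two redirections can be seen on a small input. This is only a sketch: sample.txt is a made-up file name, and its records are taken from the question.

```shell
# sample.txt is a hypothetical input; its first two records share the key 9987_190510
cat > sample.txt <<'EOF'
AA0998703000000000000190510095350019500010005101980301
K 0998703000000000000190510095351019500020005101480
CC0338703368396368396190510114449019600010005101980301
EOF
rm -f 9987_190510.txt 3387_190510.txt    # start without leftover output

# ">>" with close(): every run appends, so two runs store each record twice
awk '{ F=substr($0,4,4) "_" substr($0,22,6) ".txt"; print $0 >> F; close(F) }' sample.txt
awk '{ F=substr($0,4,4) "_" substr($0,22,6) ".txt"; print $0 >> F; close(F) }' sample.txt
wc -l < 9987_190510.txt    # 4

# ">" without close(): each file is truncated at its first write in this run
awk '{ F=substr($0,4,4) "_" substr($0,22,6) ".txt"; print $0 > F }' sample.txt
wc -l < 9987_190510.txt    # 2
```

Keeping every file open only works while the number of distinct keys stays below the limit on open files, which is the "not too many output files" condition mentioned above.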