Script for splitting a file of records into multiple files

Hello, I have a file in the following format:

HDR 1234 abc qwerty
abc def ghi jkl

HDR 4567 xyz qwerty
abc def ghi jkl

HDR 890 mno qwerty
abc def ghi jkl

HDR 1234 abc qwerty
abc def ghi jkl

HDR 1234 abc qwerty
abc def ghi jkl

I need to split this into multiple files, one per HDR record.

  • The file names need to be 1234.txt, 4567.txt, 890.txt, 1234-1.txt and 1234-2.txt, i.e. the value after HDR, with a counter appended when a value repeats.

An AWK script would be helpful, but I'm open to any suggestions. With awk I can split the records on the tag, but I can't work out how to write each one to a separate file named after the data in it.

Where exactly are you stuck?

awk -v RS="" '{
        if(F) close(F);

        N[$2]++

        if(N[$2] > 1) F=sprintf("%s-%d.txt",$2,N[$2]-1);
        else    F=$2 ".txt";

        print > F;
}' data
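To try the script above end to end, here is a minimal sketch that recreates the sample data from the question and lists the files it produces. It assumes a POSIX shell and any POSIX awk; the scratch directory from `mktemp -d` and the input name `data` are just for the demo.

```shell
#!/bin/sh
# Work in a throwaway directory so the generated .txt files don't clutter anything.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Recreate the sample input from the question: records separated by blank lines.
cat > data <<'EOF'
HDR 1234 abc qwerty
abc def ghi jkl

HDR 4567 xyz qwerty
abc def ghi jkl

HDR 890 mno qwerty
abc def ghi jkl

HDR 1234 abc qwerty
abc def ghi jkl

HDR 1234 abc qwerty
abc def ghi jkl
EOF

# Same logic as the script above: first occurrence -> 1234.txt,
# repeats -> 1234-1.txt, 1234-2.txt, ...
awk -v RS="" '{
        if (F) close(F)
        N[$2]++
        if (N[$2] > 1) F = sprintf("%s-%d.txt", $2, N[$2] - 1)
        else           F = $2 ".txt"
        print > F
}' data

ls *.txt
```

This should list the five files named in the question; each one holds its whole record (the HDR line plus the data line).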

I've generalized it a bit: here the counter suffix starts at -1 for every value, so even the first 1234 record goes to 1234-1.txt. ;)

awk '{if (out) close(out); out=$2 "-" ++f[$2] ".txt"; print $0 >out}' RS= myFile
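Spelled out over several lines with comments, the one-liner above works roughly like this. It is the same logic, with parentheses added around `++f[$2]` for readability; the small sample written to `myFile` in a scratch directory is hypothetical demo data, and a POSIX shell and awk are assumed.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical sample input: two records share the value 1234.
printf 'HDR 1234 abc qwerty\nabc def ghi jkl\n\nHDR 890 mno qwerty\nabc def ghi jkl\n\nHDR 1234 abc qwerty\nabc def ghi jkl\n' > myFile

awk '{
    # Close the previous output file so we never hit the open-file limit.
    if (out) close(out)
    # ++f[$2] increments the per-value counter before it is used, so the
    # first 1234 record gets 1234-1.txt, the next 1234-2.txt, and so on.
    out = $2 "-" (++f[$2]) ".txt"
    print $0 > out
}' RS= myFile    # RS= (empty) on the command line switches awk to paragraph mode

ls *.txt
```

With this input you should get 1234-1.txt, 1234-2.txt and 890-1.txt, and no plain 1234.txt.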

Can you please explain the logic? Where does the tag "HDR" come in as the delimiter?

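The HDR value is the second field, $2. The tag itself isn't used as a delimiter: setting RS to the empty string puts awk in paragraph mode, so each block separated by blank lines becomes one record. Within that record, $1 is "HDR" and $2 is the value after it.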
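A quick way to see paragraph mode at work is to print what awk considers each record and its first two fields. This is a minimal sketch on two records taken from the question, assuming any POSIX awk:

```shell
#!/bin/sh
# With RS empty (paragraph mode), each blank-line-separated block is one record;
# newlines inside a record act as field separators, so $1 is "HDR" and $2 the value.
printf 'HDR 1234 abc qwerty\nabc def ghi jkl\n\nHDR 4567 xyz qwerty\nabc def ghi jkl\n' |
awk -v RS="" '{ printf "record %d: $1=%s $2=%s NF=%d\n", NR, $1, $2, NF }'
# record 1: $1=HDR $2=1234 NF=8
# record 2: $1=HDR $2=4567 NF=8
```

Each record spans two input lines, yet awk sees it as a single record with eight fields.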


One more question: if I need the contents of each file on a single line, without the "HDR" tag, what do I need to change?
For example, the output needs to be:

1234 abc qwertyabc def ghi jkl

awk '{if (out) close(out); out=$2 "-" ++f[$2] ".txt"; sub("^[^"FS"]*"FS,""); gsub(ORS,"");print $0 >out}' RS= myFile

Worked like a charm. Can you please explain the expression?
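Here is the same command laid out with a comment on each step. It's a sketch that relies on FS and ORS keeping their defaults (a single space and a newline); the one-record `myFile` in a scratch directory is just demo input, and parentheses are added around `++f[$2]` for readability.

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical sample input: one record from the question.
printf 'HDR 1234 abc qwerty\nabc def ghi jkl\n' > myFile

awk '{
    if (out) close(out)                  # close the previous output file
    out = $2 "-" (++f[$2]) ".txt"        # name the file before $0 is modified
    # With the default FS (a single space), the regex built here is "^[^ ]* ":
    # from the start of the record, a run of non-space characters followed by
    # one space. That is the "HDR " tag, and sub() deletes it.
    sub("^[^" FS "]*" FS, "")
    # ORS is a newline by default, so this removes every newline left in the
    # record, joining its lines into one.
    gsub(ORS, "")
    print $0 > out
}' RS= myFile

cat 1234-1.txt
```

The file should contain `1234 abc qwertyabc def ghi jkl`, matching the requested output. If FS is ever set to something containing regex metacharacters, the `sub()` pattern would need escaping.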