Script for splitting file of records into multiple files

wincrazy · October 2, 2018, 2:19pm

Hello, I have a file in the following format:

HDR 1234 abc qwerty
abc def ghi jkl

HDR 4567 xyz qwerty
abc def ghi jkl

HDR 890 mno qwerty
abc def ghi jkl

HDR 1234 abc qwerty
abc def ghi jkl

HDR 1234 abc qwerty
abc def ghi jkl

I need to split this into multiple files based on the HDR tag.

The file names need to be 1234.txt, 4567.txt, 890.txt, 1234-1.txt and 1234-2.txt.
A script using awk would be helpful, but I am open to any suggestions. With awk I could split on the tag, but I was unable to write a script that creates multiple files named after the data.

vgersh99 · October 2, 2018, 2:24pm

Where exactly are you stuck?

Corona688 · October 2, 2018, 2:26pm

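awk -v RS="" '{
        # RS="" is paragraph mode: each blank-line-separated block is one record
        if(F) close(F);         # close the previous output file

        N[$2]++                 # count how often this ID (second field) has been seen

        # first occurrence: 1234.txt; repeats: 1234-1.txt, 1234-2.txt, ...
        if(N[$2] > 1) F=sprintf("%s-%d.txt",$2,N[$2]-1);
        else    F=$2 ".txt";

        print > F;
}' data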
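
If you want to try this without touching real data, here is a minimal sketch that runs the same program on the sample from the first post. The temporary directory and the `workdir` variable are assumptions for illustration only; `data` is the input file name used above.

```shell
# Sketch: run the paragraph-mode split on the sample data from the first post.
# workdir is a throwaway temp directory, not something from the thread.
workdir=$(mktemp -d)

cat > "$workdir/data" <<'EOF'
HDR 1234 abc qwerty
abc def ghi jkl

HDR 4567 xyz qwerty
abc def ghi jkl

HDR 890 mno qwerty
abc def ghi jkl

HDR 1234 abc qwerty
abc def ghi jkl

HDR 1234 abc qwerty
abc def ghi jkl
EOF

# Run in a subshell so the caller's working directory is left alone.
(
  cd "$workdir" || exit 1
  awk -v RS="" '{
          if (F) close(F)
          N[$2]++
          if (N[$2] > 1) F = sprintf("%s-%d.txt", $2, N[$2] - 1)
          else           F = $2 ".txt"
          print > F
  }' data
)

# List what was created next to the input file.
ls "$workdir"
```

On this input the listing should contain 1234.txt, 4567.txt, 890.txt, 1234-1.txt and 1234-2.txt alongside data, each holding one two-line record.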

vgersh99 · October 2, 2018, 2:33pm

I've generalized it a bit: every file gets a numeric suffix, with the count starting at -1.

awk '{if (out) close(out); out=$2 "-" ++f[$2] ".txt"; print $0 >out}' RS= myFile
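
As a quick check of the naming, this sketch prints the file names that expression generates instead of writing any files. The three-record inline input and the `ids` and `names` variables are made up for illustration.

```shell
# Illustrative input: three short records, two of them sharing the ID 1234.
ids='HDR 1234 a

HDR 4567 b

HDR 1234 c'

# Same name-building expression as the one-liner, but printed rather than used
# as an output file. ++f[$2] increments the per-ID counter before it is used.
names=$(printf '%s\n' "$ids" | awk -v RS="" '{ print $2 "-" ++f[$2] ".txt" }')

printf '%s\n' "$names"
```

So the first 1234 record goes to 1234-1.txt rather than 1234.txt, which differs slightly from the naming asked for in the first post.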

wincrazy · October 16, 2018, 11:51am

Can you please explain the logic? Where do you have the tag "HDR" as part of the delimiter?

vgersh99 · October 16, 2018, 12:06pm

The HDR value is the second field - $2
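
To spell that out a little: with RS set to the empty string, awk reads in paragraph mode, so records are separated by blank lines and HDR is not used as a delimiter at all. HDR is simply $1 of each record and the number after it is $2. A small sketch that prints what awk sees per record (the two-record input and the `sample` and `fields` variables are only for illustration):

```shell
# Illustrative input: two records separated by a blank line.
sample='HDR 1234 abc qwerty
abc def ghi jkl

HDR 4567 xyz qwerty
abc def ghi jkl'

# -v RS="" has the same effect as the trailing RS= in the one-liner above:
# one record per blank-line-separated block, with newlines also splitting fields.
fields=$(printf '%s\n' "$sample" |
  awk -v RS="" '{ print "record " NR ": $1=" $1 ", $2=" $2 ", NF=" NF }')

printf '%s\n' "$fields"
```

Each record reports $1=HDR and NF=8, since both lines of a block belong to the same record.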

wincrazy · October 16, 2018, 5:39pm

One more question: if I need the output of each file on one line, without the tag "HDR", what do I need to do?
For example the output needs to be

1234 abc qwertyabc def ghi jkl

vgersh99 · October 16, 2018, 5:55pm

awk '{if (out) close(out); out=$2 "-" ++f[$2] ".txt"; sub("^[^"FS"]*"FS,""); gsub(ORS,"");print $0 >out}' RS= myFile

wincrazy · October 17, 2018, 12:14am

Worked like a charm. Can you please explain the expression?
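
A rough breakdown, offered as a reading of the one-liner rather than the poster's own explanation: `out=$2 "-" ++f[$2] ".txt"` builds the file name from the ID and a per-ID counter; `sub("^[^"FS"]*"FS,"")` deletes everything up to and including the first field separator, which removes the HDR tag; `gsub(ORS,"")` deletes the newline inside the record so the two lines are joined. The sketch below applies just those two substitutions to one record, assuming the default FS (a single space) and ORS (a newline). The `record` and `joined` variables are illustrative.

```shell
# One record as awk sees it in paragraph mode (illustrative sample).
record='HDR 1234 abc qwerty
abc def ghi jkl'

joined=$(printf '%s\n' "$record" | awk -v RS="" '{
    # With the default FS the pattern is "^[^ ]* ": everything up to and
    # including the first space, which is the "HDR " tag.
    sub("^[^" FS "]*" FS, "")
    # ORS is a newline by default, so this removes the line break in the record.
    gsub(ORS, "")
    print
}')

printf '%s\n' "$joined"
# prints: 1234 abc qwertyabc def ghi jkl
```

No space is inserted where the lines are joined, which matches the requested output; replacing `""` with `" "` in the gsub call would add one.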