awk script to split file into multiple files based on many columns

So I have a space-delimited file that I'd like to split into multiple files based on the values in several columns.
This is what my data looks like:

1bc9A02 1 10 1000 FTDLNLVQALRQFLWSFRLPGEAQKIDRMMEAFAQRYCQCNNGVFQSTDTCYVLSFAIIMLNTSLHNPNVKDKPTVERFIAMNRGINDGGDLPEELLRNLYESIKNEPFKIPELEHHHHHH
1ku1A02 1 10 1000 DFSGLRVDEAIRILLTKFRLPGESQQIERIIEAFSSAYCENQDYDPSKISDNAEDDISTVQPDADSVFILSYSIIMLNTDLHNPQVKEHMSFEDYSGNLKGCCNHKDFPFWYLDRVYCSIRDKEIVMPEEHHGNE
1b9gA00 1 10 100 GPETLCGAELVDALQFVCGDRGFYFNKPGIVDECCFRSCDLRRLEMYCAPLKPAKSA
1bqtA00 1 10 100 GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
1efeA00 1 10 100 FVNQHLCGSHLVEALYLVCGERGFFYTPKTRRYPGDVKRGIVEQCCTSICSLYQLENYCN
1eakA01 1 10 101 TDKELAVQYLNTFYGCPKESCNLFVLKDTLKKMQKFFGLPQTGDLDQNTIETMRKPRCGNPDV
1eakB01 1 10 101 TDKELAVQYLNTFYGCPKESCNLFVLKDTLKKMQKFFGLPQTGDLDQNTIETMRKPRCGNPDV

This is what I'd like the output to look like
1.10.1000.txt

1bc9A02 1 10 1000  FTDLNLVQALRQFLWSFRLPGEAQKIDRMMEAFAQRYCQCNNGVFQSTDTCYVLSFAIIMLNTSLHNPNVKDKPTVERFIAMNRGINDGGDLPEELLRNLYESIKNEPFKIPELEHHHHHH
1ku1A02 1 10 1000  DFSGLRVDEAIRILLTKFRLPGESQQIERIIEAFSSAYCENQDYDPSKISDNAEDDISTVQPDADSVFILSYSIIMLNTDLHNPQVKEHMSFEDYSGNLKGCCNHKDFPFWYLDRVYCSIRDKEIVMPEEHHGNE

1.10.100.txt

1b9gA00 1 10 100 GPETLCGAELVDALQFVCGDRGFYFNKPGIVDECCFRSCDLRRLEMYCAPLKPAKSA
1bqtA00 1 10 100 GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
1efeA00 1 10 100 FVNQHLCGSHLVEALYLVCGERGFFYTPKTRRYPGDVKRGIVEQCCTSICSLYQLENYCN

1.10.101.txt

1eakA01 1 10 101 TDKELAVQYLNTFYGCPKESCNLFVLKDTLKKMQKFFGLPQTGDLDQNTIETMRKPRCGNPDV
1eakB01 1 10 101 TDKELAVQYLNTFYGCPKESCNLFVLKDTLKKMQKFFGLPQTGDLDQNTIETMRKPRCGNPDV

Columns 2, 3, and 4 all vary, so I need to split it based on all three values. I know how to do this using awk and close for one column, but I don't know how to extend it to three columns. Thank you so much in advance!!!

One way:

$ awk '{print > ($2"."$3"."$4".txt") }' file

After running the above awk command:

$ ls 1.10*
1.10.100.txt  1.10.1000.txt  1.10.101.txt

Guru.
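
A portability note on the redirection target: the gawk manual describes an unparenthesized concatenation after `>` as ambiguous, and some awk versions reject or misparse it. Below is a minimal sketch of the parenthesized form. The scratch directory and the shortened sample rows are made up for illustration and are not from the thread.

```shell
# Scratch directory and sample rows are illustrative, not from the thread.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > file <<'EOF'
idA 1 10 1000 SEQA
idB 1 10 100 SEQB
idC 1 10 1000 SEQC
EOF

# The parentheses make the whole concatenation the output file name.
awk '{ print > ($2 "." $3 "." $4 ".txt") }' file
```

Each distinct combination of columns 2, 3 and 4 should end up in its own file inside the scratch directory.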

This just creates a bunch of empty text files. I'd like those text files to include the information from the original file as indicated in the original post. Thanks.

It does create files with content. Which OS are you using?

Guru.

I'm using a UNIX terminal. The code doesn't do what is requested.

It would be much better if you showed us exactly what you did and what output you got, in code tags, rather than simply saying "the code doesn't do what is requested".

Guru's code should work fine; I don't see any issues with it!

But I would also recommend closing the file: if too many files are open at once, awk may eventually exceed the system limit on the number of open files in one process.

It is best to close each one when the program has finished writing it.

awk '{F=$2"."$3"."$4".txt";print >> F;close(F)}' inputfile
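
One caveat with `>>`: running the command a second time appends to whatever files an earlier run left behind. A possible variant, sketched under the assumption that stale output should be overwritten, truncates each file the first time its name is seen and appends afterwards. The `seen` array, the scratch directory and the sample rows are illustrative names, not part of the thread.

```shell
# Scratch directory and sample rows are illustrative, not from the thread.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > inputfile <<'EOF'
idA 1 10 1000 SEQA
idB 1 10 100 SEQB
idC 1 10 1000 SEQC
EOF
echo "stale line from an earlier run" > 1.10.100.txt

awk '{
    out = $2 "." $3 "." $4 ".txt"
    if (out in seen) {
        print >> out        # later lines for this key: append
    } else {
        print > out         # first line for this key: truncate old content
        seen[out] = 1
    }
    close(out)              # keep at most one output file open at a time
}' inputfile
```

The trade-off is one small array entry per distinct output file name.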

Yoda's code works, Thanks!!

---------- Post updated at 07:00 PM ---------- Previous update was at 10:57 AM ----------

What if, instead of outputting the entire line, I wanted to output just the last column to the text files, but with the same file names? For example:
1.10.1000.txt

FTDLNLVQALRQFLWSFRLPGEAQKIDRMMEAFAQRYCQCNNGVFQSTDTCYVLSFAIIMLNTSLHNPNVKDKPTVERFIAMNRGINDGGDLPEELLRNLYESIKNEPFKIPELEHHHHHH
DFSGLRVDEAIRILLTKFRLPGESQQIERIIEAFSSAYCENQDYDPSKISDNAEDDISTVQPDADSVFILSYSIIMLNTDLHNPQVKEHMSFEDYSGNLKGCCNHKDFPFWYLDRVYCSIRDKEIVMPEEHHGNE

and so on?

awk '{F=$2"."$3"."$4".txt";print $NF >> F;close(F)}' inputfile

Use $NF in the print statement.

awk '{F=$2"."$3"."$4".txt";print $NF >> F;close(F)}' inputfile

Guru's code works fine for me too.

--ahamed

Thanks again, you guys are amazing.

And I tried Guru's code again, and it worked; I don't know why I was seeing empty text files. It also ran significantly faster. Sorry about that, and thanks, Guru!!
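
The speed difference is plausibly the cost of reopening and closing a file for every input line. A possible middle ground, sketched here under the assumption that the order of lines within each output file does not matter, is to sort on the three key columns first, so that each file is opened once and closed when the key changes. The variable `prev`, the scratch directory and the sample rows are illustrative.

```shell
# Scratch directory and sample rows are illustrative, not from the thread.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > inputfile <<'EOF'
idA 1 10 1000 SEQA
idB 1 10 100 SEQB
idC 1 10 1000 SEQC
EOF

# Sorting groups identical keys together, so each output file is written in one run.
sort -k2,2 -k3,3 -k4,4 inputfile | awk '{
    out = $2 "." $3 "." $4 ".txt"
    if (out != prev) {
        if (prev != "") close(prev)   # done with the previous group
        prev = out
    }
    print > out                       # truncates on first open, then keeps writing
}'
```

This keeps only one output file open at any time without paying for an open and close on every line.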