Shell script to Split matrix file with delimiter into multiple files

trymega · December 18, 2019, 9:54pm

I have a large semicolon delimited file with thousands of columns and many thousands of line. It looks like:

ID1;ID2;ID3;ID4;A_1;B_1;C_1;A_2;B_2;C_2;A_3;B_3;C_3
AA;ax;ay;az;01;02;03;04;05;06;07;08;09
BB;bx;by;bz;03;05;33;44;15;26;27;08;09

I want to split this table in to multiple files:

# A.txt

ID1;ID2;ID3;ID4;A_1;A_2;A_3
AA;ax;ay;az;01;04;07
BB;bx;by;bz;03;44;27

# B.txt

ID1;ID2;ID3;ID4;B_1;B_2;B_3
AA;ax;ay;az;02;05;08
BB;bx;by;bz;05;15;08

# C.txt

ID1;ID2;ID3;ID4;C_1;C_2;C_3
AA;ax;ay;az;03;06;09
BB;bx;by;bz;33;26;09

RudiC · December 19, 2019, 6:23am

This should work for exactly the data sample you presented; for "thousands of columns", additional efforts need to be spent, like closing output files after appending to them:

awk -F\; '
NR == 1 {for (i=5; i<=NF; i++)  {split ($i, T, "_")
                                 COL[T[1]] = COL[T[1]] OFS  i 
                                }
        }

        {for (c in COL) {n = split (COL[c], T)
                         OUT = $(T[2])
                         for (i=3; i<=n; i++) OUT = sprintf ("%s;%s", OUT, $(T))
                         print $1, $2, $3, $4, OUT  >  (c ".txt")
                        }
        }
' OFS=\; file
---------- A.txt: ----------

ID1;ID2;ID3;ID4;A_1;A_2;A_3
AA;ax;ay;az;01;04;07
BB;bx;by;bz;03;44;27

---------- B.txt: ----------

ID1;ID2;ID3;ID4;B_1;B_2;B_3
AA;ax;ay;az;02;05;08
BB;bx;by;bz;05;15;08

---------- C.txt: ----------

ID1;ID2;ID3;ID4;C_1;C_2;C_3
AA;ax;ay;az;03;06;09
BB;bx;by;bz;33;26;09