trymega
December 18, 2019, 9:54pm
1
I have a large semicolon delimited file with thousands of columns and many thousands of line. It looks like:
ID1;ID2;ID3;ID4;A_1;B_1;C_1;A_2;B_2;C_2;A_3;B_3;C_3
AA;ax;ay;az;01;02;03;04;05;06;07;08;09
BB;bx;by;bz;03;05;33;44;15;26;27;08;09
I want to split this table in to multiple files:
# A.txt
ID1;ID2;ID3;ID4;A_1;A_2;A_3
AA;ax;ay;az;01;04;07
BB;bx;by;bz;03;44;27
# B.txt
ID1;ID2;ID3;ID4;B_1;B_2;B_3
AA;ax;ay;az;02;05;08
BB;bx;by;bz;05;15;08
# C.txt
ID1;ID2;ID3;ID4;C_1;C_2;C_3
AA;ax;ay;az;03;06;09
BB;bx;by;bz;33;26;09
RudiC
December 19, 2019, 6:23am
2
This should work for exactly the data sample you presented; for "thousands of columns", additional efforts need to be spent, like closing output files after appending to them:
awk -F\; '
NR == 1 {for (i=5; i<=NF; i++) {split ($i, T, "_")
COL[T[1]] = COL[T[1]] OFS i
}
}
{for (c in COL) {n = split (COL[c], T)
OUT = $(T[2])
for (i=3; i<=n; i++) OUT = sprintf ("%s;%s", OUT, $(T))
print $1, $2, $3, $4, OUT > (c ".txt")
}
}
' OFS=\; file
---------- A.txt: ----------
ID1;ID2;ID3;ID4;A_1;A_2;A_3
AA;ax;ay;az;01;04;07
BB;bx;by;bz;03;44;27
---------- B.txt: ----------
ID1;ID2;ID3;ID4;B_1;B_2;B_3
AA;ax;ay;az;02;05;08
BB;bx;by;bz;05;15;08
---------- C.txt: ----------
ID1;ID2;ID3;ID4;C_1;C_2;C_3
AA;ax;ay;az;03;06;09
BB;bx;by;bz;33;26;09