How to split this txt file into small files?

Dear shell experts,

I would like to split a txt file into smaller ones, but I do not know how to program in shell. If someone could help, it would be greatly appreciated!

Specifically, suppose there is a file named A.txt. Its content looks like this:

Subject	run	condtion	ACC	time 	duration	weight
2	1	1	0	122.6 	2	1
2	1	1	0	144.8 	2	1
2	1	1	1	132.6 	2	1
2	1	1	1	182.6 	2	1
2	1	1	1	198.6 	2	1
2	1	1	1	230.6 	2	1
2	1	1	1	308.6 	2	1
2	1	1	1	368.6 	2	1
2	1	1	1	382.6 	2	1
2	1	1	1	410.6 	2	1
2	1	2	0	294.8 	2	1
2	1	2	0	394.8 	2	1
2	1	2	1	66.6 	        2	1
2	1	2	1	78.6 	        2	1
2	1	2	1	158.6 	2	1
2	1	2	1	207.1 	2	1

..........................
I want to split the last three columns into many small files. If the fourth column, "ACC", is zero, I want to put the rows into a txt file named "s00x_run1_condition1_ACC0". If the fourth column is one, I want to put them into a txt file named "s00x_run1_condition1_ACC1".

For example, in the first two rows the ACC column is zero, so I would put the first and second rows into one txt file named "s00x_run1_condition1_ACC0". There are 50 subjects; every subject has 6 runs, and every run has 3 conditions.

Although I can do this manually, it is a time-consuming task. Thus, if anybody could help, it would be greatly appreciated!

Thank you very much in advance!

Your description is very vague.

Please add a few lines to your sample input for a subject with a 2-digit subject number, and then show us the exact names of the output files you want to have produced from that input and the contents that should be saved in each of those files.

You show the name of the output files as s00x_run1_condition1_ACC1. Does that mean that subject 25 should have filename s00x_run1_condition1_ACC1, s0025_run1_condition1_ACC1, or s025_run1_condition1_ACC1?

What field delimiter do you want between output fields? Your input has tabs as separators for most fields, but some use a space and a tab, and some use a space, a tab, and several more spaces.

Do you want to throw away the data in the 1st four fields in the new, split files; or do you want to preserve the current lines but just split them into new files?
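
On the delimiter question, one way to see exactly which whitespace characters separate the fields is to have sed print a line in its unambiguous form. The following is only an illustration: the sample line is typed in by hand to mimic one of the irregular rows, and it is not read from the real A.txt.

```shell
# Show tabs as \t and mark the end of the line with $ (POSIX sed "l" command).
# The input mimics a row with a space, a tab, and several more spaces.
printf '66.6 \t        2\t1\n' | sed -n 'l'
```

Tabs are shown as \t and spaces stay visible as spaces. The default field splitting in awk treats any run of spaces and tabs as a single separator, so rows like this are still split into the same fields.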

Hi

Thank you for your generous help! I am sorry that I did not make it clear. The original post may be difficult to understand, so I made some changes. I reorganized the file into four columns like this:

name	Onsetime	duration	weight
s002_run1_fng	122.6 	2	1
s002_run1_fng	144.8 	2	1
s002_run1_fyg	132.6 	2	1
s002_run1_fyg	182.6 	2	1
s002_run1_fyg	198.6 	2	1
s002_run1_fyg	230.6 	2	1
s002_run1_fyg	308.6 	2	1
s002_run1_fyg	368.6 	2	1
s002_run1_fyg	382.6 	2	1
s002_run1_fyg	410.6 	2	1
s002_run1_fnl	294.8 	2	1
s002_run1_fnl	394.8 	2	1
s002_run1_fyl	66.6 	2	1
s002_run1_fyl	78.6 	2	1
s002_run1_fyl	158.6 	2	1
s002_run1_fyl	207.1 	2	1
s002_run1_fyl	257.4 	2	1
s002_run1_fyl	269.2 	2	1
s002_run1_fyl	319.2 	2	1
s002_run1_fyl	327.3 	2	1
s002_run1_fnn	52.8 	2	1
s002_run1_fnn	280.7 	2	1
s002_run1_fnn	350.8 	2	1
s002_run1_fyn	96.6 	2	1
s002_run1_fyn	110.6 	2	1
s002_run1_fyn	169.3 	2	1

I have also attached the file. I would like to split it into new txt files: the first column gives the name of the new file, and the second to fourth columns are its content. Rows that share the same first column go into the same new file. For example, the first file I want is

122.6 	2	1
144.8 	2	1

and the new file name would be "s002_run1_fng"

How can I code this using shell?

Thank you! Your help is greatly appreciated!

The following seems to do what you want:

awk '
FNR == 1 { next } # Skip header.
last != $1","$2","$3","$4 { # Switch to new file.
        if(fn != "") close(fn) # close previous output file.
        fn = sprintf("s%03d_run%d_condition%d_ACC%d", $1, $2, $3, $4) # Set new output file name.
        last = $1","$2","$3","$4 # Save key for new file.
}
{       print > fn } # Add line to split file.
' A.txt

If you want to run this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
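
The script above targets the original seven-column layout. For the revised four-column layout, where the first field is already the output file name, a similar sketch might look like the following. Like the script above, it skips the header and expects rows with the same name to be adjacent. The input name sample.txt is hypothetical and is built here from a few rows of the post only so the example can be run; substitute the real file name.

```shell
# Build a small sample in the revised four-column layout (tab-separated).
# "sample.txt" is a hypothetical name; use the real input file instead.
printf 'name\tOnsetime\tduration\tweight\n'  > sample.txt
printf 's002_run1_fng\t122.6 \t2\t1\n'      >> sample.txt
printf 's002_run1_fng\t144.8 \t2\t1\n'      >> sample.txt
printf 's002_run1_fyg\t132.6 \t2\t1\n'      >> sample.txt

awk '
BEGIN { OFS = "\t" }              # Write tab-separated output fields.
FNR == 1 { next }                 # Skip the header line.
$1 != fn {                        # Name changed: switch to a new output file.
        if (fn != "") close(fn)   # Close the previous output file.
        fn = $1                   # The first field is the output file name.
}
{       print $2, $3, $4 > fn }   # Write columns 2 to 4 to the current file.
' sample.txt
```

With this sample, the file s002_run1_fng should end up holding the two rows for 122.6 and 144.8. Because the fields are re-joined with a single tab, the stray space after the onset time in the input is not carried over.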

Note that this script is based on a few assumptions:

  1. The split files should not contain a header line; only data lines.
  2. The input lines for every combination of subject, run, condition, and ACC are all adjacent in your input file.
  3. If there are any split output files from a previous run, you want to replace those files rather than append the new data to them.

If any of these assumptions are incorrect, this script will need to be modified.
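
If the second assumption does not hold, that is, if the rows for one combination are scattered through the input, one possible modification is to remember which output files have already been started in this run and to append to them afterwards. This is a sketch rather than a drop-in replacement; shuffled.txt is a hypothetical sample built only for the demonstration.

```shell
# Build a hypothetical sample in which the ACC0 rows are not adjacent.
printf 'Subject\trun\tcondtion\tACC\ttime\tduration\tweight\n' > shuffled.txt
printf '2\t1\t1\t0\t122.6\t2\t1\n' >> shuffled.txt
printf '2\t1\t1\t1\t132.6\t2\t1\n' >> shuffled.txt
printf '2\t1\t1\t0\t144.8\t2\t1\n' >> shuffled.txt

awk '
FNR == 1 { next }                 # Skip the header line.
{
        # Build the output file name from the first four fields.
        fn = sprintf("s%03d_run%d_condition%d_ACC%d", $1, $2, $3, $4)
        if (fn in seen) {
                print >> fn       # File already started in this run: append.
        } else {
                print > fn        # First row for this file: replace old content.
                seen[fn] = 1
        }
        close(fn)                 # Keep only one output file open at a time.
}
' shuffled.txt
```

Opening and closing a file for every row is slower than the adjacent-rows version, but it avoids hitting the limit on simultaneously open files when there are many combinations.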