Dear shell experts,
I would like to split a txt file into smaller ones. However, I do not know how to program in shell. If someone could help, it would be greatly appreciated!
Specifically, suppose there is a file named A.txt whose content looks like this:
Subject run condition ACC time duration weight
2 1 1 0 122.6 2 1
2 1 1 0 144.8 2 1
2 1 1 1 132.6 2 1
2 1 1 1 182.6 2 1
2 1 1 1 198.6 2 1
2 1 1 1 230.6 2 1
2 1 1 1 308.6 2 1
2 1 1 1 368.6 2 1
2 1 1 1 382.6 2 1
2 1 1 1 410.6 2 1
2 1 2 0 294.8 2 1
2 1 2 0 394.8 2 1
2 1 2 1 66.6 2 1
2 1 2 1 78.6 2 1
2 1 2 1 158.6 2 1
2 1 2 1 207.1 2 1
..........................
I want to split the last three columns into many small files. If the fourth column, "ACC", is zero, I want to put those lines into a txt file named "s00x_run1_condition1_ACC0". If the fourth column, "ACC", is one, I want to put them into a txt file named "s00x_run1_condition1_ACC1".
For example, in the first two rows the ACC column is zero, so I would put those two rows into one txt file named "s00x_run1_condition1_ACC0". There are 50 subjects, every subject has 6 runs, and every run has 3 conditions.
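Assuming every subject/run/condition combination contains lines with both ACC values (the sample does not confirm this), the split could produce up to 50 * 6 * 3 * 2 files. A quick shell check of that arithmetic:

```shell
#!/bin/sh
# Upper bound on the number of split files.
# Assumption: every subject/run/condition combination has both ACC=0 and ACC=1 lines.
subjects=50
runs=6            # runs per subject
conditions=3      # conditions per run
acc_values=2      # ACC is either 0 or 1
total=$((subjects * runs * conditions * acc_values))
echo "$total"     # prints 1800
```

With that many output files, a script should close each file once it has finished writing to it, because some awk implementations can keep only a limited number of files open at the same time.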
Although I could do this manually, it is a time-consuming task, so any help would be greatly appreciated!
Thank you very much in advance!
Your description is very vague.
Please add a few lines to your sample input for a subject with a 2-digit subject number, and then show us the exact names of the output files you want to have produced from that input and the contents that should be saved in each of those files.
You show the name of the output files as s00x_run1_condition1_ACC1. Does that mean that subject 25 should have filename s00x_run1_condition1_ACC1, s0025_run1_condition1_ACC1, or s025_run1_condition1_ACC1?
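The zero-padded spellings differ only in the printf field width. A small sketch showing how subject 25 would be rendered with 3-digit and 4-digit padding (run 1, condition 1, ACC 1 are taken from the example name):

```shell
#!/bin/sh
subject=25
# %03d pads the subject number with zeros to 3 digits, %04d to 4 digits.
pad3=$(printf 's%03d_run%d_condition%d_ACC%d' "$subject" 1 1 1)
pad4=$(printf 's%04d_run%d_condition%d_ACC%d' "$subject" 1 1 1)
printf '%s\n' "$pad3" "$pad4"
```

For subject 2, `%03d` gives s002 while `%04d` gives s0002, so the choice also affects single-digit subjects.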
What field delimiter do you want between output fields? Your input uses tabs as separators for most fields, but some use a space and a tab, and some use a space, a tab, and several more spaces.
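On the input side, mixed separators are not a problem for awk: with its default field separator, any run of spaces and tabs counts as a single separator. The output separator is controlled by OFS. This sketch assumes tab-separated output is wanted and rebuilds a sample line with mixed separators that way:

```shell
#!/bin/sh
# A sample line with mixed separators: tab, space+tab, and tab+spaces.
line=$(printf '2\t1 \t1\t0\t    122.6\t2\t1')
# Default FS: runs of blanks separate fields, so this counts 7 fields.
nf=$(printf '%s\n' "$line" | awk '{ print NF }')
# Assigning $1 = $1 makes awk rebuild the record using OFS (a tab here).
out=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "\t" } { $1 = $1; print }')
echo "$nf"
printf '%s\n' "$out"
```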
Do you want to throw away the data in the first four fields in the new, split files, or do you want to preserve the current lines and just split them into new files?
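The two choices differ only in what awk prints for each line. A minimal comparison on the first data line of the sample:

```shell
#!/bin/sh
line='2 1 1 0 122.6 2 1'
# Option 1: preserve the whole line as read ($0).
whole=$(echo "$line" | awk '{ print }')
# Option 2: keep only the last three fields (time, duration, weight).
last3=$(echo "$line" | awk '{ print $5, $6, $7 }')
printf '%s\n' "$whole" "$last3"
```

`print` with no arguments writes the line exactly as read, original separators included, while `print $5, $6, $7` joins the selected fields with OFS (a single space by default).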
Hi
Thank you for your generous help! I am sorry that I did not make it clear; the original post may have been difficult to understand. So I made some changes and reformatted the file into four columns like this:
name Onsetime duration weight
s002_run1_fng 122.6 2 1
s002_run1_fng 144.8 2 1
s002_run1_fyg 132.6 2 1
s002_run1_fyg 182.6 2 1
s002_run1_fyg 198.6 2 1
s002_run1_fyg 230.6 2 1
s002_run1_fyg 308.6 2 1
s002_run1_fyg 368.6 2 1
s002_run1_fyg 382.6 2 1
s002_run1_fyg 410.6 2 1
s002_run1_fnl 294.8 2 1
s002_run1_fnl 394.8 2 1
s002_run1_fyl 66.6 2 1
s002_run1_fyl 78.6 2 1
s002_run1_fyl 158.6 2 1
s002_run1_fyl 207.1 2 1
s002_run1_fyl 257.4 2 1
s002_run1_fyl 269.2 2 1
s002_run1_fyl 319.2 2 1
s002_run1_fyl 327.3 2 1
s002_run1_fnn 52.8 2 1
s002_run1_fnn 280.7 2 1
s002_run1_fnn 350.8 2 1
s002_run1_fyn 96.6 2 1
s002_run1_fyn 110.6 2 1
s002_run1_fyn 169.3 2 1
I also attached the file. I would like to split it into new txt files: the first column gives the name of the new file, and the second to fourth columns are its content. All lines with the same first column go into the same new file. For example, the first file I want is
122.6 2 1
144.8 2 1
and the new file name would be "s002_run1_fng"
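A minimal sketch of that split in awk, assuming the input is saved as input.txt (a hypothetical name), its first line is the header, and all lines sharing a first-column value are adjacent, as in the sample. It writes columns 2 to 4 to a file named by column 1:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1                 # work in a scratch directory
# A few lines from the sample above; input.txt is a hypothetical file name.
cat > input.txt <<'EOF'
name Onsetime duration weight
s002_run1_fng 122.6 2 1
s002_run1_fng 144.8 2 1
s002_run1_fyg 132.6 2 1
s002_run1_fyg 182.6 2 1
EOF
awk '
FNR == 1 { next }                           # skip the header line
NF == 0  { next }                           # skip blank lines
$1 != fn {                                  # first column changed: switch files
    if (fn != "") close(fn)                 # close the previous output file
    fn = $1                                 # new output file is named by column 1
}
{ print $2, $3, $4 > fn }                   # write columns 2-4 to that file
' input.txt
cat s002_run1_fng
```

Because a file is reopened with `>` whenever its name comes around again, non-adjacent lines with the same first column would overwrite earlier output in this sketch.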
How can I code this using shell?
Thank you! Your help is greatly appreciated!
The following seems to do what you want for the original seven-column A.txt layout:
awk '
FNR == 1 { next } # Skip header.
last != $1","$2","$3","$4 { # Switch to new file.
if(fn != "") close(fn) # close previous output file.
fn = sprintf("s%03d_run%d_condition%d_ACC%d", $1, $2, $3, $4) # Set new output file name.
last = $1","$2","$3","$4 # Save key for new file.
}
{ print > fn } # Add line to split file.
' A.txt
If you want to run this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
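To avoid editing the script by hand on each system, one option is to pick the awk at run time. A sketch that tries the Solaris paths listed above, then nawk, and otherwise falls back to the awk on PATH (this order of preference is an assumption; the AWK variable name is illustrative):

```shell
#!/bin/sh
AWK=awk                                     # default: whatever awk is on PATH
for candidate in /usr/xpg4/bin/awk /usr/xpg6/bin/awk; do
    if [ -x "$candidate" ]; then            # use the first Solaris path that exists
        AWK=$candidate
        break
    fi
done
if [ "$AWK" = awk ] && command -v nawk >/dev/null 2>&1; then
    AWK=nawk                                # next choice: nawk, if installed
fi
echo "Using $AWK"
# The selected awk can then be used in place of plain "awk", for example:
nf=$(echo 'a b c' | "$AWK" '{ print NF }')
echo "$nf"
```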
Note that this script is based on a few assumptions:
- The split files should not contain a header line; only data lines.
- The input lines for every combination of subject, run, condition, and ACC are all adjacent in your input file.
- If there are any split output files from a previous run, you want to replace those files rather than append the new data to them.
If any of these assumptions are incorrect, this script will need to be modified.
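If the second assumption does not hold, one possible adjustment (a sketch) is to remember which files have already been written during the current run: a file used for the first time is created or replaced with `>`, and a file reopened later is appended to with `>>`. That keeps the replace-old-output behavior of the third assumption while allowing non-adjacent lines. The `cur`, `app`, and `seen` names are illustrative:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1                 # work in a scratch directory
# Sample where the ACC 0 lines for subject 2, run 1, condition 1 are not adjacent.
cat > A.txt <<'EOF'
Subject run condition ACC time duration weight
2 1 1 0 122.6 2 1
2 1 1 1 132.6 2 1
2 1 1 0 144.8 2 1
EOF
awk '
FNR == 1 { next }                           # skip header
{
    fn = sprintf("s%03d_run%d_condition%d_ACC%d", $1, $2, $3, $4)
    if (fn != cur) {                        # key changed: switch output file
        if (cur != "") close(cur)
        cur = fn
        app = (fn in seen)                  # already written in this run: append
        seen[fn] = 1
    }
    if (app) print >> fn                    # reopened file: add to the end
    else     print > fn                     # first use in this run: create/replace
}
' A.txt
cat s002_run1_condition1_ACC0
```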