Dear all,
I am in a bit of a quandary. I have 400 text files which I need to edit and output in a very specific way.
Here is a sample text file copied from gedit ... The columns will come out a bit messed up but when I cat <file>, it gives a table with six columns (0-28, tot lob vol, vcsf, scsf, gray, white):
tot lob vol Vcsf Scsf gray white
0 0.000000 0.000000 0.000000 0.000000
1 0.000000 0.000000 0.000000 0.000000
2 0.000000 0.000000 0.000000 0.000000
3 26355.514477 0.000000 7986.582205 13035.582840 5333.349432
4 61027.455763 0.000000 7617.445767 26799.098600 26610.911396
5 16966.834205 0.000000 2342.000090 9508.623764 5116.210351
6 16675.247439 0.000000 2196.206707 9037.121760 5441.918973
7 57706.261817 0.000000 16951.324271 24216.177530 16538.760016
8 91215.992021 2102.113105 7351.708891 41168.535796 40593.634229
9 56586.444555 0.000000 6690.985687 29733.578183 20161.880685
10 24258.571350 0.000000 4461.691120 13908.275147 5888.605082
11 99730.945990 2593.261027 10288.256465 52107.175509 34742.252989
12 8293.678905 967.819905 55.835764 4209.396188 3060.627049
13 13760.413773 2139.336947 21.713908 6338.393179 5260.969738
14 25516.944025 0.000000 7529.556139 10519.871485 7467.516401
15 30830.647540 451.856088 3489.735233 14415.966999 12473.089220
16 26945.925979 0.000000 7874.910677 12887.721466 6183.293835
17 65250.293894 0.000000 7736.355264 29207.274410 28306.664221
18 17500.375947 0.000000 2880.711810 9932.561970 4687.102167
19 17848.832473 3.101987 2632.552860 9684.403021 5528.774605
20 51296.522939 0.000000 14967.086667 21813.171698 14516.264575
21 92378.203103 3180.570541 8504.614013 40641.198027 40051.820521
22 58887.084820 14.475939 8116.865654 29876.269579 20879.473649
23 25229.493242 0.000000 3892.993527 14250.527699 7085.972016
24 90686.586261 1761.928544 10282.052492 48120.088380 30522.516844
25 10664.630873 1746.418610 201.629147 5669.398010 3047.185106
26 12702.636249 2146.574917 26.883886 5388.151200 5141.026246
27 30967.134963 0.000000 9628.567257 13284.775786 8053.791920
28 34047.407929 150.963361 4818.419611 15476.846510 13601.178447
If you've read that far, I thank you already.
Basically, I need to do the following:
- Delete the first four rows
- Delete the first column (numbers 1-28)
- Paste each row to the end of the previous row, so one file comes out as one super long line
- Comma-delimit all numbers
- Add new first field at the beginning of the line containing the file name
- Output all of this to a new text file which will take each of the 400 or so text files I have as above, convert them each to their own line consisting of steps 1-5.
So, an example output file from just the above file would be:
file1.txt, 26355.514477, 0.000000, 7986.582205, 13035.582840, 5333.349432, ..., 34047.407929, 150.963361, 4818.419611, 15476.846510, 13601.178447 <carriage return>
file2.txt, etc....
I know zippo about programming but pasting together various snippets I've found online have come up with the following which does not seem to working exactly right ... my output is joining some numbers, particularly those from the end of one line and the beginning of another.
#!/bin/sh
#script file placed in the directory above that which contains all the text files
#script run by typing ../script.sh from within the directory containing the files
files=`ls -l *_stats.txt`
# all files end in _stats.txt
for file in ${files}
do
cat ${file} | sed '1,4d'| awk '{print $2, $3, $4, $5, $6}' | sed 's/[:space:]+/,/g' >> output.txt
#read file, delete lines 1-4, print all but the first column, delete all whitespace #between numbers including carriage returns and put them all on one line with #comma-delimiters; then new line with the output of the next file
done
Any help would be most appreciated. It is sad, but I have spent HOURS trying to figure this out and I am pretty sure it would be ridiculously simple for anyone with basic bash scripting knowledge. Please help!
Best wishes,
Anthonz