This is my first post, and I am hoping to get help from the forum.
In a directory, I have 5000 files, each with around 4000 rows and 10 columns. The rows of interest carry the string 'AT' in the 4th column.
OM 3328 O BT 268 5.800 7.500 4.700 0.000 1.400
OM 3329 O BT 723 8.500 8.900 3.600 8.500 1.400
OM 3330 O AT 231 6.700 5.500 7.600 0.000 1.400
OM 3331 O AT 234 1.200 7.700 5.500 8.500 1.400
OM 3332 O AT 256 3.800 5.800 5.200 0.000 1.400
(Step-1) Toward the bottom of each file, the rows with string AT need to be removed entirely, but ONLY if their 9th column is greater than 0.10. The remaining rows should then be saved into a new file. A loop is required to do this across all 5000 files.
(Step-2) Next, a program 'calc' will be run on each of these new files, one by one. Again, if the 9th column is greater than 0.10 (only for rows with string AT), the corresponding row should be removed from the file. The remaining rows should be saved into a new file with a new name.
I have written the short bash script below to run the program 'calc' on a series of files in a directory. This small script for Linux took me an entire day to figure out, because I have no experience writing code.
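The row filter described in Step-1 and Step-2 could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the filter is applied to the whole file rather than only its last rows, columns are separated by whitespace, and the function names and the "filtered" output directory are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: "filtered" is a hypothetical output directory name.

# Keep every row except those with AT in column 4 and column 9 above 0.10.
filter_at_rows() {
    awk '!($4 == "AT" && $9 > 0.10)' "$1"
}

# Apply the filter to every .txt file in a directory (default: current one).
filter_all() {
    dir=${1:-.}
    mkdir -p "$dir/filtered"
    for f in "$dir"/*.txt
    do
        [ -e "$f" ] || continue   # skip if no .txt file matched
        filter_at_rows "$f" > "$dir/filtered/$(basename "$f")"
    done
}
```

Calling filter_all with a directory name would then write one filtered copy per input file into the "filtered" subdirectory, leaving the originals untouched.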
-------
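#!/bin/sh
# Run calc on every .txt file in the current directory.
for d in *.txt
do
    [ -e "$d" ] || continue   # skip if no .txt file matched
    ./calc "$d"
done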
-------
(Step-3) Finally, all files that contain the same number of lines (e.g., 3098, 3095, 3097) should be saved together in a single file, one per line count. In this case, the output from the original 5000 files could be divided, for example, into:
3098 filename = containing all files with 3098 lines
3095 filename = containing all files with 3095 lines
3097 filename = containing all files with 3097 lines
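The grouping in Step-3 might be sketched like this. It assumes that "saved in single file" means the contents of all files with the same line count are concatenated; the function name, the "by_count" directory, and the "<count>.txt" output names are hypothetical.

```shell
#!/bin/sh
# Sketch only: "by_count" and the "<count>.txt" names are hypothetical.

# Append each .txt file in a source directory to an output file
# named after the number of lines that file contains.
group_by_line_count() {
    src=${1:-.}
    out=${2:-by_count}
    mkdir -p "$out"
    for f in "$src"/*.txt
    do
        [ -e "$f" ] || continue            # skip if no .txt file matched
        n=$(awk 'END { print NR }' "$f")   # number of lines in this file
        cat "$f" >> "$out/$n.txt"
    done
}
```

Because the function appends with >>, running it twice on the same input would duplicate the contents, so the output directory should be empty before a rerun.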
Thank you so much for your time and attention.
-A
---------- Post updated at 02:08 PM ---------- Previous update was at 11:50 AM ----------
To tackle the problem one step at a time, I first need to remove lines based on a matching string and value.
On GNU/Linux x86_64:
awk '($4 ~ /^AT$/){print}' file > newfile
The code above prints every row whose 4th column matches the string AT into newfile. BUT I need the script to print a row ONLY if its 9th column has a value between 0.00 and 0.10. How can I do that in a bash shell?
> cat file
OM 3328 O BT 268 5.800 7.500 4.700 0.000 1.400
OM 3329 O BT 723 8.500 8.900 3.600 8.500 1.400
OM 3330 O AT 231 6.700 5.500 7.600 0.000 1.400
OM 3331 O AT 234 1.200 7.700 5.500 8.500 1.400
OM 3332 O AT 256 3.800 5.800 5.200 0.100 1.400
> cat newfile
OM 3330 O AT 231 6.700 5.500 7.600 0.000 1.400
OM 3332 O AT 256 3.800 5.800 5.200 0.100 1.400
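One possible way to get the newfile shown above, as a minimal sketch: add a numeric test on the 9th column to the awk condition. It assumes whitespace-separated columns and that both 0.00 and 0.10 are included in the range; the function name is hypothetical.

```shell
#!/bin/sh
# Print only rows with AT in column 4 whose column 9 lies in [0.00, 0.10].
select_at_in_range() {
    awk '$4 == "AT" && $9 >= 0.00 && $9 <= 0.10' "$1"
}
```

It would be used as: select_at_in_range file > newfile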
Please help.
-A