I would like to combine all entries that have the same but incremental (not sure what the correct term is) first column, on condition that the rest of the columns are an exact match. The best way to explain is by the following desired output:
I'm a beginner in script writing, mostly using Bash scripts to automate my tasks. My Bash scripts contain nothing but very useful tools from Unix itself, like sed, awk, tail, sort, uniq, column, etc.
When writing your Bash script, why didn't you sum up the values as you went and output that instead? It seems you had all the information you needed from the input to that script to do that.
Just for understanding, do the bits in brackets in the desired output ([0:3], for example) represent the minimum and maximum values found in the input (e.g. read[0] ... read[1] ... ... read[3]), or what?
This is why you should write a specification of what you want (notwithstanding the fact that you haven't shown us any of your work), instead of just throwing data at us. First the brackets, now the following fields need to be considered as well. How should they be considered?
Actually, that is all I'm looking for. As for the fields, I tried to mention that in post #1: "in condition that the rest of the column are exact match". I'm sorry if my English is not good. Maybe I need to say "field" instead of "column".
As for the earlier script that I had done: it is just simple sed to remove any line that matches the header and footer, plus awk to ditch some unwanted data like the last 2 columns.
PERFECTLY NAILED IT !!! Thanks a lot..
I know it does everything that was asked, but I really wish there were some explanation of what is going on, at least of what you sorted out first in terms of an overall plan before getting such beautiful output.
Thanks for the "beautiful output" - it has been defined by you, hasn't it? Not sure I understand that quote - are you asking for the heuristics applied?
First it is mandatory to look at the input data and the desired output to find out what needs to be eliminated, what propagated and mayhap transformed or converted. Seeing previous attempts (with part successes or failures) always helps. That's why I frequently comment
With the above, the basis for a solution is established. Now, build a prototype to see if the logic is close to what is requested, then tweak the solution before finally polishing it. The former gets the logic right; the latter adds e.g. error handling or ergonomics / user-friendliness.
That is why I'm interested in how you plan out all the information given to produce such output.
Actually I'm having a hard time sorting things out to get the desired output. My first attempt was to remove the first column and pipe the rest through sort -u to get the unique 2nd field. Since I don't know awk deeply enough to store the output somewhere, I then wrote it to a temporary file.txt to call up later. Then I got stuck trying to merge the unique data back with the 1st column (the bit I do not yet know how to combine).
Explaining heuristics (or a "plan to get from input to output") is not an easy task. First, it helps a lot to know the ropes, i.e. what tools you have at hand, their respective applicability, and the strengths and weaknesses of each that might lend it to the solution in this special, individual case. Here, awk stood out, but e.g. perl might fit even better; unfortunately, I don't have reasonable command of it.
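For readers following along, the first attempt described above might look something like the sketch below. The sample lines and the function name first_attempt are assumptions (the real input is not shown in this thread), and it assumes single-space separated fields. It shows why the approach gets stuck: once the first column is cut away, the index needed for the range is gone.

```shell
#!/bin/sh
# Hypothetical reconstruction of the "drop column 1, then sort -u" attempt.
# Assumes fields are separated by exactly one space.
first_attempt() {
    cut -d' ' -f2- | sort -u
}

# Hypothetical sample data; the thread's real input is not shown.
printf 'road[0] 100 300 500\nroad[1] 100 300 500\n' | first_attempt
```

The result is a single line with the remaining fields, with no trace of which indices it came from.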
Now, to the creative and - can I say - intuitive part, which can be a stepwise, repetitive, and approximating approach:
Check features and idiosyncrasies of the input, and what needs to be transported to the output. In this case, we can safely assume the input is sorted - should that not be the case, additional steps might need to be taken. We can identify two entities:
road[0] 100 300 500 \
road[1] 100 300 500  \ constant; go to road ... 100 300 500
road[2] 100 300 500  /
road[3] 100 300 500 /
     |
     +--> go to [0:3]
You see: $1 (which is field 1) needs to be split to get at its two elements that need different treatment. The constant part plus a placeholder plus the residual fields go to a working variable X, the numerical part is used to find MIN and MAX values, the latter in successive lines as we assume input is sorted. If the pattern in X changes, the output line is created from the LAST variable, by replacing the placeholder with the actual values, and printed out. At the end of each line's processing, X is saved in LAST.
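To make the walk-through above concrete, here is a minimal awk sketch of that plan. It is not necessarily the script posted earlier in the thread. The function names merge_ranges and emit, the "[#]" placeholder, and the sample data are all assumptions for illustration. It also assumes the input is sorted, that every line's first field looks like name[number], and that a group with a single entry should be printed as e.g. [5:5].

```shell
#!/bin/sh
# Sketch only: merge consecutive lines whose fields match except for the
# numeric index in field 1. Assumes sorted input of the form "name[N] ...".
merge_ranges() {
    awk '
    # Print the previous group: swap the placeholder for the real range.
    function emit() {
        if (LAST != "") {
            sub(/\[#\]/, "[" MIN ":" MAX "]", LAST)
            print LAST
        }
    }
    NF == 0 { next }                    # skip blank lines
    {
        i    = index($1, "[")           # position of the opening bracket
        name = substr($1, 1, i - 1)     # constant part, e.g. "road"
        idx  = substr($1, i + 1) + 0    # numeric part; "2]" + 0 gives 2
        $1   = name "[#]"               # constant part plus placeholder
        X    = $0                       # ... plus the residual fields
        if (X != LAST) {                # pattern changed: close old group
            emit()
            MIN = idx
        }
        MAX  = idx                      # input sorted, so latest is highest
        LAST = X
    }
    END { emit() }                      # do not forget the final group
    '
}

# Hypothetical sample data; the second group is invented for the demo.
printf '%s\n' \
    'road[0] 100 300 500' 'road[1] 100 300 500' \
    'road[2] 100 300 500' 'road[3] 100 300 500' \
    'road[4] 100 300 900' 'road[5] 100 300 900' | merge_ranges
```

Note that assigning to $1 makes awk rebuild the line with single spaces between fields, so any column alignment in the input is not preserved.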
Please bear in mind that the above solution is tailored to the input sample given, so we stop here. Should additional conditions arise, they would need to be programmed in as well, in (which may well happen) countless additional steps, conditional statements, loops, special cases, etc.