Combine incremental lines

pedot · October 14, 2017, 9:23pm

hi guys,

I am writing a bash script that produces output something like this:

road[0] 100 300 500
road[1] 100 300 500
road[2] 100 300 500
road[3] 100 300 500
street[0] 400 200 700
street[1] 400 200 700
path 200 100 900

I would like to combine all entries whose first column has the same name but an incrementing index (not sure what the correct term is), on condition that the rest of the columns match exactly. The best way to explain is the following desired output:

road[0:3] 100 300 500
street[0:1] 400 200 700
path 200 100 900

I'm a beginner at script writing. I mostly use bash scripts to automate my tasks, and my scripts contain nothing but the very useful standard Unix tools like sed, awk, tail, sort, uniq, column, etc.

Scott · October 14, 2017, 9:36pm

When writing your Bash script, why didn't you sum up the values as you went and output that instead? It seems you had all the information you needed from the input to that script to do that.

pedot · October 14, 2017, 9:54pm

I understand your feedback, but the script I wrote just extracts some data from another file, and the bit-style entries (road[0], road[1], ...) are already present in the original data.

Thanks anyway for responding.

Scott · October 14, 2017, 10:02pm

Just for understanding, do the bits in brackets in the desired output ([0:3], for example) represent the minimum and maximum values found in the input (e.g. road[0] ... road[1] ... ... road[3]), or what?

pedot · October 14, 2017, 10:10pm

Yes, they represent the minimum and maximum.

So, let's say we have

road[0] 100 300 500
road[1] 100 300 500
road[2] 100 300 500
road[3] 100 300 500
road[4] 100 400 600
road[5] 100 400 600

it will output something like this:

road[0:3] 100 300 500
road[4:5] 100 400 600

Scott · October 14, 2017, 10:32pm

This is why you should write a specification of what you want (notwithstanding the fact that you haven't shown us any of your work), instead of just throwing data at us. First the brackets, now the following fields need to be considered as well. How should they be considered?

pedot · October 14, 2017, 10:50pm

Actually, that is all I'm looking for. For the fields, I tried to say it in post #1: "in condition that the rest of the column are exact match". I'm sorry if my English is not good; maybe I should have said "field" instead of "column".

As for the earlier script I had written: it is just a simple sed to remove any line that matches the header and footer, plus awk to drop some unwanted data such as the last 2 columns.

This is the original file:

header title
 
road[0] 100 300 500 0.3 0.5
road[1] 100 300 500 0.3 0.6 
road[2] 100 300 500 0.3 0.7
road[3] 100 300 500 0.3 0.8
road[4] 100 400 600 0.3 0.9
road[5] 100 400 600 0.3 0.9
street[0] 400 200 700 0.5 0.3
street[1] 400 200 700 0.5 0.4
path 200 100 900 0.6 0.1

footer

In my bash script I'm using the following line to get almost the data I want:

less <file> | sed '/header\|footer/d' | awk '{print $1, $2, $3, $4}'

The code gets me this:

road[0] 100 300 500
road[1] 100 300 500
road[2] 100 300 500
road[3] 100 300 500
road[4] 100 400 600
road[5] 100 400 600
street[0] 400 200 700
street[1] 400 200 700
path 200 100 900

and I wish my output looked like this, so that it does not look so messy, especially when the same bit index goes up past 100:

road[0:3] 100 300 500
road[4:5] 100 400 600
street[0:1] 400 200 700
path 200 100 900

sorry for the confusion and thanks for the clarification.
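
For reference, the extraction step above could probably be done in a single awk call, without less or sed. This is only a sketch: the temporary file and the variable names `input` and `trimmed` are made up for illustration, and the sample data is a shortened copy of the file shown above. The `NF` test additionally drops the empty lines that the original pipeline would pass through as blank output.

```shell
# Hypothetical sample file standing in for the original data.
input=$(mktemp)
cat > "$input" <<'EOF'
header title

road[0] 100 300 500 0.3 0.5
road[1] 100 300 500 0.3 0.6
street[0] 400 200 700 0.5 0.3
path 200 100 900 0.6 0.1

footer
EOF

# Skip header/footer lines and empty lines (NF == 0), keep the first four fields.
trimmed=$(awk '!/header|footer/ && NF { print $1, $2, $3, $4 }' "$input")
printf '%s\n' "$trimmed"
rm -f "$input"
```

The pattern `/header|footer/` matches anywhere in the line, as the sed expression does, so a data line that happened to contain either word would be dropped too.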

RudiC · October 16, 2017, 6:17am

Try this - there may be smarter and more elegant ways to get what you want, but it may serve as a starting point:

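awk '
/header|footer/ {next                                          # skip header and footer lines
                }

                {n = split ($1, T, "[][]")                     # "road[0]" -> T[1]="road", T[2]="0"
                 X  = T[1] (n>1?"[XYZ]":_) FS $2 FS $3 FS $4   # name + placeholder + other fields
                 if (LAST != X) {sub (/XYZ/, MN ":" MX, LAST)  # pattern changed: fill in the range
                                 if (LAST ~ /[^ ]/) print LAST # print unless LAST is blank
                                 MN = MX = T[2]                # new group: min and max start equal
                                }
                   else         {MX = T[2]                     # same group: input is sorted, so this is the max
                                }
                 LAST = X
                }
END             {sub (/XYZ/, MN ":" MX, LAST)                  # flush the last group if still pending
                 if (LAST ~ /[^ ]/) print LAST
                }
' file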
road[0:3] 100 300 500
road[4:5] 100 400 600
street[0:1] 400 200 700
path 200 100 900

pedot · October 17, 2017, 1:01pm

PERFECTLY NAILED IT !!! Thanks a lot..
I know it does everything that was asked, but I really wish there were some explanation of what is going on, at least of what you worked out first, in terms of the overall plan, to get such beautiful output.

RudiC · October 17, 2017, 2:01pm

Thanks for the "beautiful output" - it has been defined by you, hasn't it? Not sure I understand that quote - are you asking for the heuristics applied?

First it is mandatory to look at the input data and the desired output to find out what needs to be eliminated, what propagated and mayhap transformed or converted. Seeing previous attempts (with part successes or failures) always helps. That's why I frequently comment

With the above, the basis for a solution is established. Now, build a prototype to see if the logic is close to what is requested, then tweak the solution before finally polishing it... the former gets the logic right, the latter adds e.g. error handling or ergonomics / user-friendliness.

pedot · October 18, 2017, 7:12am

That is why I'm interested in how you plan out all the information given to produce such output.

Actually, I'm having a hard time sorting things out to get the desired output. My first attempt was to remove the first column and pipe the rest to sort -u to get the unique values from the 2nd field onward. Since I don't know awk well enough to store the output somewhere, I wrote it to a temporary file.txt to read back later. Then I got stuck merging the unique data back with the 1st column (I don't yet know how to combine the bits).
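
A side note on the temporary-file idea: awk can hold intermediate results in an associative array, so the unique field combinations need not be written to disk. The sketch below is only an illustration of that point, not a solution to the merge problem; the names `data`, `seen` and `counts` are made up, and the sample is a shortened version of the data in this thread.

```shell
data='road[0] 100 300 500
road[1] 100 300 500
road[4] 100 400 600
path 200 100 900'

# Strip the [index] from field 1, then count how often each remaining line occurs.
# The counts live in the awk array "seen", so no temporary file is needed.
counts=$(printf '%s\n' "$data" |
  awk '{ sub(/\[[0-9]+\]/, "", $1); seen[$0]++ }
       END { for (k in seen) print seen[k], k }' | LC_ALL=C sort)
printf '%s\n' "$counts"
```

This shows which combinations exist and how many lines each has, but it throws away the index values, which is why this route gets stuck when the [min:max] part has to be put back.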

RudiC · October 18, 2017, 8:31am

Explaining heuristics (or a "plan to get from input to output") is not an easy task... First it helps a lot to know the ropes, i.e. what tools you have at hand, their respective applicability, and what the strengths and weaknesses of each are that might lend themselves to the solution in this special, individual case. Here, awk stood out, but e.g. perl might fit even better - unfortunately, I don't have reasonable command of it.
Now, to the creative and - can I say - intuitive part, which can be a stepwise, repetitive, and approximating approach:
Check features and idiosyncrasies of the input, and what needs to be transported to the output. In this case, we can safely assume the input is sorted - should that not be the case, additional steps might need to be taken. We can identify two entities:

road[0] 100 300 500   \
road[1] 100 300 500    \  constant; go to road ... 100 300 500
road[2] 100 300 500    /  
road[3] 100 300 500   / 
     |
     +--> go to [0:3]

You see: $1 (which is field 1) needs to be split to get at its two elements that need different treatment. The constant part plus a placeholder plus the residual fields go to a working variable X, the numerical part is used to find MIN and MAX values, the latter in successive lines as we assume input is sorted. If the pattern in X changes, the output line is created from the LAST variable, by replacing the placeholder with the actual values, and printed out. At the end of each line's processing, X is saved in LAST.
Please bear in mind that the above solution is tailored to the input sample given, so we stop here. Should additional conditions arise, they would need to be programmed in as well, in - which may well happen - countless additional steps, conditional statements, loops, special cases, etc.
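
The walkthrough above can also be spelled out more verbosely. What follows is a sketch under the same assumption (sorted input, already reduced to four fields), not the script posted earlier: the names `emit`, `key`, `tail`, `min` and `max` are chosen here for illustration, and the range is assembled at print time instead of through a placeholder substitution. An END block prints the final group, so the result does not depend on what follows the last data line.

```shell
sample='road[0] 100 300 500
road[1] 100 300 500
road[2] 100 300 500
road[3] 100 300 500
road[4] 100 400 600
road[5] 100 400 600
street[0] 400 200 700
street[1] 400 200 700
path 200 100 900'

merged=$(printf '%s\n' "$sample" | awk '
# Print the group collected so far, if there is one.
function emit() {
    if (key == "") return
    if (indexed) print name "[" min ":" max "]" tail
    else         print name tail
}
{
    n    = split($1, part, "[][]")      # "road[0]" -> part[1]="road", part[2]="0"
    rest = FS $2 FS $3 FS $4            # the fields that must match exactly
    k    = part[1] (n > 1 ? "[]" : "") rest
    if (k != key) {                     # pattern changed: print old group, start a new one
        emit()
        key = k; name = part[1]; indexed = (n > 1)
        min = part[2]; tail = rest
    }
    max = part[2]                       # sorted input: the last index seen is the maximum
}
END { emit() }                          # print the final group
')
printf '%s\n' "$merged"
```

With the sample above this prints the four merged lines requested earlier in the thread. A group with a single member would come out as, say, name[7:7]; whether that or name[7] is preferable is a choice the thread does not settle.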

pedot · October 18, 2017, 10:09pm

@RudiC
Thanks a lot for the explanation and information.
I got the general idea, but I probably need to understand awk at a deeper level.