Combine the first two words (country name) into one word in every line of a log file with 500 records

United States 1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517
Italy  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517
India  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517
south Africa  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517

The output should be:

UnitedStates 1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517
Italy  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517
India  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517
southAfrica  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517 

Try

awk 'NF>11 {sub(/ /,"")}  1' file
UnitedStates 1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517
Italy  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517
India  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517
southAfrica  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517 
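
Spelled out with comments, the one-liner above does roughly the following. This is only a sketch: the function name join_if_extra is made up for illustration, and it assumes, as in the sample, that a normal line has exactly 11 fields.

```shell
join_if_extra() {
    awk '
        # 12 or more fields: delete the first space in the line,
        # which glues fields 1 and 2 together
        NF > 11 { sub(/ /, "") }
        # same effect as the bare pattern 1: print every line
        { print }
    '
}

printf '%s\n' \
    'United States 1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517' |
    join_if_extra
```

Note that sub() only removes the first space of the line, so a line that starts with a space would lose that one instead.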

EDIT: Or, if you're unsure about the space count but know there's an IP address in the second field, try

awk '{while ($2 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) if (!sub(/ /, "")) break} 1' file

Thanks. Could you please suggest some good documents where I can learn tricks like these?

The best place to start is the AWK manual: The GNU Awk User's Guide
Then you need a lot of patience, practice, reading forums like this one, and thinking of more than one way to solve a problem.


Depending on how (and by what) you further process your data you can also do it in shell directly instead of calling awk for only that purpose:

while read -r f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 ; do
     if [ -n "$f12" ] ; then
          printf "%s %s %s %s %s %s %s %s %s %s %s\n" "${f1}${f2}" "$f3" "$f4" "$f5" "$f6" "$f7" "$f8" "$f9" "$f10" "$f11" "$f12"
     else
          printf "%s %s %s %s %s %s %s %s %s %s %s\n" "${f1}" "${f2}" "$f3" "$f4" "$f5" "$f6" "$f7" "$f8" "$f9" "$f10" "$f11"
     fi
done < /path/to/input

Adjust the printf format string to fine-tune the output format.
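
As an illustration of such fine-tuning, here is a sketch of the same loop with column widths in the format string. The function name format_record and the widths are arbitrary examples, not something taken from the original data.

```shell
format_record() {
    # Read whitespace-separated records on stdin. Field 12 is non-empty only
    # when the country name was split into two words.
    while read -r f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12; do
        if [ -n "$f12" ]; then
            set -- "${f1}${f2}" "$f3" "$f4" "$f5" "$f6" "$f7" "$f8" "$f9" "$f10" "$f11" "$f12"
        else
            set -- "$f1" "$f2" "$f3" "$f4" "$f5" "$f6" "$f7" "$f8" "$f9" "$f10" "$f11"
        fi
        # %-15s pads a left-aligned string to 15 columns, %5s right-aligns to 5
        printf '%-15s %-15s %5s %3s %10s %2s %10s %-15s %5s %-15s %5s\n' "$@"
    done
}

printf '%s\n' \
    'United States 1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517' \
    'Italy  1.2.3.4  80  10 1563790914  1   1932454179 1.2.3.6  55517  11.1.2.1  55517' |
    format_record
```

Using one printf and setting the arguments beforehand keeps the format string in a single place, so the two branches cannot drift apart.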

I hope this helps.

bakunin



Thanks. What if NF varies? Sometimes it might be 11 or 14, or more or fewer, so NF>11 would not work.

In this case you need to define your problem more precisely.

You defined your problem in the title as "Combine first two words ( country name ) into one word in every line".

Later, when you posted a sample and your expected outcome, one could see that it was not "combine the first two words" but rather "combine the first two words under certain circumstances". Up to now we reacted to this by posting solutions to what the problem looked like: all lines had 11 fields, and where 12 words were found, the first field had to be a two-word name that needed to be combined.

Now you tell us that we need to base what is a "two-word country" on some different decision. But in fact - because only you know your data - you are the only one to come up with such a decision. So tell us how we (that is: the programs we write) can find out what is a two-word country and how we can identify it. From there on the solution is a piece of cake.

I hope this helps.

bakunin

Did you read (and understand) the second proposal in (the EDIT of) post #2 in this thread, which is independent of NF?
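
To see that this approach does not depend on the field count, here is a quick check with lines of differing lengths. The sample lines and the function name join_until_ip are invented for illustration, and the loop is a variant of the one in post #2 with a guard added so that it stops when no space is left.

```shell
join_until_ip() {
    awk '{
        # Delete one space at a time until field 2 looks like an IPv4 address;
        # stop if no space is left, so a line without an address cannot loop forever.
        while ($2 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
            if (!sub(/ /, "")) break
    } 1'
}

# Invented sample lines with 4, 5 and 7 fields
printf '%s\n' \
    'Italy 1.2.3.4 80 10' \
    'south Africa 1.2.3.4 80' \
    'United Arab Emirates 1.2.3.4 80 10 7' |
    join_until_ip
```

Names of three or more words are joined as well. A line that contains no address at all ends up with every space removed, so such lines would need to be filtered out first.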

sed 's/\s\+\([[:alpha:]]\)/\1/' file
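
The \s and \+ in that command are GNU sed extensions. On a sed without them, a POSIX-only spelling of the same substitution might look like the sketch below (the function name join_first_words is made up for illustration).

```shell
join_first_words() {
    # Remove the first run of whitespace that is directly followed by a letter.
    sed 's/[[:space:]][[:space:]]*\([[:alpha:]]\)/\1/'
}

printf '%s\n' \
    'south Africa  1.2.3.4  80' \
    'Italy  1.2.3.4  80' |
    join_first_words
```

Like the original command, this relies on the country name being the only place where a field starts with a letter. If a later field can start with a letter (a host name, say), a one-word country line would have that field glued to the one before it.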