Hi, I have a huge amount of data stored in a file. I need to remove duplicate rows from it: rows that are identical except for the last column count as duplicates, and from each such group I must keep just one entry, the one whose last-column value is the greatest. For example, given a file that looks like this:
1902 8 22 3 40.0000 77.0000 8.60
1902 8 22 3 40.0000 76.5000 8.20
1902 8 22 3 40.0000 76.5000 8.30
1902 8 22 3 40.0000 77.0000 8.40
1902 8 22 3 39.8000 76.2000 8.10
1902 9 30 6 38.5000 67.0000 7.70
1902 9 30 6 38.5000 67.0000 6.30
1902 10 6 9 36.5000 70.5000 7.20
1902 12 4 22 37.8000 65.5000 4.90
I want the output for such a file to be:
1902 8 22 3 40.0000 77.0000 8.60
1902 8 22 3 40.0000 76.5000 8.30
1902 8 22 3 39.8000 76.2000 8.10
1902 9 30 6 38.5000 67.0000 7.70
1902 10 6 9 36.5000 70.5000 7.20
1902 12 4 22 37.8000 65.5000 4.90
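The requirement boils down to: group rows on every column except the last, and within each group keep the row whose last column is numerically largest. A minimal awk sketch of that idea, run against the first three sample rows (the file name events.txt is just for illustration):

```shell
# Build a small sample file from the rows above (name is illustrative).
cat > events.txt <<'EOF'
1902 8 22 3 40.0000 77.0000 8.60
1902 8 22 3 40.0000 76.5000 8.20
1902 8 22 3 40.0000 76.5000 8.30
EOF

awk '{
    key = $0
    sub(/ [^ ]+$/, "", key)                  # key = every field except the last
    if (!(key in max) || $NF + 0 > max[key]) {
        max[key] = $NF + 0                   # largest last-column value so far
        row[key] = $0                        # the full row that carried it
    }
}
END { for (k in row) print row[k] }' events.txt | sort
```

Since `for (k in row)` walks the array in unspecified order, the trailing sort restores a stable order; here it keeps the 76.5000 row with 8.30 and the 77.0000 row with 8.60.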
------
awk '{ va2=$NF;va1=$(NF-1);va=$(NF-2);$NF=" ";$(NF-1)=" ";$(NF-2)=" ";if ($0 in a) { if (va" "va1" "va2 > a[$0]) {a[$0]=va" "va1" "va2}} else {a[$0]=va" "va1" "va2}} END { for ( i in a ) print i" "a[i] }' file_name.txt
As I said already, this needs further checking: the order of the elements in the associative array is not preserved, so it's not exactly working.
I should also mention that my data has different values in the first column; they are not all the same as in the sample in my question. The data in my file looks somewhat like this.
To keep the forums high quality for all users, please take the time to format your posts correctly.
First of all, use code tags when you post any code or data samples so others can easily read your code. You can do this by highlighting your code and then clicking on the # in the editing menu, or by typing the code tags by hand.
Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.
Third, be careful when you cut and paste: edit out any odd characters and make sure all links are working properly.
It's working properly for me; of course, with sort you can put the sequence in order.
Something like this:
awk '{ va2=$NF;va1=$(NF-1);va=$(NF-2);$NF="";$(NF-1)="";$(NF-2)="";if ($0 in a) { if (va" "va1" "va2 > a[$0]) {a[$0]=va" "va1" "va2}} else {a[$0]=va" "va1" "va2}} END { for ( i in a ) print i" "a[i] }' file_name.txt | sort -k2n
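A side note on how the one-liner works: assigning an empty string to a field makes awk rebuild $0 from the remaining fields, which is what turns "everything except the last three columns" into the array key. A minimal sketch of just that mechanism:

```shell
# Blanking fields rebuilds $0 from what is left, so the remainder can be
# used as an associative-array key. Note the trailing spaces the blanked
# fields leave behind.
echo "a b c d" | awk '{ $NF = ""; $(NF-1) = ""; print "key=[" $0 "]" }'
# prints: key=[a b  ]
```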
---------- Post updated 08-26-09 at 04:34 AM ---------- Previous update was 08-25-09 at 08:49 AM ----------
Thanks for the help, I got it...
---------- Post updated at 04:45 AM ---------- Previous update was at 04:34 AM ----------
Hi,
Now suppose I have data like the one shown below: how do I sort it out? I mean, delete duplicate entries in such a way that, among rows that match on certain columns, the row with the largest value in a given column is kept as a whole.
For example: if I have 19 columns and I need to check duplicates only on columns 1, 2, 3 and 4, and take the row with the largest value in column 18, then how do I use awk? Please help me out, and try explaining the code too; I am very new to Unix.
Thanks in advance.
awk '{ k=$1" "$2" "$3" "$4; if (!(k in b) || $(NF-1)+0 > b[k]) { b[k]=$(NF-1)+0; a[k]=$0 } } END { for ( i in a ) print a[i] }' file_name.txt | sort -k2n
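To unpack the same idea with comments: the sketch below (with made-up sample data; the real file would have 19 columns, so $(NF-1) addresses column 18) keys on columns 1-4 and uses a second array to remember the per-key maximum:

```shell
# Hypothetical sample: 6 columns, so $(NF-1) is the magnitude column here.
cat > sample.txt <<'EOF'
1902 8 22 3 8.60 extra
1902 8 22 3 8.20 extra
1902 9 30 6 6.30 extra
1902 9 30 6 7.70 extra
EOF

awk '{
    key = $1 " " $2 " " $3 " " $4       # the columns that define a duplicate
    val = $(NF-1) + 0                   # second-to-last column, as a number
    if (!(key in best) || val > best[key]) {
        best[key] = val                 # largest value seen so far for this key
        row[key]  = $0                  # the whole row that carried it
    }
}
END { for (k in row) print row[k] }' sample.txt | sort -k2n
```

Keeping the maximum in best[key] rather than in one scalar matters when duplicate groups are interleaved in the file, because each key then needs its own running maximum.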