KSH to group records in a file and compare it with another file

Matrix2682 · August 18, 2009, 10:26am

Hi,

I've a file like below:

DeptFile.csv

DeptID         EmpID
-------        ------
Dep01          Emp01
Dep01          Emp02
Dep01          Emp03
Dep02          Emp04
Dep02          Emp05

I've another file which has

EmpFile.csv

EmpID        Salary
------        ------
Emp01        10000
Emp02        12000
Emp03         6000
Emp04         5000
Emp05        15000

Now I need to take the employee who is getting maximum salary in each department. i.e., in Dept01, I need to take employee who is getting max salary (Emp02) and load it into another file in the following format.

DeptID     EmpID     Salary

I've tried using KSH by taking each record and comparing it with the next file. But since the two csv files (Dept and Emp) are huge in size and it takes more time to execute. Please suggest if any better solution is there to implement this. I use KSH or AWK...

Thanks.

vgersh99 · August 18, 2009, 10:31am

To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags

```text
 and 
```

by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut-and-paste, edit any odd characters and make sure all links are working property.

Thank You.

The UNIX and Linux Forums

---------- Post updated at 10:31 AM ---------- Previous update was at 10:29 AM ----------

Please show what you have so far.

Matrix2682 · August 19, 2009, 9:10am

malcomex999, thanks for the code... it works... but if a same employee belongs to 2 departments, that record is not coming in output.

awk 'NR>2 && NR==FNR{arr[$2]=$1}NR>2 && NR!=FNR {for(i in arr){if(i==$1) print arr,i,$2}}' dept emp |
sort -n +2 | awk '{arr[$1]=$0}END{for (i in arr){print arr}}'

I'll give you the actual data:

Dept.csv:

22405        ,22405        
22979        ,22979        
23193        ,23193        
23193        ,23200        
23193        ,27741        
23193        ,27743        
23193        ,38758        
23193        ,38759        
23194        ,23194        
23194        ,23202        
23194        ,27744        
23194        ,27745        
23200        ,23193        
23200        ,23200        
23200        ,27741        
23200        ,27743        
23200        ,38758        
23200        ,38759

Emp.csv

22405        ,3500
22979        ,3800
23193        ,3900
23194        ,3900
23200        ,3900
23202        ,3900
27797        ,3900

Here, I am not getting Emp# 23193 in the output. Please help...

malcomex999 · August 22, 2009, 9:14am

Based on your prevoius actual post...Try this...

awk -F"," 'NR==FNR{arr[$1","$2]++}NR!=FNR {for(i in arr){
split(i,ss,",");
if(ss[2]==$1) print i","$2}}' dept emp |
sort -t "," -nr +2 | awk -F"," '{arr[$1]=$0}END{for (i in arr){print arr}}'

For Emp# 23193, in the same Dept. you are having another Emp# with the same salary. So it picked up the first one.