Merging two files in UNIX

bharathbangalor · November 21, 2013, 6:38am

Hi Experts,
Need urgent solution for a problem.

I have two files file1 and file2. file1 is tab separated and file2 is comma separated.
I need to merge both the files into single file based on CUST_ID by retaining the headers of file1

Matching CUST_IDs should be placed one below the other in the output file.If any correspnding column is missing in file 2 it has to be replaced with 'NA';

cat file1:(tab separated)

CUST_ID	Name	Place	acc_no
1	abc	blr|chn|dlh	500|600|700
2	xyz	hyd|blr	888|999
3	mno	blr|hyd	111|222|333


cat file2:(coma separated)
CUST_ID,Name,acc_no
1,abc,500|600|700
2,xyz,111|222|333


Required output:

CUST_ID,Name,Place,acc_no
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333

Akshay_Hegde · November 21, 2013, 7:02am

bharathbangalor:

Hi Experts,
Need urgent solution for a problem.

I have two files file1 and file2. file1 is tab separated and file2 is comma separated.
I need to merge both the files into single file based on CUST_ID by retaining the headers of file1

Matching CUST_IDs should be placed one below the other in the output file.If any correspnding column is missing in file 2 it has to be replaced with 'NA';
cat file1:(tab separated)

CUST_ID    Name    Place    acc_no
1    abc    blr|chn|dlh    500|600|700
2    xyz    hyd|blr    888|999
3    mno    blr|hyd    111|222|333


cat file2:(coma separated)
CUST_ID,Name,acc_no
1,abc,500|600|700
2,xyz,111|222|333


Required output:

CUST_ID,Name,Place,acc_no
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333

Try:

$ awk -F'[ ,]' 'FNR==NR{gsub(/[[:space:]]/,",");A[$2]=$0;next}A[$2]{$3="NA,"$3;$0=FNR==1 ? A[$2]: A[$2] RS $0}A[$2]'  OFS=\, file1 file2

Resulting

CUST_ID,Name,Place,acc_no
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333

bharathbangalor · November 21, 2013, 8:04am

akshay hegde:

Try:

$ awk -F'[ ,]' 'FNR==NR{gsub(/[[:space:]]/,",");A[$2]=$0;next}A[$2]{$3="NA,"$3;$0=FNR==1 ? A[$2]: A[$2] RS $0}A[$2]'  OFS=\, file1 file2

Resulting

CUST_ID,Name,Place,acc_no
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333

Hi akshay ,
let me post you the actual data. PFA the data for the above specification.
Kindly help me !

Akshay_Hegde · November 21, 2013, 10:34am

If you could explain relationship between sample data posted in #1 and actual data attached in #3 I will try to understand what you are going to do.

bharathbangalor · November 22, 2013, 12:59am

The tab separated file is the master file which contains all the columns and CSV file is the slave file which contains only few columns from master.

Requirement.
The final output should contain all columns(headers and its value) from master file and matching columns from slave file

Both the files have to be merged together based on CUSTSEQ . If any column is missing in slave file it has to be append as 'NA' under the header column

PFA the sample.

pravin27 · November 22, 2013, 2:00am

 awk -F"[\t,]" 'NR==FNR{if (NR==1) {for (i=1;i<=NF;i++) {d=$i }}
line[$1]=$0;next }
{if(FNR==1) { for (j=1;j<=NF;j++) {gsub("\"","",$j);c[j]=$j;}}
if ( line[$1] ) { print line[$1]
for ( x=1;x<i;x++ ) {
for ( y=1;y<j;y++ ) {
if (d[x] == c[y] ) { flg=1;printf $y"\t"} else { flag=0}
}
} if ( flag == 0 ) { printf "NA" } print "\n";
} }' master slave

bharathbangalor · November 22, 2013, 4:40am

pravin27:

 awk -F"[\t,]" 'NR==FNR{if (NR==1) {for (i=1;i<=NF;i++) {d=$i }}
line[$1]=$0;next }
{if(FNR==1) { for (j=1;j<=NF;j++) {gsub("\"","",$j);c[j]=$j;}}
if ( line[$1] ) { print line[$1]
for ( x=1;x<i;x++ ) {
for ( y=1;y<j;y++ ) {
if (d[x] == c[y] ) { flg=1;printf $y"\t"} else { flag=0}
}
} if ( flag == 0 ) { printf "NA" } print "\n";
} }' master slave

Hi pravin,

The above code will output the headers from both the files and display values one below the other.But the output should contain headers from master file alone and there respective value from slave file.

pravin27 · November 22, 2013, 6:06am

awk -F"[\t,]" 'NR==FNR{if (NR==1) {print $0;for (i=1;i<=NF;i++) { d=$i } } else {line[$1]=$0;next }}
{if(FNR==1) {for (j=1;j<=NF;j++) { gsub("\"","",$j);c[j]=$j;}} else {
if ( line[$1] ) {
for ( x=1;x<i;x++ ) {
for ( y=1;y<j;y++ ) {
if (d[x] == c[y] ) { flg=1;printf $y"\t"} else { flag=0}
}
} if ( flag == 0 ) { printf "NA" };printf "\n";
} }}' master slave

bharathbangalor · November 22, 2013, 6:59am

pravin27:

awk -F"[\t,]" 'NR==FNR{if (NR==1) {print $0;for (i=1;i<=NF;i++) { d=$i } } else {line[$1]=$0;next }}
{if(FNR==1) {for (j=1;j<=NF;j++) { gsub("\"","",$j);c[j]=$j;}} else {
if ( line[$1] ) {
for ( x=1;x<i;x++ ) {
for ( y=1;y<j;y++ ) {
if (d[x] == c[y] ) { flg=1;printf $y"\t"} else { flag=0}
}
} if ( flag == 0 ) { printf "NA" };printf "\n";
} }}' master slave

Its not working pravin. The values from slave file are not placed under correct headers. Also values from master file are not displayed.

Akshay_Hegde · November 22, 2013, 7:20am

If you are trying with data other than which you attached here, our code will not work. we noticed in last 2/3 thread of yours that you are giving sample data which is completely different from your actual data.

bharathbangalor · November 22, 2013, 7:25am

Sorry for creating confusion.
I have attached the actual data,and this is the one which I am actually working on.

RudiC · November 22, 2013, 7:28am

You may want to adapt this proposal - which works for your sample data - to your master and slave files:

awk -F, 'NR>1 {$2=$2 OFS "NA"; print}' OFS="\t" file2 | cat - file1 | sort | tr '\t' ','
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333
3,mno,blr|hyd,111|222|333
CUST_ID,Name,Place,acc_no

EDIT: sort -n will bring the header to line#1...