Hi Experts,
Need urgent solution for a problem.
I have two files file1 and file2. file1 is tab separated and file2 is comma separated.
I need to merge both the files into single file based on CUST_ID by retaining the headers of file1
Matching CUST_IDs should be placed one below the other in the output file.If any correspnding column is missing in file 2 it has to be replaced with 'NA';
cat file1:(tab separated)
CUST_ID Name Place acc_no
1 abc blr|chn|dlh 500|600|700
2 xyz hyd|blr 888|999
3 mno blr|hyd 111|222|333
cat file2:(coma separated)
CUST_ID,Name,acc_no
1,abc,500|600|700
2,xyz,111|222|333
Required output:
CUST_ID,Name,Place,acc_no
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333
bharathbangalor:
Hi Experts,
Need urgent solution for a problem.
I have two files file1 and file2. file1 is tab separated and file2 is comma separated.
I need to merge both the files into single file based on CUST_ID by retaining the headers of file1
Matching CUST_IDs should be placed one below the other in the output file.If any correspnding column is missing in file 2 it has to be replaced with 'NA';
cat file1:(tab separated)
CUST_ID Name Place acc_no
1 abc blr|chn|dlh 500|600|700
2 xyz hyd|blr 888|999
3 mno blr|hyd 111|222|333
cat file2:(coma separated)
CUST_ID,Name,acc_no
1,abc,500|600|700
2,xyz,111|222|333
Required output:
CUST_ID,Name,Place,acc_no
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333
Try:
$ awk -F'[ ,]' 'FNR==NR{gsub(/[[:space:]]/,",");A[$2]=$0;next}A[$2]{$3="NA,"$3;$0=FNR==1 ? A[$2]: A[$2] RS $0}A[$2]' OFS=\, file1 file2
Resulting
CUST_ID,Name,Place,acc_no
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333
Try:
$ awk -F'[ ,]' 'FNR==NR{gsub(/[[:space:]]/,",");A[$2]=$0;next}A[$2]{$3="NA,"$3;$0=FNR==1 ? A[$2]: A[$2] RS $0}A[$2]' OFS=\, file1 file2
Resulting
CUST_ID,Name,Place,acc_no
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333
Hi akshay ,
let me post you the actual data. PFA the data for the above specification.
Kindly help me !
If you could explain relationship between sample data posted in #1
and actual data attached in #3
I will try to understand what you are going to do.
The tab separated file is the master file which contains all the columns and CSV file is the slave file which contains only few columns from master.
Requirement.
The final output should contain all columns(headers and its value) from master file and matching columns from slave file
Both the files have to be merged together based on CUSTSEQ . If any column is missing in slave file it has to be append as 'NA' under the header column
PFA the sample.
awk -F"[\t,]" 'NR==FNR{if (NR==1) {for (i=1;i<=NF;i++) {d=$i }}
line[$1]=$0;next }
{if(FNR==1) { for (j=1;j<=NF;j++) {gsub("\"","",$j);c[j]=$j;}}
if ( line[$1] ) { print line[$1]
for ( x=1;x<i;x++ ) {
for ( y=1;y<j;y++ ) {
if (d[x] == c[y] ) { flg=1;printf $y"\t"} else { flag=0}
}
} if ( flag == 0 ) { printf "NA" } print "\n";
} }' master slave
pravin27:
awk -F"[\t,]" 'NR==FNR{if (NR==1) {for (i=1;i<=NF;i++) {d=$i }}
line[$1]=$0;next }
{if(FNR==1) { for (j=1;j<=NF;j++) {gsub("\"","",$j);c[j]=$j;}}
if ( line[$1] ) { print line[$1]
for ( x=1;x<i;x++ ) {
for ( y=1;y<j;y++ ) {
if (d[x] == c[y] ) { flg=1;printf $y"\t"} else { flag=0}
}
} if ( flag == 0 ) { printf "NA" } print "\n";
} }' master slave
Hi pravin,
The above code will output the headers from both the files and display values one below the other.But the output should contain headers from master file alone and there respective value from slave file.
awk -F"[\t,]" 'NR==FNR{if (NR==1) {print $0;for (i=1;i<=NF;i++) { d=$i } } else {line[$1]=$0;next }}
{if(FNR==1) {for (j=1;j<=NF;j++) { gsub("\"","",$j);c[j]=$j;}} else {
if ( line[$1] ) {
for ( x=1;x<i;x++ ) {
for ( y=1;y<j;y++ ) {
if (d[x] == c[y] ) { flg=1;printf $y"\t"} else { flag=0}
}
} if ( flag == 0 ) { printf "NA" };printf "\n";
} }}' master slave
pravin27:
awk -F"[\t,]" 'NR==FNR{if (NR==1) {print $0;for (i=1;i<=NF;i++) { d=$i } } else {line[$1]=$0;next }}
{if(FNR==1) {for (j=1;j<=NF;j++) { gsub("\"","",$j);c[j]=$j;}} else {
if ( line[$1] ) {
for ( x=1;x<i;x++ ) {
for ( y=1;y<j;y++ ) {
if (d[x] == c[y] ) { flg=1;printf $y"\t"} else { flag=0}
}
} if ( flag == 0 ) { printf "NA" };printf "\n";
} }}' master slave
Its not working pravin. The values from slave file are not placed under correct headers. Also values from master file are not displayed.
If you are trying with data other than which you attached here, our code will not work. we noticed in last 2/3 thread of yours that you are giving sample data which is completely different from your actual data.
Sorry for creating confusion.
I have attached the actual data,and this is the one which I am actually working on.
RudiC
November 22, 2013, 7:28am
12
You may want to adapt this proposal - which works for your sample data - to your master and slave files:
awk -F, 'NR>1 {$2=$2 OFS "NA"; print}' OFS="\t" file2 | cat - file1 | sort | tr '\t' ','
1,abc,blr|chn|dlh,500|600|700
1,abc,NA,500|600|700
2,xyz,hyd|blr,888|999
2,xyz,NA,111|222|333
3,mno,blr|hyd,111|222|333
CUST_ID,Name,Place,acc_no
EDIT: sort -n
will bring the header to line#1...