UNIX joins : facing issue while joining three files

Hello ,

I have three files :

sampleoutput1.txt has columns (in the following order) :

hostname ; available patches , available packages

sampleoutput2.txt has columns (in the following order) :

hostname ; patchwave ; BSID ; Application

sampleoutput3.txt has columns (in the following order) :

boottime ; hostname

I am trying to join three files on the column hostname from the three files.
I want the output records as follows :

hostname;available patches;available packages;boot time;patchwave;BSID;Application

Below is the script. Here I am first trying to join the files : sampleoutput1.txt and sampleoutput3.txt. Then I am trying to join the resultant with sampleoutput2.txt

#!/bin/bash
awk -F";" 'FNR==NR{A[$2]=$1;next} {print $0 OFS A[$1] OFS}' OFS=";" sampleoutput3.txt sampleoutput1.txt > sampleoutput1-3.txt
awk -F";" 'FNR==NR{B[$1]=$1;next} {print $0 OFS B[$2] OFS B[$3] OFS B[$4]}' OFS=";" sampleoutput2.txt sampleoutput1-3.txt > sampleoutput1-2-3.txt
echo "Output is placed in file : sampleoutput1-2-3.txt"

The file sampleoutput1-3.txt is giving correct output but file : sampleoutput1-2-3.txt is not giving correct output.

Please help.

Thanks
Rahul

I'd be surprised if sampleoutput1-3.txt held the correct output as the field separator is NOT ";" and the files' sequence is wrong.

Given the first file does NOT have a line break before the third field, and that spaces are used consistently around the field contents, and that the DOS/windows <CR> line terminators are removed, this might work:

join -t, -22 file1 file3 | join  -t, - file2
hostname,available patches,available packages,boottime,patchwave,BSID,Application
1 Like

Hello RudiC , the files are not sorted moreover there are some records for hostname which are present in sampleoutput1.txt and not present in sampleoutput2.txt and sampleoutput3.txt. Also delimiter used is ";". I think the above wont work in this scenario. Please help.

You should not modify posts after an answer was given referring to its contents. And, you should do the initial specification with a bit more care.

awk -F ';' '
 FILENAME == ARGV[1] {A[$1] = ($2 ";" $3); next}
 FILENAME == ARGV[2] {A[$2] = A[$2] == "" ? (";;" $1) : (A[$2] ";" $1); next}
 FILENAME == ARGV[3] {A[$1] = A[$2] == "" ? (";;;" $2 ";" $3 ";" $4) : (A[$1] ";" $2 ";" $3 ";" $4)}
 END {for(x in A) print x ";" A[x]}' sampleoutput1.txt sampleoutput3.txt sampleoutput2.txt > sampleoutput1-2-3.txt
1 Like

After your corrections, I noticed there is a comma where there should be a semicolon, also there are spaces around the semicolons. I do not know if they are there in reality, I adjusted FS to compensate. If they are not there you can just use ";" . Please put more care still into your file sample specifications.

Try:

awk '
  BEGIN {
    FS=" *; *" 
    OFS=";"
  } 
  FNR==1{
    c++
  }
  c==1 { 
    A[$1]=$0 
  } 
  c==2 { 
    A[$2]=A[$2] OFS $1
  }
  c==3 { 
    $1=A[$1]
    print
  }
' sampleoutput1.txt sampleoutput3.txt sampleoutput2.txt
1 Like

Very Sorry RudiC , I will take more care next time. Thanks for your help.

---------- Post updated at 08:28 AM ---------- Previous update was at 08:24 AM ----------

Thanks a lot Scrutinizer , the code which you provided worked fine. Also I will ensure more care while posting next time. Could you please help me in explaining the script. Also Please provide some sources/books which can help me in learning to master advanced awk and sed.

---------- Post updated at 10:05 AM ---------- Previous update was at 08:28 AM ----------

Hello SriniShoo , The script which you provided went into long running and eventually i stopped it after approx 15 minutes. Thanks a lot for your efforts

Hello Rahul,

Following may help you in same.

 awk '
  BEGIN {                                                    ####### Start begin statement here.
    FS=" *; *"                                               ####### mentioning field seprator here as "*; *"  means it will take look for all text from starting to till semi colon as mentioned in your input file
    OFS=";"                                                  ####### mentioning output field seprator as semi colon ;
  } 
  FNR==1{                                                    ####### Whenever FNR is equal to 1, this condition will be TRUE whenever a new file read. As FNR will be reset whenever a new file is being read and on other hand NR will be keep on increasing it's value till all files being read.
    c++                                                      ####### incrementing the value of a variable named c by one.
  }
  c==1 {                                                     ####### When variable C's value is 1 this will be TRUE after first file is being read.
    A[$1]=$0                                                 ####### Creating an array named A whose index is $1 and value is $0 (complete line).
  } 
  c==2 {                                                     ####### When variable named C's value is 2 which will be TRUE when second file is being read because when read second file's turn will come then FNR==1 conditon will increase variable named C's value to 1 more which will be 2 now.
    A[$2]=A[$2] OFS $1                                       ####### Taking same array named A this time taking index as $2 because $1 of sampleouput1.txt and $2 of sampleoutput2.txt are same. Assigning the value to A{$2] OFS $1 where OFS is Output fielfd seprator.
  }
  c==3 {                                                     ####### When variable named C 's valie is 3 this will be TRUE when 3rd file is being read.
    $1=A[$1]                                                 ####### now making $1's value as A[$1]; because $1 of file named sampleoutput2.txt and $1 of sampleoutput1.txt and $2 of sampleoutput3.txt are same as per input files provided.
    print                                                    ####### Performing print action now, which will print complete line then.
  }
' sampleoutput1.txt sampleoutput3.txt sampleoutput2.txt      ####### mentioning filenames now.
 

Thanks,
R. Singh

2 Likes

Sorry! That was untested and there was a bug
try below

awk -F ';' '
 FILENAME == ARGV[1] {A[$1] = ($2 ";" $3); next}
 FILENAME == ARGV[2] {A[$2] = A[$2] == "" ? (";;" $1) : (A[$2] ";" $1); next}
 FILENAME == ARGV[3] {A[$1] = A[$1] == "" ? (";;;" $2 ";" $3 ";" $4) : (A[$1] ";" $2 ";" $3 ";" $4)}
 END {for(x in A) print x ";" A[x]}' sampleoutput1.txt sampleoutput3.txt sampleoutput2.txt > sampleoutput1-2-3.txt
1 Like

Thanks Ravinder. Just one correction:

    FS=" *; *"                                               ####### using the field separator (FS) here as " *; *"  means it will consider spaces around the semicolons also part of the field separator (so that these spaces do not becomes part of the fields)
2 Likes

Thanks a lot Ravinder Sir , Scrutinizer and SriniShoo. :slight_smile: