awk program to join two files on a common field

Hello friends,
I need a little help: I need an awk program that joins two files sharing one common field into a single file.

File 1:
FileName~Size

File 2:
FileName~Date

I need the output file in the following format:
Output file:
FileName~Date~Size

Do the files have to be sorted for this?
The files have as many as 10 million lines.
Need your help.

Thanks in advance,
Regards,
Abhishek S.

This works if both files are sorted on the first column (the file name):

$ cat t
a~12345
b~54321
c~47789

$ cat t2
a~10/01/2012
b~10/02/2012
c~10/03/2012

$ join -t'~' t2 t
a~10/01/2012~12345
b~10/02/2012~54321
c~10/03/2012~47789
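If the inputs are not already sorted, one sketch is to sort each file on the file-name field first and then join them. Two choices here are assumptions, not part of the reply above: LC_ALL=C forces byte-order collation so that sort and join agree on the ordering, and join -o 0,1.2,2.2 picks the output fields so the date comes before the size. The sample data and the names sizes_file, dates_file, sizes.sorted, dates.sorted and new_file are placeholders.

```shell
#!/bin/sh
set -e
# Same collation for sort and join (assumption: byte order is acceptable).
export LC_ALL=C

# Placeholder sample data, deliberately unsorted.
printf '%s\n' 'c~47789' 'a~12345' 'b~54321' > sizes_file
printf '%s\n' 'b~10/02/2012' 'c~10/03/2012' 'a~10/01/2012' > dates_file

# Sort each file on its first ~-separated field (the file name).
sort -t'~' -k1,1 sizes_file > sizes.sorted
sort -t'~' -k1,1 dates_file > dates.sorted

# Output: join field (0), field 2 of file 1 (date), field 2 of file 2 (size).
join -t'~' -o 0,1.2,2.2 dates.sorted sizes.sorted > new_file
```

Listing the date file first, or using -o as above, is what puts the fields in FileName~Date~Size order.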

Hi,
Thanks a lot for the reply. I am aware of the join command, but it needs sorted input, and when I try to sort a 10-million-line file, sort fails partway through with an error saying there is not enough space left.
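That error most likely comes from sort running out of room for its temporary files, which it writes to /tmp (or $TMPDIR) by default. A minimal sketch, assuming GNU coreutils sort: the -T option moves those temporary files to a directory on a disk with more free space. BIG_TMP is a hypothetical variable standing in for that directory, and the three-line sizes_file is a stand-in for the real data.

```shell
#!/bin/sh
set -e
# Hypothetical: set BIG_TMP to a directory on a filesystem with plenty of
# free space. The mktemp -d fallback only lets this sketch run as-is.
BIG_TMP=${BIG_TMP:-$(mktemp -d)}

# Tiny stand-in for the real 10-million-line file.
printf '%s\n' 'c~47789' 'a~12345' 'b~54321' > sizes_file

# -T: write temporary files under BIG_TMP instead of /tmp.
# -t'~' -k1,1: sort on the file-name field only.
# -o: write the sorted result to sizes.sorted.
LC_ALL=C sort -T "$BIG_TMP" -t'~' -k1,1 -o sizes.sorted sizes_file
```

The sorted output file itself also needs room on whatever disk holds sizes.sorted.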

Is this possible with awk?

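The logic is simple in awk, and the input files do not need to be sorted. Here in1 is the size file and in2 is the date file:

awk 'BEGIN {FS = OFS = "~"}
     # First file (in1): remember each size, keyed by file name
     FNR == NR {s[$1] = $2; next}
     # Second file (in2): print name, date, and the stored size
     {print $1, $2, s[$1]}' in1 in2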

but I make no guarantee that awk won't run out of memory for files this large, since every line of the first file is held in memory.
If a line in the second file has no match in the first file, the output record will have an empty third field. It would also be possible to add a couple of statements to print lines from the first input file that have no matching line in the second, but I didn't bother since you implied that every line has a match in the other file.
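Those extra statements might look like the sketch below. It assumes unmatched lines from the first file should be printed with an empty date field (FileName~~Size); that output format is chosen here for illustration only. The seen array adds memory for every distinct name in the second file, and the sample files in1 and in2 are made up to exercise both unmatched cases.

```shell
#!/bin/sh
set -e
# Made-up samples: in1 = sizes (FileName~Size), in2 = dates (FileName~Date).
# "c" exists only in in2, "d" exists only in in1.
printf '%s\n' 'a~12345' 'b~54321' 'd~99999' > in1
printf '%s\n' 'b~10/02/2012' 'a~10/01/2012' 'c~10/03/2012' > in2

awk 'BEGIN {FS = OFS = "~"}
     # First file: remember each size, keyed by file name.
     FNR == NR {s[$1] = $2; next}
     # Second file: print name, date, size (empty if unmatched) and mark
     # the name as seen. Reading s[$1] for an unmatched name creates an
     # empty entry, but that name is in seen, so END skips it.
     {print $1, $2, s[$1]; seen[$1] = 1}
     # Names from the first file never seen in the second file.
     END {for (name in s) if (!(name in seen)) print name, "", s[name]}
' in1 in2 > out
```

If several first-file lines are unmatched, the END loop prints them in an unspecified order, since awk does not define the iteration order of for (name in s).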

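Another option is to sort both files together in one pass and let awk pair up adjacent lines that share a file name: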

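LC_ALL=C sort -u dates_file sizes_file | awk -F"~" '
# Sorting both files together in byte order (LC_ALL=C) puts the date
# line and the size line for each file name next to each other.
{
  if ($1 == ln) {
    # Same name as the previous line: this line is its partner.
    fe = 0
    if ($2 ~ "/") {          # a "/" marks a date field
      cd = $2; cs = $3
    } else {
      cd = $3; cs = $2
    }
    if (cd !~ /./) cd = ld   # take whichever value is missing
    if (cs !~ /./) cs = ls   # from the previous line
    if (ln ~ /./) {
      print ln "~" cd "~" cs
    }
  } else {
    # New name: if the previous line never found a partner, print it as is.
    if (ln ~ /./ && fe == 1) {
      print ll
    }
    fe = 1
  }
}
{
  # Remember this line for comparison with the next one.
  ll = $0; ln = $1
  if ($2 ~ "/") {
    ld = $2; ls = $3
  } else {
    ld = $3; ls = $2
  }
}
END {
  # Print the last line only if it had no partner.
  if (fe == 1) print ll
}
' > new_file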