joining files based on key column

Hi

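I need to join two files on the 1st column, keeping only the rows of a2.txt whose 4th column is "at". From each match, take the 2nd column of a1.txt and the 3rd column of a2.txt, build a file name from them (for example, file1.txt and 20090809 give file1_20090809.txt), and check it against the source files in my directory. If such a file exists, list its name.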


a1.txt

a1|20090809|20090810
a2|20090907|20090908

a2.txt

a1|d|file1.txt|at
a1|d|file2.txt|at
a1|d|file3.txt|st
a2|d|file4.txt|st
a2|d|file5.txt|st

These source files are in my directory:
file1_20090809.txt
file2_20090809.txt
file3_20090809.txt


Expected output:

file1_20090809.txt
file2_20090809.txt

Thanks in advance
Akil

For starters (this builds the file names, but does not yet check that the files exist):
nawk -f akil.awk a1.txt a2.txt

akil.awk:

BEGIN {
  FS="|"
}
FNR==NR { f1[$1]=$2; next }
$1 in f1 && $NF == "at" { dot=index($3,"."); print substr($3,1, dot-1) "_" f1[$1] substr($3, dot) }

Use gawk, nawk or /usr/xpg4/bin/awk on Solaris:

awk -F\| 'NR == FNR { f1[$1] = $2; next }
$1 in f1 { 
  n = split($NF, t, "."); ext = t[n]
  sub("." t[n], "", $NF)
  fn = $NF sep f1[$1] "." ext
  if (!system("[ -e " fn " ]")) print fn
}' sep="_" a1.txt a2.txt    

Hi ,
Thanks for your prompt reply

Thanks & Regards,
Akil

With my nawk on Solaris, 'system' seems to return the status of spawning the command rather than the exit status of the command itself (although 'man nawk' says otherwise).
I usually check for file existence with 'getline' instead:

if ((getline dummy < fn ) >0) { print fn; close(fn) }
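
Dropped into the full join, that 'getline' test could look roughly like the sketch below. Treat it as a sketch only: the function name list_matching is made up for illustration, and it assumes a POSIX-style awk (nawk, gawk or /usr/xpg4/bin/awk) plus the a1.txt / a2.txt layout from the first post. It compares with '>= 0' rather than '> 0', because 'getline' returns 0 for an empty file, and a zero-byte source file would otherwise be reported as missing.

```shell
# list_matching is a hypothetical helper name, not something from the thread.
# Usage: list_matching a1.txt a2.txt
list_matching() {
  awk -F'|' '
    # First file: remember the 2nd column (the date) for each key.
    NR == FNR { date[$1] = $2; next }
    # Second file: only rows with a known key and "at" in the 4th column.
    ($1 in date) && $4 == "at" {
      dot = index($3, ".")            # position of the first dot in the name
      if (dot == 0) next              # no extension: skip the row
      fn = substr($3, 1, dot - 1) "_" date[$1] substr($3, dot)
      # getline returns 1 (line read), 0 (empty file) or -1 (cannot open).
      if ((getline dummy < fn) >= 0) print fn
      close(fn)
    }
  ' "$1" "$2"
}
```

No external command is spawned per file, but a file that exists and cannot be read is treated as missing.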
$
$ perl -ne 'BEGIN{open(F1,"a1.txt"); while(<F1>){@a=split/\|/; $x{$a[0]}=$a[1]} close F1}
>           chomp; @b=split/\|/;
>           if ($b[3] eq "at") {$b[2] =~ s/(.*)\.(.*)/$1_$x{$b[0]}.$2/; push @f,$b[2]}
>           END {foreach $i (@f) {system "ls $i 2>/dev/null"}}' a2.txt
file1_20090809.txt
file2_20090809.txt
$
$

tyler_durden

With Perl:

perl -le'
    open F1, $ARGV[0] or die "$ARGV[0]: $!";
    %f1 = map { ( split /\|/ )[ 0, 1 ] } <F1>;
    open F2, $ARGV[1] or die "$ARGV[1]: $!";
    while (<F2>) {
        @f2 = split /\|/;
        if ( exists $f1{ $f2[0] } ) {
            $f2[-1] =~ /(.+)(\.[^.\n]+)$/;
            $fn = $1 . "_" . $f1{ $f2[0] } . $2;
            -e $fn and print $fn;
        }
    }' a1.txt a2.txt

---------- Post updated at 05:19 PM ---------- Previous update was at 05:16 PM ----------

I missed the "at" part in both solutions :)
Adding it is left as an exercise.

---------- Post updated at 05:24 PM ---------- Previous update was at 05:19 PM ----------

Yep,
it does not work as expected on Solaris (I'll check later).

---------- Post updated at 05:28 PM ---------- Previous update was at 05:24 PM ----------

I think I should post a correct answer later (both of my solutions are wrong). Got to go now ...

---------- Post updated at 07:35 PM ---------- Previous update was at 05:28 PM ----------

Actually, if I'm not missing something, this seems to work on my Solaris machine:

[ some old shells like bsh do not support the -e test option, so I changed it to -f ]

% head a*
==> a1.txt <==
a1|20090809|20090810
a2|20090907|20090908

==> a2.txt <==
a1|d|file1.txt|at
a1|d|file6.txt|at
a1|d|file2.txt|at
a1|d|file3.txt|st
a2|d|file4.txt|st
a2|d|file5.txt|st

% uname -rs
SunOS 5.8

% ls -l
total 4
-rw-r--r--   1 drado    sysdba        42 Jul 22 17:08 a1.txt
-rw-r--r--   1 drado    sysdba       108 Jul 22 19:10 a2.txt
-rw-r--r--   1 drado    sysdba         0 Jul 22 17:10 file1_20090809.txt
-rw-r--r--   1 drado    sysdba         0 Jul 22 17:10 file2_20090809.txt
-rw-r--r--   1 drado    sysdba         0 Jul 22 17:10 file3_20090809.txt

% /usr/xpg4/bin/awk -F\| 'NR == FNR { f1[$1] = $2; next }
$1 in f1 && /at$/ {
  n = split($(NF - 1), t, "."); ext = t[n]
  sub("." t[n], "", $(NF - 1))
  fn = $(NF - 1) sep f1[$1] "." ext
  if (!system("[ -f " fn " ]")) print fn
}' sep="_" a1.txt a2.txt
file1_20090809.txt
file2_20090809.txt

% nawk -F\| 'NR == FNR { f1[$1] = $2; next }
$1 in f1 && /at$/ {
  n = split($(NF - 1), t, "."); ext = t[n]
  sub("." t[n], "", $(NF - 1))
  fn = $(NF - 1) sep f1[$1] "." ext
  if (!system("[ -f " fn " ]")) print fn
}' sep="_" a1.txt a2.txt
file1_20090809.txt
file2_20090809.txt

% nawk 'BEGIN {
  while (++i < ARGC)
    print ARGV[i], system("[ -f " ARGV[i] " ]")
        }' file* inexistent
file1_20090809.txt 0
file2_20090809.txt 0
file3_20090809.txt 0
inexistent 1

Modified Perl version:

perl -le'
    open F1, $ARGV[0] or die "$ARGV[0]: $!";
    %f1 = map { ( split /\|/ )[ 0, 1 ] } <F1>;
    open F2, $ARGV[1] or die "$ARGV[1]: $!";
    while (<F2>) {
        @f2 = split /\|/;
        if ( exists $f1{ $f2[0] } && /at$/ ) {
            $f2[-2] =~ /(.+)(\.[^.]+)$/;
            $fn = $1 . "_" . $f1{ $f2[0] } . $2;
            -e $fn and print $fn;
        }
    }' a1.txt a2.txt

I agree with vgersh99 that it's better to do the test inside awk without spawning an external command,
but I'd like to point out that the getline approach assumes the files are readable.

You're right: it works as expected with Solaris' 'nawk'. My earlier test was flawed.
The thing is that nawk's "system" invokes sh (the Bourne shell), and 'test -e' doesn't exist in the Bourne shell - see 'man test':

          -e file
                True if file exists. (Not available in sh.)

so one has to use '-f' - as you did:

$ ls
zin.txt
$ nawk -v file=zin.txt 'BEGIN {print file, system("[ -f " file " ]")}' < /dev/null
zin.txt 0
$ nawk -v file=zin5.txt 'BEGIN {print file, system("[ -f " file " ]")}' < /dev/null
zin5.txt 1

Yes, I changed it after I realized that (I got a syntax error).

Another version with Perl:

perl -F'\|' -lane'
    $f1{ $F[0] } = $F[1] and next if @F == 3;
    if ( /at$/ && exists $f1{ $F[0] } ) {
        $F[-2] =~ /(.+)(\.[^.]+)$/;
        $fn = $1 . "_" . $f1{ $F[0] } . $2;
        -e $fn and print $fn;
    }' a1.txt a2.txt

Hi ,
Please help with the scenario below. I have to take the 1st column from a.txt and compare it against b.txt; if it matches, display the output shown below.

a.txt
20080710|20080711

b.txt

20070708
20070709
20090710
20090711
20090712
20090713

Expected output:

20090710
20090711
20090712
20090713

Thanks & Regards,
Akil