awk: matching and not matching

Hello all,

I have a simple match / no-match problem that I can't figure out.

file1
hostname:
30 10 * * * /home/toto/start  PROD instance_name1 -p
00 9 * * * /home/toto/start  PROD instance_name2 -p
15 8 * * * /home/toto/start  PROD instance_name3 -p

hostname2:
00 8 * * * /home/toto/start  PROD instance_name4 -p
10 8 * * * /home/toto/start  PROD instance_name5 -p

hostname3:
45 3 * * * /home/toto/start PROD instance_name7 -p

hostname4:
33 0 * * * /home/toto/start PROD instance_name6

file2
backuphostname:
30 10 * * * /home/toto/start  PROD instance_name1 -b
00 9 * * * /home/toto/start  PROD instance_name2 -b
15 8 * * * /home/toto/start  PROD instance_name3 -b

backuphostname2:
30 15 * * * /home/toto/start  PROD instance_name4 -b
00 10 * * * /home/toto/start  PROD instance_name5 -b

The goal is to match on the instance name so I can tell which host is the backup of which host for each instance. That part seems to work OK with my code.

What I can't figure out, though, is the exception: instance_name7 doesn't have a backup in file2, and I want that reported as well. The output I'm after looks like this:

bash$ awk -f list.awk file1 file2
PROD: hostname2 instance_name4 -p BACKUP: backuphostname2 instance_name4 -b
PROD: hostname3 instance_name7 NO BACKUP

My code now:

BEGIN {
    FS="\n"
    RS=""
}
NR == FNR {
    for (i = 2; i <= NF; i++)
        split($i,prodsle," ")
            prod[$1]=prodsle[8]
            next
}
{
    for (i = 2; i <= NF; i++)
        split($i,backupsle," ")
            backup[$1]=backupsle[8]
            for ( x in prod )
                if ( backupsle[8] == prod[x] ) 
                    printf "%s %s %2s %s %s %s \n",x,prod[x],prodsle[9],$1,backupsle[8],backupsle[9]
}

END {
    print "Done"
}

I don't understand the requirement ...
I suppose it will be easier if you post an example of the desired output and
explain how it differs from the one you're getting.

Right now a line is printed only when there is a match. Unmatched items are not output, and because of the way the loop is built, if I just add an else I end up printing all unmatched hosts (not the desired behavior). That's the part I'm stumbling on.

bash$ awk -f list.awk file1 file2
PROD: hostname2 instance_name4 -p BACKUP: backuphostname2 instance_name4 -b
PROD: hostname3 instance_name7 NO BACKUP

This is what I want. The first line is a match with its backup and the second line is an unmatched instance_name.

Right now I get only the first kind of line (for all matching instances), not the unmatched ones.

I think you are making things much more complicated than they need to be. Here's an example that prints instance name, production host, and backup host and also indicates if there is no backup host.

awk '
    NF < 2 { host = $1; next; } # snag host name from either file

    NR == FNR {             # capture host that each prod runs on
        prod[$8] = host;
        next;
    }

    {                       # capture host that each backup runs on
        back[$8] = host;
        next;
    }

    END {
        for( x in prod )
            printf( "%s production on %s backed up on %s\n", x, prod[x], back[x] == "" ? "NO BACKUP HOST" : back[x] );
    }
' file1 file2

Running it on your sample data yields this:

instance_name1 production on hostname: backed up on backuphostname:
instance_name2 production on hostname: backed up on backuphostname:
instance_name3 production on hostname: backed up on backuphostname:
instance_name4 production on hostname2: backed up on backuphostname2:
instance_name5 production on hostname2: backed up on backuphostname2:
instance_name6 production on hostname4: backed up on NO BACKUP HOST
instance_name7 production on hostname3: backed up on NO BACKUP HOST

May not be exactly what you want, but should give you an idea of how you can organise your code to give you both.
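
The host names in that output keep the trailing colon from the header lines. If that is unwanted, one possible tweak (a sketch only, not part of the code above) is to strip it with sub() when the header line is captured. The wrapper name list_backups is made up here just so the snippet is easy to call; the test for a missing backup uses "x in back" instead of comparing against "", which avoids creating empty array entries.

```shell
# list_backups PRODFILE BACKUPFILE   (hypothetical wrapper name, for illustration only)
list_backups() {
    awk '
        # header or blank line: remember the host, minus any trailing colon
        NF < 2 { host = $1; sub(/:$/, "", host); next }

        # first file: instance name -> production host
        NR == FNR { prod[$8] = host; next }

        # second file: instance name -> backup host
        { back[$8] = host }

        END {
            for (x in prod)
                printf("%s production on %s backed up on %s\n",
                       x, prod[x], (x in back) ? back[x] : "NO BACKUP HOST")
        }
    ' "$1" "$2"
}
```

Called as "list_backups file1 file2". The order of the lines depends on the awk implementation, so pipe through sort if a stable order is needed.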


This will list organised by hostname in file 1:

awk '
    NF < 2 { host = $1; next; } # snag host name from either file
    NR == FNR { inst[host] = inst[host] $8 " "; next; }
    { back[$8] = host; next; }

    END {
        for( h in inst )
        {
            printf( "host: %s\n", h );
            n = split( inst[h], a, " " );
            for( i = 1; i <= n; i++ )
                printf( "\t%s %s\n", a[i], back[a[i]] == "" ? "NOT BACKED UP" : "backed up on " back[a[i]] );
            printf( "\n" );
        }
    }
' file1 file2

Output looks like this:

host: hostname:
       instance_name1 backed up on backuphostname:
       instance_name2 backed up on backuphostname:
       instance_name3 backed up on backuphostname:

host: hostname2:
       instance_name4 backed up on backuphostname2:
       instance_name5 backed up on backuphostname2:

host: hostname3:
       instance_name7 NOT BACKED UP

host: hostname4:
       instance_name6 NOT BACKED UP
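
One caveat: awk does not promise any particular order for "for( h in inst )", so the hosts may come out in a different order on another awk. If the order of file1 matters, a possible variant (a sketch, with the made-up wrapper name list_by_host and an extra order[] array that is not in the code above) records each host the first time it is seen:

```shell
# list_by_host PRODFILE BACKUPFILE   (hypothetical wrapper name, for illustration only)
list_by_host() {
    awk '
        NF < 2 { host = $1; next }                    # header or blank line

        NR == FNR {                                   # first file: production entries
            if (!(host in inst)) order[++nh] = host   # remember first-seen order
            inst[host] = inst[host] $8 " "
            next
        }

        { back[$8] = host }                           # second file: backup entries

        END {
            for (k = 1; k <= nh; k++) {               # walk hosts in file order
                h = order[k]
                printf("host: %s\n", h)
                n = split(inst[h], a, " ")
                for (i = 1; i <= n; i++)
                    printf("\t%s %s\n", a[i],
                           (a[i] in back) ? "backed up on " back[a[i]] : "NOT BACKED UP")
                printf("\n")
            }
        }
    ' "$1" "$2"
}
```

Called as "list_by_host file1 file2", this should list the hosts in the order they appear in file1.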

Your code doesn't seem to do what you claimed. I guess you wanted something like the following. It doesn't print exactly the format you wanted, but should be easy to adapt.

$ cat list.awk
BEGIN {
  RS = ""
  FS = "\n"
}
{
  for (i = 2; i <= NF; i++) {
    split($i, a, " ")
    if (NR == FNR)
      h[a[8]] = $1
    else 
      print a[7], $1, a[8], (a[8] in h)? h[a[8]] : "NO"
  }
}
END { print "Done" }

$ awk -f list.awk file2 file1 

Didn't realize that agama has answered, but nevertheless ...


Thank you both for the code. I took agama's code and tried to add some of my own to grab the options (the -p or -b), but I'm surely missing something.

NF < 2 { host = $1; next; } 

NR == FNR {             
        prd_option[ prod[$8] = host ] = $9;
        next;
}

{                       
        bck_option[ back[$8] = host ] = $9;
        next;
}

    END {
        for( x in prod )
            printf( "%s, %s, %s, %s, %s\n", x, prod[x], prd_option[x], back[x] == "" ? "NO BACKUP HOST" : back[x], bck_option[x] );
}

In the first block: since x in prod is, for example, instance_name1, my thought was that prd_option[instance_name1] would equal -p.
In the last printf: since I have two arrays (one for prod, one for backup), I expected prd_option[instance_name1] and bck_option[instance_name1] to give me the -p or -b, or nothing if the option is empty.

I'm surely missing something, or I got this all wrong. Thanks.
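
If I read the attempt correctly, the catch is that in awk the value of an assignment expression is the value that was assigned. So "prd_option[ prod[$8] = host ] = $9" indexes prd_option by the host name, not by the instance name, and prd_option[x] in the END block (where x is an instance name) comes up empty. A possible fix, sketched below, is to key the option arrays by $8 as well. The wrapper name list_with_options is made up for illustration, and the output format is a guess at the "PROD: ... BACKUP: ..." layout asked for at the top of the thread, with the trailing colon stripped from the host names.

```shell
# list_with_options PRODFILE BACKUPFILE   (hypothetical wrapper name, for illustration only)
list_with_options() {
    awk '
        # header or blank line: remember the host, minus any trailing colon
        NF < 2 { host = $1; sub(/:$/, "", host); next }

        # first file: key both arrays by instance name ($8)
        NR == FNR { prod[$8] = host; prd_option[$8] = $9; next }

        # second file: same idea for the backup side
        { back[$8] = host; bck_option[$8] = $9 }

        END {
            for (x in prod) {
                if (x in back)
                    printf("PROD: %s %s %s BACKUP: %s %s %s\n",
                           prod[x], x, prd_option[x], back[x], x, bck_option[x])
                else
                    printf("PROD: %s %s NO BACKUP\n", prod[x], x)
            }
        }
    ' "$1" "$2"
}
```

Called as "list_with_options file1 file2". Lines with no option at all (like instance_name6) simply get an empty string in that slot, and the line order is whatever the awk in use gives for "for (x in prod)".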