awk help

mavictoro · July 14, 2011, 4:17pm

Hi, I've been trying to do this without awk, but it's getting too complicated. I don't know how to use arrays in awk, though that may be the best way to do it.

I have files like the one below, and I want to display only the lines whose third-column value is within 2 of the value on a neighboring line. In this case I would only want to display lines 3 and 4. Could someone suggest something?

2011-06-25 12:27:59 40 nodea down
2011-06-25 12:28:02 45 nodea up
2011-06-25 12:29:23 70 nodea down
2011-06-25 14:31:14 71 nodea up
2011-06-25 14:31:15 80 nodea down

bartus11 · July 14, 2011, 4:39pm

Try:

awk 'NR==1{x=$3;y=$0;next}x==$3-1{y=y"\n"$0;x=$3;p=1;next}p{print y;p=0;}{x=$3;y=$0}END{if (p){print y}}' file

Shell_Life · July 14, 2011, 5:06pm

The code below works even when three or more consecutive lines differ by less than 3. See the example:

2011-06-25 12:27:59 40 nodea down
2011-06-25 12:28:02 45 nodea up
2011-06-25 12:29:23 70 nodea down
2011-06-25 14:31:14 71 nodea up
2011-06-25 14:31:14 73 nodea up
2011-06-25 14:31:14 75 nodea up
2011-06-25 14:31:15 80 nodea down

#!/usr/bin/ksh
typeset -i mDiff mVal3 mPVal3
mPPrint="First_Time"
while read mVal1 mVal2 mVal3 mVal4 mVal5; do
  if [[ "${mPPrint}" = "First_Time" ]]; then
    mPPrint="N"
  else
    mDiff=${mVal3}-${mPVal3}
    if [[ ${mDiff} -le 2 ]]; then
      echo ${mPVal1}' '${mPVal2}' '${mPVal3}' '${mPVal4}' '${mPVal5}
      mPPrint="Y"
    else
      if [[ "${mPPrint}" = "Y" ]]; then
        echo ${mPVal1}' '${mPVal2}' '${mPVal3}' '${mPVal4}' '${mPVal5}
        mPPrint="N"
      fi
    fi
  fi
  mPVal1=${mVal1}
  mPVal2=${mVal2}
  mPVal3=${mVal3}
  mPVal4=${mVal4}
  mPVal5=${mVal5}
done < Input_File
if [[ "${mPPrint}" = "Y" ]]; then
  echo ${mPVal1}' '${mPVal2}' '${mPVal3}' '${mPVal4}' '${mPVal5}
  mPPrint="N"
fi

bartus11 · July 14, 2011, 5:12pm

I think the same behavior can be accomplished by using this:

awk 'NR==1{x=$3;y=$0;next}x==$3-1||x==$3-2{y=y"\n"$0;x=$3;p=1;next}p{print y;p=0;}{x=$3;y=$0}END{if (p){print y}}' file
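
A quick way to check this is to feed the one-liner the seven-line sample from the thread. The sketch below is only a test harness: the function names sample_data and group_within_two are made up for illustration, and the awk program reads standard input instead of a file.

```shell
#!/bin/sh
# Seven-line sample copied from the thread.
sample_data() {
  cat <<'EOF'
2011-06-25 12:27:59 40 nodea down
2011-06-25 12:28:02 45 nodea up
2011-06-25 12:29:23 70 nodea down
2011-06-25 14:31:14 71 nodea up
2011-06-25 14:31:14 73 nodea up
2011-06-25 14:31:14 75 nodea up
2011-06-25 14:31:15 80 nodea down
EOF
}

# The one-liner from the post, unchanged, reading stdin (no file argument).
group_within_two() {
  awk 'NR==1{x=$3;y=$0;next}x==$3-1||x==$3-2{y=y"\n"$0;x=$3;p=1;next}p{print y;p=0;}{x=$3;y=$0}END{if (p){print y}}'
}

sample_data | group_within_two
```

On this input it prints the four lines whose third column is 70, 71, 73 and 75, and skips the lines with 40, 45 and 80.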

danmero · July 14, 2011, 6:25pm

awk 'END{if(p)f(y)}function f(y){print y}($3-x)<3{y=y RS$0;x=$3;p=1;next}p{f(y);p=0}{x=$3;y=$0}' file

mavictoro · July 14, 2011, 10:43pm

awk 'NR==1{x=$3;y=$0;next}x==$3-1||x==$3-2{y=y"\n"$0;x=$3;p=1;next}p{print y;p=0;}{x=$3;y=$0}END{if (p){print y}}' file

This is what I need, thank you bartus11 and everyone who posted replies.
But can you break it down for me a bit? I see the $3-1 OR $3-2 part.

neutronscott · July 14, 2011, 11:53pm

An explanation? I'll dissect it for you.

NR == 1 {
	x=$3;
	y=$0;
	next
}

This initializes x (column 3) and y (the entire line) from the first line. Let's pretend the first x is 40.

x==$3-1||x==$3-2 {
	y=y"\n"$0;
	x=$3;
	p=1;
	next
}

If our next line is 41, it satisfies this condition: 40 == 41-1. So this only matches when the next record is +1 or +2 from the previous one. Hmm, did you need it to match -1 and -2 as well?
It appends the entire line to y and sets p, which you can think of as a print flag. The "next" ends processing of this record without printing anything yet (because a third line or more might follow within the range).

p {
	print y;
	p=0;
}

Let's say line 3 in our example (after 40 and 41) has a value of 70. The previous block with x==$3-1 is not satisfied, but this one is, since we already set the p flag. We print what's in our buffer y (the lines containing 40 and 41), clear the flag, and start anew.

{
	x=$3;
	y=$0
}

Still on the third line of my example: it falls through to here, so x becomes 70, the value the next line is compared against (in case line 4 is 71 or 72).

END {
	if (p) {
		print y
	}
}

If we reach the end of the file and still have stuff to print, do it now. Otherwise it would be lost, since matches are only printed when the first non-matching line comes along.
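
On the question of matching -1 and -2 as well: a minimal sketch, assuming you want the comparison to work in both directions. It swaps the condition for an absolute difference. The abs helper, the function name group_abs_within_two and the three input lines are made up for illustration and are not from the thread. Note that this version also treats equal values (difference 0) as a match, which the original condition does not.

```shell
#!/bin/sh
# Hypothetical variant: group consecutive lines whose third column differs
# by at most 2 in either direction. Reads standard input.
group_abs_within_two() {
  awk '
    # Helper added for this sketch; awk has no built-in abs().
    function abs(v) { return v < 0 ? -v : v }

    # First line: remember its value and text, nothing to compare yet.
    NR == 1 { x = $3; y = $0; next }

    # Within 2 of the previous line (up or down): buffer it, set the flag.
    abs($3 - x) <= 2 { y = y "\n" $0; x = $3; p = 1; next }

    # Out of range and a group is pending: flush the buffer.
    p { print y; p = 0 }

    # Start over from the current line.
    { x = $3; y = $0 }

    # Flush a group that runs to the end of the input.
    END { if (p) print y }
  '
}

# Made-up input with a decreasing value (40 then 39).
printf '%s\n' \
  '2011-06-25 12:27:59 40 nodea down' \
  '2011-06-25 12:28:02 39 nodea up' \
  '2011-06-25 12:29:23 50 nodea down' | group_abs_within_two
```

With the original condition the 40/39 pair would not be reported, because 39 is neither 41 nor 42. Here both lines are printed and the line with 50 is skipped.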

Hope this helps.

mavictoro · July 15, 2011, 12:04pm

Thanks guys, you guys are great!!!

spl · July 18, 2011, 2:53pm

#!/usr/bin/perl
print "Enter file1 (with path): "; my $file1 = <stdin>;
print "Enter file2 (with path): "; my $file2 = <stdin>;
print "Enter record delimiter : "; my $delim = <stdin>;
print "Enter output file      : "; my $ofile = <stdin>;
chomp($file1); chomp($file2); chomp($delim); chomp($ofile);
die "2 different files are required to compare.\n"
if ($file1 eq '' or $file2 eq '' or $file1 eq $file2);
die "Record delimiter must be specified.\n" if ($delim eq '');
die "Output file must be specified.\n" if ($ofile eq '');
print "\nFile1: " . $file1 . "\nFile2: " . $file2 . "\n";
print "Output: " . $ofile . "\n";
open (f1, $file1) || die ("Failed to open $file1 - $!\n");
open (f2, $file2) || die ("Failed to open $file2 - $!\n");
my @f1c = <f1>; my @f2c = <f2>;
my $f1l = scalar(@f1c); my $f2l = scalar(@f2c);
my $fml = ($f1l >= $f2l) ? $f1l : $f2l;
print ("Total records to compare: " . $fml . "\n");
my %diff_records;
for (my $i = 0; $i < $fml; $i++) {
    my $f1r = $f1c[$i]; my $f2r = $f2c[$i];
    chomp($f1r); chomp($f2r);
    my @f1e = split($delim,$f1r); my @f2e = split($delim,$f2r);
    my $f1el = scalar(@f1e); my $f2el = scalar(@f2e);
    my $fel = ($f1el >= $f2el) ? $f1el : $f2el;
    my @diff_cols;
    for (my $j = 0; $j < $fel; $j++) {
        if ($f1e[$j] ne $f2e[$j]) {
            push (@diff_cols, $j+1);
            $diff_records{$i+1} = \@diff_cols;
        }
    }
}
open (OF, ">>$ofile") || die ("Failed to open $ofile for writing - $!\n");
# Sort record numbers numerically; the default sort is string-wise (10 before 2).
my @diff_rec = sort { $a <=> $b } keys(%diff_records);
if (scalar(@diff_rec) >=1) {
    print OF "Mismatch found in records:\n\n";
    foreach my $diff_record (@diff_rec) {
        print OF "Record# $diff_record \n";
        chomp($f1c[$diff_record-1]); chomp($f2c[$diff_record-1]);
        my @diff_col = @{$diff_records{$diff_record}};
        foreach my $col (@diff_col) {
            my @f1el = split($delim,$f1c[$diff_record-1]);
            my @f2el = split($delim,$f2c[$diff_record-1]);
            print OF "  Column# $col \n";
            print OF "  File1: " . $f1el[$col-1] . "\n";
            print OF "  File2: " . $f2el[$col-1] . "\n\n";
        }
    }
} else { print OF "\nAll records match.\n"; }
print "Output recorded in file $ofile \n";
close OF;