Comparing multiple lines in same file

Hello,
I would like to write a /bin/ksh script to manipulate a file and compare its contents, comparing lines 1 & 2, 3 & 4, 5 & 6, and so forth until the end of the file. This is what I would like the script to do (using lines 1 & 2 as an example):

  1. Verify whether the last column in line 1 is numeric. If the condition is true, move it (15.235.10.21) to the beginning of the line, so it now reads 15.235.10.21 Alabama.
  2. If line 1, column 1 is numeric and line 2, column 1 is numeric, compare both numbers. If they match, move on to the next pair (lines 3 & 4).
  3. If the numbers do not match, display all non-matching lines at the end.

Original file contents:

Alabama 15.235.10.21
15.235.10.21
Petersburg 15.25.18.21
15.25.18.21
Salem 15.235.18.20
15.235.18.20
Tampa 15.235.18.20
15.235.18.20
Washington 15.235.18.21
15.235.18.21
Nova 15.235.18.21
15.234.18.21
Nashville 15.235.18.21
15.235.18.21
Texas 15.235.18.21
15.235.18.21
Burbank 15.235.18.25
15.235.18.25
Carolina 15.235.18.22
15.235.18.22
Seattle 15.235.18.23
15.235.18.23
Wyoming 15.235.18.24
15.235.18.24
Vermont 18.66.20.17
18.66.2.17
New York 13.5.48.2
columbia
Florida 13.7.24.25
13.7.24.25
Chicago 13.17.12.5
uchicago
Nebraska
Tennessee 16.13.3.2
plank
Frisco 15.35.18.1
Japan
Canada
France 18.55.7.25
18.55.7.25

Example script output if the numbers don't match:

The following do not match:
15.235.18.21 Nova
15.234.18.21

18.66.20.17 Vermont
18.66.2.17

Please advise, thank you.
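For reference, the pairing-and-compare logic described above can be sketched directly in /bin/ksh, as the poster requested. This is a sketch only: it assumes the file strictly alternates a "Name IP" line with a bare "IP" line, which the sample above does not always do (e.g. Nebraska, Japan, Canada have no partner line).

```shell
#!/bin/ksh
# Sketch: read two lines per iteration, move the last field (the IP) of
# line 1 to the front, and compare it with line 2. Assumes strict
# "Name IP" / "IP" alternation; lone entries such as "Nebraska" will
# shift the pairing for everything after them.
while read -r line1 && read -r line2; do
    ip=${line1##* }            # last blank-separated field of line 1
    name=${line1% *}           # everything before it (may contain spaces)
    case "$ip$line2" in
        *[!0-9.]*) continue ;; # skip pairs that are not purely numeric
    esac
    if [ "$ip" != "$line2" ]; then
        printf '%s %s\n%s\n\n' "$ip" "$name" "$line2"
    fi
done < file
```

On the sample file this happens to print exactly the Nova and Vermont pairs from the expected output, but only by luck: after the lone `Nebraska` line the pairing is shifted by one, so the Tennessee pair is silently swallowed. The awk answers in the thread wrestle with the same issue.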

Any attempt/idea/thought from your side?

However, assuming awk is acceptable, try:

awk 'NR%2 {getline T; if (T != $2 && $2 T !~ /[^0-9.]/) printf "%s %s\n%s\n", $2, $1, T}' file
15.235.18.21 Nova
15.234.18.21
18.66.20.17 Vermont
18.66.2.17
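The same one-liner, reformatted with comments for readability:

```shell
awk '
NR % 2 {                       # odd-numbered lines: the "Name IP" line
    getline T                  # consume the following line into T
    # report only when T and field 2 differ AND both are purely digits/dots
    if (T != $2 && $2 T !~ /[^0-9.]/)
        printf "%s %s\n%s\n", $2, $1, T
}' file
```

Note that `$1` and `$2` assume a one-word name, a limitation that comes up later in the thread.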

try also:

awk '
NF > 1 && $NF ~ /^[0-9.][0-9.]*$/ {lv=$NF; $NF=""; l=lv " " $0; next}
$1 ~ /^[0-9.][0-9.]*$/ { if ($1 != lv) {print l; print $1} }
l=lv="";
' infile


The NR%2 one-liner above will fail for:

No va 15.235.18.21
15.234.18.21

New York would also have failed if a non-matching IP address had been included in the given input file.


Good point, thanks! A small adaptation:

awk 'NR%2 {getline T; X = $NF; sub (X "$", ""); if (T != X && X T !~ /[^0-9.]/) printf "%s %s\n%s\n", X, $0, T}' file
15.235.18.21 No va 
15.234.18.21
18.66.20.17 Vermont 
18.66.2.17

That worked! Thank you!

It will still fail for this input:

Florida 13.7.24.25
13.7.24.25
Chicago 13.17.12.5
uchicago
Nebraska
Tennessee 16.13.3.2
99.13.3.2
Frisco 15.35.18.1
Japan
Canada
France 18.55.7.25
18.55.7.25

Try:

awk '$NF~/\.[0-9]+\./{if(NF>1) {p=$2; s=$0; next} if(p && $1!=p) {print s RS $0; p=x}}' file
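The same program, expanded with comments. Note that `p = $2` assumes one-word place names, which holds for this input but not for names like `No va` raised earlier in the thread:

```shell
awk '
$NF ~ /\.[0-9]+\./ {       # last field contains ".digits." -> looks like an IP
    if (NF > 1) {          # "Name IP" line: remember the IP and the whole line
        p = $2
        s = $0
        next
    }
    if (p && $1 != p) {    # bare IP line: compare with the remembered IP
        print s RS $0
        p = x              # clear p (x is never set, so this assigns "")
    }
}' file
```

Lines whose last field does not look like an IP (uchicago, Nebraska, Japan, ...) simply fall through without matching the pattern, which is why this version copes with the second sample.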

Thank you again for all the help!

Is there a way to check whether there are other duplicate entries in the entire file? For example, the numbers on lines 1 and 2 should match, but if the same number appears anywhere else in the file, print an error message. Any help is greatly appreciated, thank you.

Try

awk 'NR%2 {getline T;  X = $NF; sub (X "$", ""); if (X T !~ /[^0-9.]/) {if (IP[T]++) print "Dup:", T;  if (T != X) printf "%s %s\n%s\n", X, $0, T}}' file
Dup: 15.235.18.20
15.235.18.21 Nova 
15.234.18.21
Dup: 15.235.18.21
Dup: 15.235.18.21
18.66.20.17 Vermont 
18.66.2.17
Dup: 13.7.24.25
Dup: 18.55.7.25

When I run the command on a small list I get the list of duplicates, but when I run it on a larger list (same format) it doesn't return anything.
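A plausible cause (an assumption, since the larger file isn't shown): `NR%2` plus `getline` presumes strict name/IP pairs. A lone entry such as `Nebraska` makes `getline` swallow the next name line, and from that point every subsequent "pair" is shifted by one, so both the mismatch check and the duplicate check silently skip entries. A sketch that classifies each line by its shape and carries explicit state, instead of relying on line parity:

```shell
awk '
# "Name IP" line: more than one field, last field purely digits/dots
NF > 1 && $NF ~ /^[0-9.]+$/ {
    ip = $NF
    sub(/[ \t]*[0-9.]+$/, "")      # strip the IP; what remains is the name
    name = $0
    expect = 1                     # next line should be the matching reply
    next
}
# reply line: a single purely numeric field, and a pair is in progress
expect && NF == 1 && $1 ~ /^[0-9.]+$/ {
    if ($1 != ip)
        printf "%s %s\n%s\n", ip, name, $1
    if (seen[$1]++)                # duplicate check across the whole file
        print "Dup:", $1
    expect = 0
    next
}
{ expect = 0 }                     # anything else resets the pairing state
' file
```

Because each line is matched on its own shape rather than its line number, lone entries such as `Nebraska`, `Japan`, and `Canada` merely reset the state instead of shifting every later pair.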