Comparing multiple lines in same file

Hello,
I would like to write a /bin/ksh script to manipulate a file and compare its contents, comparing lines 1 & 2, 3 & 4, 5 & 6, and so forth until the end of the file. This is what I would like the script to do (using lines 1 & 2 as an example):

  1. Verify whether the last column in line 1 is numeric. If the condition is true, move it (15.235.10.21) to the beginning of the line, so it now reads 15.235.10.21 Alabama.
  2. If line 1, column 1 is numeric and line 2, column 1 is numeric, compare both numbers. If they match, move on to the next pair (lines 3 & 4).
  3. If the numbers do not match, display all non-matching lines at the end.

Original file contents:

Alabama 15.235.10.21
15.235.10.21
Petersburg 15.25.18.21
15.25.18.21
Salem 15.235.18.20
15.235.18.20
Tampa 15.235.18.20
15.235.18.20
Washington 15.235.18.21
15.235.18.21
Nova 15.235.18.21
15.234.18.21
Nashville 15.235.18.21
15.235.18.21
Texas 15.235.18.21
15.235.18.21
Burbank 15.235.18.25
15.235.18.25
Carolina 15.235.18.22
15.235.18.22
Seattle 15.235.18.23
15.235.18.23
Wyoming 15.235.18.24
15.235.18.24
Vermont 18.66.20.17
18.66.2.17
New York 13.5.48.2
columbia
Florida 13.7.24.25
13.7.24.25
Chicago 13.17.12.5
uchicago
Nebraska
Tennessee 16.13.3.2
plank
Frisco 15.35.18.1
Japan
Canada
France 18.55.7.25
18.55.7.25

Example script output if the numbers don't match:

The following do not match:
15.235.18.21 Nova
15.234.18.21

18.66.20.17 Vermont
18.66.2.17

Please advise, thank you.
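For reference, the pairing-and-compare logic described above can be sketched directly in /bin/ksh, as the poster requested. This is a sketch only: it assumes the file strictly alternates a "Name IP" line with a bare "IP" line, which the sample above does not always do (e.g. Nebraska, Japan, Canada have no partner line).

```shell
#!/bin/ksh
# Sketch: read two lines per iteration, move the last field (the IP) of
# line 1 to the front, and compare it with line 2. Assumes strict
# "Name IP" / "IP" alternation; lone entries such as "Nebraska" will
# shift the pairing for everything after them.
while read -r line1 && read -r line2; do
    ip=${line1##* }            # last blank-separated field of line 1
    name=${line1% *}           # everything before it (may contain spaces)
    case "$ip$line2" in
        *[!0-9.]*) continue ;; # skip pairs that are not purely numeric
    esac
    if [ "$ip" != "$line2" ]; then
        printf '%s %s\n%s\n\n' "$ip" "$name" "$line2"
    fi
done < file
```

On the sample file this happens to print exactly the Nova and Vermont pairs from the expected output, but only by luck: after the lone `Nebraska` line the pairing is shifted by one, so the Tennessee pair is silently swallowed. The awk answers in the thread wrestle with the same issue.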

Any attempt/idea/thought from your side?

However, assuming awk is acceptable, try:

awk 'NR%2 {getline T; if (T != $2 && $2 T !~ /[^0-9.]/) printf "%s %s\n%s\n", $2, $1, T}' file
15.235.18.21 Nova
15.234.18.21
18.66.20.17 Vermont
18.66.2.17
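The same one-liner, reformatted with comments for readability:

```shell
awk '
NR % 2 {                       # odd-numbered lines: the "Name IP" line
    getline T                  # consume the following line into T
    # report only when T and field 2 differ AND both are purely digits/dots
    if (T != $2 && $2 T !~ /[^0-9.]/)
        printf "%s %s\n%s\n", $2, $1, T
}' file
```

Note that `$1` and `$2` assume a one-word name, a limitation that comes up later in the thread.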

try also:

awk '
NF > 1 && $NF ~ /^[0-9.][0-9.]*$/ {lv=$NF; $NF=""; l=lv " " $0; next}
$1 ~ /^[0-9.][0-9.]*$/ { if ($1 != lv) {print l; print $1} }
l=lv="";
' infile


The NR%2 one-liner above will fail for:

No va 15.235.18.21
15.234.18.21

New York would also have failed if a non-matching IP address had been included in the given input file.


Good point, thanks! A small adaptation:

awk 'NR%2 {getline T; X = $NF; sub (X "$", ""); if (T != X && X T !~ /[^0-9.]/) printf "%s %s\n%s\n", X, $0, T}' file
15.235.18.21 No va 
15.234.18.21
18.66.20.17 Vermont 
18.66.2.17

That worked! Thank you!

It will still fail for this input:

Florida 13.7.24.25
13.7.24.25
Chicago 13.17.12.5
uchicago
Nebraska
Tennessee 16.13.3.2
99.13.3.2
Frisco 15.35.18.1
Japan
Canada
France 18.55.7.25
18.55.7.25

Try:

awk '$NF~/\.[0-9]+\./{if(NF>1) {p=$2; s=$0; next} if(p && $1!=p) {print s RS $0; p=x}}' file
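The same program, expanded with comments. Note that `p = $2` assumes one-word place names, which holds for this input but not for names like `No va` raised earlier in the thread:

```shell
awk '
$NF ~ /\.[0-9]+\./ {       # last field contains ".digits." -> looks like an IP
    if (NF > 1) {          # "Name IP" line: remember the IP and the whole line
        p = $2
        s = $0
        next
    }
    if (p && $1 != p) {    # bare IP line: compare with the remembered IP
        print s RS $0
        p = x              # clear p (x is never set, so this assigns "")
    }
}' file
```

Lines whose last field does not look like an IP (uchicago, Nebraska, Japan, ...) simply fall through without matching the pattern, which is why this version copes with the second sample.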

Thank you again for all the help!

Is there a way to check whether there are other duplicate entries in the entire file? For example, the numbers on lines 1 and 2 should match, but if the same number appears anywhere else in the file, print an error message. Any help is greatly appreciated, thank you.

Try

awk 'NR%2 {getline T;  X = $NF; sub (X "$", ""); if (X T !~ /[^0-9.]/) {if (IP[T]++) print "Dup:", T;  if (T != X) printf "%s %s\n%s\n", X, $0, T}}' file
Dup: 15.235.18.20
15.235.18.21 Nova 
15.234.18.21
Dup: 15.235.18.21
Dup: 15.235.18.21
18.66.20.17 Vermont 
18.66.2.17
Dup: 13.7.24.25
Dup: 18.55.7.25

When I run the command on a small list I get the list of duplicates, but when I run it on a larger list (same format) it doesn't return anything.
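A plausible cause (an assumption, since the larger file isn't shown): `NR%2` plus `getline` presumes strict name/IP pairs. A lone entry such as `Nebraska` makes `getline` swallow the next name line, and from that point every subsequent "pair" is shifted by one, so both the mismatch check and the duplicate check silently skip entries. A sketch that classifies each line by its shape and carries explicit state, instead of relying on line parity:

```shell
awk '
# "Name IP" line: more than one field, last field purely digits/dots
NF > 1 && $NF ~ /^[0-9.]+$/ {
    ip = $NF
    sub(/[ \t]*[0-9.]+$/, "")      # strip the IP; what remains is the name
    name = $0
    expect = 1                     # next line should be the matching reply
    next
}
# reply line: a single purely numeric field, and a pair is in progress
expect && NF == 1 && $1 ~ /^[0-9.]+$/ {
    if ($1 != ip)
        printf "%s %s\n%s\n", ip, name, $1
    if (seen[$1]++)                # duplicate check across the whole file
        print "Dup:", $1
    expect = 0
    next
}
{ expect = 0 }                     # anything else resets the pairing state
' file
```

Because each line is matched on its own shape rather than its line number, lone entries such as `Nebraska`, `Japan`, and `Canada` merely reset the state instead of shifting every later pair.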