Korn shell array: maximum value less than another value

I have a text file with several keywords that I am trying to isolate. I have grepped for the unknowns in the file, and each unknown has a corresponding location. I created one array that holds the line numbers of the unknowns and another that holds the line numbers of the locations, and I compare them by line number in the report. The locations always come before the unknowns, so I check that the location line number is less than the unknown line number. The problem is that there are multiple locations and multiple unknowns, so when several locations appear before an unknown, all of those locations get printed, whereas I want only the closest one. Please see the code:

 
typeset -i i=0
typeset -i j=0

# collect the line numbers of the LOCATION lines
grep -h -n "LOCATION" $R_FILE | cut -d ':' -f 1 | while read value
do
    arr[i]="$value"
    i=$(( i + 1 ))
done

# collect the line numbers of the unknown lines
grep -h -n "unknown" $R_FILE | cut -d ':' -f 1 | while read value
do
    uarr[j]="$value"
    j=$(( j + 1 ))
done
 
for j in ${uarr[@]}
do
    for i in ${arr[@]}
    do
        if (( j > i ))
        then
            LOC=$(awk "NR==$i" $R_FILE)
            UNK=$(awk "NR==$j" $R_FILE)
            print $LOC","$UNK | awk -F"," '{print $1","$2","$4","$5","$3}' >> output
        fi
    done
done

Output is as follows:

LOCATION: M1,19,8300,unknown,
LOCATION: M1,13,eeee,unknown,eooeo
LOCATION: B,13,eeee,unknown,eooeo
LOCATION: M1,5,TICKLISH,unknown,83838388383
LOCATION: B,5,TICKLISH,unknown,83838388383
LOCATION: CL 1-2,5,TICKLISH,unknown,83838388383
LOCATION: CL 1-1,5,TICKLISH,unknown,83838388383
LOCATION: CL 2-2,5,TICKLISH,unknown,83838388383
LOCATION: CL 2-1,5,TICKLISH,unknown,83838388383

Output should be:

LOCATION: M1,19,8300,unknown,
LOCATION: B,13,eeee,unknown,eooeo
LOCATION: CL 2-1,5,TICKLISH,unknown,83838388383
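
In other words, for each unknown I want the maximum location line number that is still less than the unknown's line number (hence the title of this thread). Something along the lines of the untested sketch below is what I am aiming for; best is just a helper variable that is not in my script above:

for j in ${uarr[@]}
do
    best=0
    for i in ${arr[@]}
    do
        # keep the largest location line number that is still below the unknown
        # (best is a helper variable, not part of the script above)
        if (( i < j && i > best ))
        then
            best=$i
        fi
    done
    if (( best > 0 ))
    then
        LOC=$(awk "NR==$best" $R_FILE)
        UNK=$(awk "NR==$j" $R_FILE)
        print $LOC","$UNK | awk -F"," '{print $1","$2","$4","$5","$3}' >> output
    fi
done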

What does your initial input file look like?

The initial input file looks like the example below; 'line' just means an insignificant line of text. Basically, I am searching for the lines containing the keyword 'unknown' and for their corresponding location, which is the nearest LOCATION line above them. Each unknown should have only one location.

LOCATION: M1
line
line
line
19,8300,unknown,
line
line

LOCATION: B
line
13,eeee,unknown,eooeo
line
line

LOCATION: CL 1-2
line
line
line
line

LOCATION: CL 1-1
line
line
line

LOCATION: CL 2-2
line
line
line

LOCATION: CL 2-1
line
5,TICKLISH,unknown,83838388383
line
line

Perl is ideal for this:

cat FILE | perl -00 -lne '/unknown/ && print' | egrep 'LOCATION|unknown'

---

Better:

cat FILE | perl -00 -lne '/unknown/ && print' | egrep 'LOCATION|unknown' | sed -n '/LOCATION/N;s/\n/,/p'

Unfortunately, Perl is not an option for this.

What about awk? Or only ksh?

cat testfile | awk  'BEGIN { RS = "\n\n"} /unknown/{print}' | egrep 'LOCATION|unknown' | sed -n '/LOCATION/N;s/\n/,/p'

It's gawk, but I think nawk should work too.

nawk '/^LOCATION:/{x=$0}x&&/unknown/{print x,$0;x=z}' infile
nawk 'BEGIN{RS="";FS="\n"}/unknown/{do{x=$(++i)}while(x!~/unknown/);print $1,x;x=i=z}' infile
$ cat tst
LOCATION: M1
line
line
line
19,8300,unknown,
line
line

LOCATION: B
line
13,eeee,unknown,eooeo
line
line

LOCATION: CL 1-2
line
line
line
line

LOCATION: CL 1-1
line
line
line

LOCATION: CL 2-2
line
line
line

LOCATION: CL 2-1
line
5,TICKLISH,unknown,83838388383
line
line
$ nawk 'BEGIN{RS="";FS="\n"}/unknown/{do{x=$(++i)}while(x!~/unknown/);print $1,x;x=i=z}' tst
LOCATION: M1 19,8300,unknown,
LOCATION: B 13,eeee,unknown,eooeo
LOCATION: CL 2-1 5,TICKLISH,unknown,83838388383
$
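
Roughly, the record-based version treats each blank-line-separated block as a single record and each line in the block as a field; spread out with comments, it reads like this:

nawk '
BEGIN { RS = ""; FS = "\n" }                   # blank-line-separated blocks, one line per field
/unknown/ {                                    # only blocks that contain "unknown"
    do { x = $(++i) } while (x !~ /unknown/)   # walk the fields until the unknown line
    print $1, x                                # $1 is the first line of the block, i.e. the LOCATION line
    x = i = z                                  # reset both (z is never set, so it is empty)
}' infile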

Or more simply

nawk '/^LOCATION:/{x=$0}x&&/unknown/{print x,$0;x=z}' tst
$ nawk '/^LOCATION:/{x=$0}x&&/unknown/{print x,$0}' tst
LOCATION: M1 19,8300,unknown,
LOCATION: B 13,eeee,unknown,eooeo
LOCATION: CL 2-1 5,TICKLISH,unknown,83838388383
$
#!/bin/ksh93

# remember the last LOCATION line seen and prefix it,
# followed by a comma, to every line containing "unknown"
while read line
do
   if [[ $line =~ 'LOCATION' ]]; then
      location=$line
   fi
   if [[ $line =~ 'unknown' ]]; then
      echo "$location,$line"
   fi
done < file

Thanks, everyone, for your responses. ctsgnb, your code almost worked perfectly; it is my fault for not posting the file exactly as it is. Your code printed the line after the one I need, because there is a heading above those lines which I failed to include. Sorry! I tried tweaking the code a bit, but I do not know nawk well enough to understand exactly what is going on there. Any insight would be appreciated. Thanks again for your help, and sorry that my initial input file was not correct:

Date
Report

LOCATION: M1

Headings
line
line
line
19,8300,unknown,
line
line

LOCATION: B

Headings
line
13,eeee,unknown,eooeo
line
line

LOCATION: CL 1-2

Headings
line
line
line
line

LOCATION: CL 1-1

Heading
line
line
line

LOCATION: CL 2-2

Heading
line
line
line

LOCATION: CL 2-1

Heading
line
5,TICKLISH,unknown,83838388383
line
line

The code

nawk '/^LOCATION:/{x=$0}x&&/unknown/{print x,$0;x=z}' infile

should still work ... or do you expect a different output?
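
To spell out what that one-liner does (same code, just spread over several lines with comments):

nawk '
/^LOCATION:/ { x = $0 }   # remember the most recent LOCATION line
x && /unknown/ {          # a line containing "unknown", with a LOCATION already remembered
    print x, $0           # print that LOCATION followed by the current line
    x = z                 # reset x (z is never set, so this empties x)
}' infile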


Beautiful! Thanks...I hadn't tried that one


Actually, one more quick question: I am trying to import these items into a .csv file, which I am doing with a redirection (>> to a .csv file).
However, I want a comma between the end of the LOCATION and the first field of the line with 'unknown'; see below:
LOCATION: M1,19,8300,unknown,
LOCATION: B,13,eeee,unknown,eooeo
LOCATION: CL 2-1,5,TICKLISH,unknown,83838388383

This way I can easily open it in Excel and have everything in separate columns. The problem I am running into is that some of the location names contain more space-separated fields than others, so depending on what I use as the delimiter, the locations themselves get split up.

Then just enclose the comma in double quotes in the print:

nawk '/^LOCATION:/{x=$0}x&&/unknown/{print x","$0;x=z}' infile
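
So the whole command, redirected to your .csv, would look like this (output.csv is just a placeholder name):

nawk '/^LOCATION:/{x=$0}x&&/unknown/{print x","$0;x=z}' infile >> output.csv   # output.csv is a placeholder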

Oh okay...thanks!!