array in awk

rocket_dog · May 19, 2011, 11:21am

Hi, I am trying to pass values from a bash array into awk. Please see below:

###
#!/bin/bash
#declare array
declare -a ARRAY

exec 10</path/to/arrayfile

let count=0
while read LINE <&10; do

ARRAY[$count]=$LINE
((count++))
done

#close file
exec 10>&-

ENDLOOP=0

i=0 ;
y=$((${#ARRAY[@]} - 1)) ;

until [ $i -gt $y ]  ;

 do
  echo -e "Sorting ${ARRAY[i]} ... " ;
  cat file | awk 'BEGIN { FS= "[\t]" }; { if ($6 = "${ARRAY[i]}" ) print $0}' > /dir/to/${ARRAY[i]}/newfile ;
        ((i++)) ;
    done
###

I expect the loop to search a specified tab-separated field for the first entry in the array file, then the second, then the third. It does not seem to be working properly, and I believe my syntax is incorrect. Any help would be greatly appreciated.

vgersh99 · May 19, 2011, 11:40am

Please provide a sample input file and a desired output (using code tags).

Franklin52 · May 19, 2011, 11:42am

Your approach is very inefficient. Is something like this what you're looking for?

awk 'NR==FNR{a[$0]; next} $6 in a' /path/to/arrayfile file > /dir/to/${ARRAY}/newfile

BTW: the shebang (#!/bin/bash) must be on the first line of a shell script.

Klashxx · May 19, 2011, 11:43am

use this line:

cat file | awk 'BEGIN { FS= "[\t]" }; { if ($6 = '"${ARRAY[i]}"' ) print $0}' > /dir/to/${ARRAY[i]}/newfile ;
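
For reference, another common way to hand a shell value to awk is the -v option, which avoids splicing shell text into the awk program (the program stays in single quotes, where the shell expands nothing). A minimal sketch, assuming tab-separated input; the sample rows and the names key and matches are made up for illustration, and the test uses == rather than the = in the lines above:

```shell
# Stand-in for ${ARRAY[i]} from the script above (hypothetical value).
key="text1"

# Two made-up tab-separated rows; only the first has text1 in field 6.
data=$(printf 'a\tb\tc\td\te\ttext1\tg\na\tb\tc\td\te\ttext2\tg')

# -v copies the shell value into the awk variable "key" before the program runs.
# The pattern $6 == key prints only the rows whose 6th field equals it.
matches=$(printf '%s\n' "$data" | awk -F'\t' -v key="$key" '$6 == key')

printf '%s\n' "$matches"
```

One caveat: awk processes backslash escapes in a -v value, so a key containing backslashes would need extra care.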

rocket_dog · May 19, 2011, 1:06pm

See /path/to/arrayfile below

text1
text2
text3

#!/bin/bash
#declare array
declare -a ARRAY

exec 10</path/to/arrayfile

let count=0
while read LINE <&10; do

ARRAY[$count]=$LINE
((count++))
done

#close file
exec 10>&-

ENDLOOP=0

i=0 ;
y=$((${#ARRAY[@]} - 1)) ;

until [ $i -gt $y ]  ;

 do
  echo -e "Sorting ${ARRAY[i]} ... " ;
  cat file | awk 'BEGIN { FS= "[\t]" }; { if ($6 = "${ARRAY[i]}" ) print $0}' > /dir/to/${ARRAY[i]}/newfile ;
        ((i++)) ;
    done

The expected result is that the loop will first check field 6 for the text "text1" and if "text1" is in that field, print the entire line to /dir/to/text1/newfile. Then the same for "text2" and "text3". If I replace ${ARRAY[i]} with text1, as seen below, I get the desired result:

cat file | awk 'BEGIN { FS= "[\t]" }; { if ($6 = "text1" ) print $0}' > /dir/to/${ARRAY[i]}/newfile ;
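
A note on why the hard-coded version can appear to work: inside if (...), a single = assigns "text1" to field 6, and since the assigned string is non-empty the condition is true for every line. The comparison operator is ==. A small sketch with made-up rows (not from the thread) that counts how many lines each form lets through:

```shell
# Two made-up tab-separated rows; only the first has text1 in field 6.
data=$(printf 'a\tb\tc\td\te\ttext1\tg\na\tb\tc\td\te\tother\tg')

# Single = : assignment, true for every row.
assigned=$(printf '%s\n' "$data" |
    awk -F'\t' '{ if ($6 = "text1") n++ } END { print n + 0 }')

# Double == : comparison, true only for matching rows.
compared=$(printf '%s\n' "$data" |
    awk -F'\t' '{ if ($6 == "text1") n++ } END { print n + 0 }')

printf 'with =  : %s line(s)\nwith == : %s line(s)\n' "$assigned" "$compared"
```

Here the = form counts both rows and the == form counts one.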

Franklin52 · May 19, 2011, 1:28pm

Try this:

awk 'NR==FNR{a[$0]; next} $6 in a{print > "/dir/to/" $6 "/newfile"}' /path/to/arrayfile file

rocket_dog · May 19, 2011, 3:33pm

I tried the following with no luck:

awk 'NR==FNR{a[$0]; next} $6 in a{print > "/path/to/${ARRAY[i]}/" $6 "/newfile"}' /path/to/arrayfile file

Could there be an issue with the way I am using ${ARRAY[i]}, as in /path/to/${ARRAY[i]}/ ?

Franklin52 · May 19, 2011, 3:43pm

awk 'NR==FNR{a[$0]; next} $6 in a{print > "/dir/to/" $6 "/newfile"}' /path/to/arrayfile file

The script is unnecessary; the single awk command should do the whole job.
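
To spell out how the one-liner does this: while awk reads the first file, NR==FNR holds, so each line is stored as a key in a; for the second file, only lines whose 6th field is a stored key are printed, each to a path built from that field. A sketch with made-up sample data in a scratch directory (the names work, dir and out are assumptions for illustration). It also assumes the output directories must exist beforehand, since awk does not create them:

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Keys to look for, one per line (plays the role of /path/to/arrayfile).
printf '%s\n' TEXT6A TEXT6C > "$work/arrayfile"

# Tab-separated data (plays the role of "file").
{
    printf 'TEXT1A\tTEXT2A\tTEXT3A\tTEXT4A\tTEXT5A\tTEXT6A\tTEXT7A\n'
    printf 'TEXT1B\tTEXT2B\tTEXT3B\tTEXT4B\tTEXT5B\tTEXT6B\tTEXT7B\n'
    printf 'TEXT1C\tTEXT2C\tTEXT3C\tTEXT4C\tTEXT5C\tTEXT6C\tTEXT7C\n'
} > "$work/file"

# Create one output directory per key before awk writes into it.
while IFS= read -r key; do
    mkdir -p "$work/out/$key"
done < "$work/arrayfile"

# First file: remember each line as a key. Second file: route matching rows.
awk -v dir="$work/out" '
    NR == FNR { a[$0]; next }
    $6 in a   { print > (dir "/" $6 "/newfile") }
' "$work/arrayfile" "$work/file"

cat "$work/out/TEXT6A/newfile"
```

The row with TEXT6B is skipped because TEXT6B is not in the key file.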

rocket_dog · May 19, 2011, 4:58pm

Maybe I am explaining it wrong. I have a text file that has multiple lines that are all formatted the same and the text is separated by a tab. Below is an example of this text file:

TEXT1A	TEXT2A	TEXT3A	TEXT4A	TEXT5A	TEXT6A	TEXT7A
TEXT1B	TEXT2B	TEXT3B	TEXT4B	TEXT5B	TEXT6B	TEXT7B
TEXT1C	TEXT2C	TEXT3C	TEXT4C	TEXT5C	TEXT6C	TEXT7C

I want to search each line in this file to see if the text field #6 matches the line I am searching for. I want to search from an array file one by one until done. See an example of the array file below:

TEXT6A
TEXT6B
TEXT6C

When a match is found, I want to print the entire line where that match was found to a file in a folder named after the text that I am searching for. For example, I want to write the whole line where the match occurred to /dir/to/TEXT6A/file.

I could not get the following to print the lines where the match occurred:

Quoting Franklin52:

awk 'NR==FNR{a[$0]; next} $6 in a{print > "/dir/to/" $6 "/newfile"}' /path/to/arrayfile file

Please advise.

vgersh99 · May 19, 2011, 5:10pm

How exactly does the solution provided by Franklin52 not solve the issue?
What exactly does not work?

rocket_dog · May 20, 2011, 1:24am

My mistake. It works.

---------- Post updated at 09:56 PM ---------- Previous update was at 04:23 PM ----------

Upon further investigation, I noticed that I still had the problem with real-world data, although all the tests worked fine.

The code below, supplied by Franklin52, is correct, although I need one modification:

awk 'NR==FNR{a[$0]; next} $6 in a{print > "/dir/to/" $6 "/newfile"}' /path/to/arrayfile file

This code splits fields on both spaces and tabs. In the real-world data, some fields contain spaces, and when they do this command doesn't work. I need to know how to modify the above command so that it splits only on tabs and is not confused by a space inside a field.

Test sample file:

TEXT1A	TEXT2A	TEXT3A	TEXT4A
TEXT1B	TEXT2B	TEXT3B	TEXT4B
TEXT1C	TEXT2C	TEXT3C	TEXT4C

Real-world file:

TEXT1A	TE XT2A	TEXT3A	TEXT4 A
TEXT1B	TE XT2B	TEXT3B	T EXT4B
TEXT1 C	TEXT2C	TEXT3C	TEXT4C

Thanks in advance.

---------- Post updated 05-20-11 at 12:24 AM ---------- Previous update was 05-19-11 at 09:56 PM ----------

I solved it. See below for the solution:

awk -F'[\t]' 'NR==FNR{a[$0]; next} $6 in a{print > "/dir/to/" $6 "/newfile"}' /path/to/arrayfile file
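
To illustrate the difference the -F option makes, here is a small sketch using one made-up row in the style of the real-world sample (the variable names are assumptions for illustration). By default awk splits on any run of spaces or tabs; with -F set to a tab, a space inside a field no longer starts a new field:

```shell
# One tab-separated row whose first two fields contain a space.
row=$(printf 'TEXT1 A\tTE XT2A\tTEXT3A')

# Default splitting: spaces and tabs both separate fields.
default_f2=$(printf '%s\n' "$row" | awk '{ print $2 }')

# Tab-only splitting: the space stays inside the field.
tab_f2=$(printf '%s\n' "$row" | awk -F'\t' '{ print $2 }')

printf 'default FS, field 2: %s\n' "$default_f2"
printf 'tab FS, field 2:     %s\n' "$tab_f2"
```

With default splitting field 2 is "A"; with the tab separator it is "TE XT2A".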