Extract data from a csv file based on a certain field.

I have a csv file that I need to extract some data from, selecting rows based on one of its fields after reading a list of values from a separate text file.

The text file might, say, have 592560 in it.

The csv file may have some data like so

Field 1       Field2  Field3   Field4   Field5     Field6

20009756   1        2         1          1           592560

The information will always be in the same position.

If the data in the text file matches what's on line 6, then output the data on line 2 to another text file. There would be many rows, and the correct line 2 for each would need to be copied. There could be many entries in that text file that reside in the csv file, and each would need to be added to the output file.

You mean column?

Let me try, although I'm not sure I understood you completely:

$ cat data
Field 1       Field2  Field3   Field4   Field5     Field6

20009756   1        2         1          1           592560
20007654   2        4         45brt      ewfw        55552
20009756   X        2         1          1           592560
20009756   balh        2         1          1           592560
20007654   8.t        4         45brt      ewfw        55552
$ cat aux
592560
55552
$ awk 'NR==FNR{a[$1]=cnt++; next}{if($6 in a){print $2 > "output."$6;} }' aux data
$ cat output.55552 
2
8.t
$ cat output.592560 
1
X
balh

The above awk command takes the second column of every line whose sixth field matches an entry in aux and writes it to a per-value output file.
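For reference, here is the same two-phase logic with the steps commented, using small inline stand-ins for the two files (a sketch, not the thread's actual data):

```shell
# Recreate miniature versions of the two input files.
printf '592560\n55552\n' > aux
printf '20009756 1 2 1 1 592560\n20007654 2 4 45brt ewfw 55552\n' > data

# NR==FNR is true only while awk reads the first file argument (aux):
# phase 1 records each wanted value, phase 2 filters data by field 6.
awk 'NR==FNR { a[$1]; next }
     $6 in a { print $2 > ("output." $6) }' aux data

cat output.592560   # -> 1
cat output.55552    # -> 2
```

Note that simply referencing `a[$1]` is enough to create the key; the `cnt++` value in the thread's version is never actually used.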

$ cat data
Field 1       Field2  Field3   Field4   Field5     Field6
20009756   1        2         1          1           592560
20007654   2        4         45brt      ewfw        55552
20009756   X        2         1          1           592560
20009756   balh        2         1          1           592560
20007654   8.t        4         45brt      ewfw        55552


$ awk '{print > ($6 ".txt")}' infile

$ cat 592560.txt
20009756   1        2         1          1           592560
20009756   X        2         1          1           592560
20009756   balh        2         1          1           592560

or print only column 2:

$ awk '{print $2 > ($6 ".txt")}' infile

$ cat 592560.txt
1
X
balh

This worked

awk 'NR==FNR{a[$1]=cnt++; next}{if($6 in a){print $2 > "output.txt";} }' aux data

This put all of the data into one file called output.txt. The other way created a separate file for each entry.
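The only difference between the two behaviors is the redirection target: a constant file name collects everything in one file, while an expression built from a field gives one file per value. A minimal sketch with made-up data:

```shell
printf '1 a\n2 b\n1 c\n' > demo

# Constant target: everything lands in one file.
awk '{ print $2 > "all.txt" }' demo

# Field-based target: one file per distinct value of $1.
awk '{ print $2 > ($1 ".txt") }' demo

cat all.txt   # -> a, b, c (one per line)
cat 1.txt     # -> a, c   (one per line)
```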

In any case that was a huge help rdcwayx.

There are some other things I need to try to do but this gets me far.

Thanks again!

---------- Post updated at 08:57 PM ---------- Previous update was at 08:32 PM ----------

update

awk -F'\t' 'NR==FNR{a[$1]=cnt++; next}{if($6 in a){print $2 > "output.txt";} }' aux data

That was what worked. Needed the -F'\t' so awk knew it was tab delimited.
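The difference matters because awk's default field splitting collapses runs of whitespace into one separator, while -F'\t' treats every tab as a separator and therefore preserves empty fields. A quick illustration:

```shell
# Default FS: the run of two tabs counts as a single separator.
printf 'a\t\tb\n' | awk '{ print NF }'          # -> 2

# Explicit tab FS: the empty field between the tabs is kept.
printf 'a\t\tb\n' | awk -F'\t' '{ print NF }'   # -> 3
```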

I have to pad $2 out to 5 characters and name the output file after the data file.

Thanks again.

---------- Post updated at 09:46 PM ---------- Previous update was at 08:57 PM ----------

and this

awk -F'\t' 'NR==FNR{a[$1]=cnt++; next}{if($12 in a){printf "%05s\n", $2 > "output.txt";} }' aux data

does the padding of the $2 field.

Does anyone know how to pad the value written to output.txt so that it has 14 characters, padded with 0's?

awk -F'\t' 'NR==FNR{a[$1]=cnt++; next}{if($12 in a){printf "%05s\n", $2 > $1".pull";} }' /aiw/aiw1/pullfiles/*.txt *.dpf

Thanks in advance!

I figured out a way to do what I needed in padding 0's.
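The thread doesn't show the solution that was found, but two common ways to zero-pad to 14 characters in awk are printf's 0 flag with a numeric conversion for purely numeric values, or prepending zeros and keeping the last 14 characters for arbitrary strings (a sketch, not necessarily what was used here):

```shell
# Numeric values: the 0 flag with %d zero-pads portably.
echo 42 | awk '{ printf "%014d\n", $1 }'               # -> 00000000000042

# Arbitrary strings: prepend 14 zeros, keep the last 14 characters.
echo 8.t | awk '{ s = "00000000000000" $1
                  print substr(s, length(s) - 13) }'   # -> 000000000008.t
```

The string form also handles values like 8.t from the sample data, which %014d would mangle.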

I do have another question

I need to search all of the txt files in a location, but awk only searches the first one and stops.

How do I search *.txt in a location with awk and still get the same results?

Any help would be appreciated.

Of course it does:

awk -F'\t' 'NR==FNR{a[$1]=cnt++; next}{if($12 in a){printf "%05s\n", $2 > $1".pull";} }' /aiw/aiw1/pullfiles/*.txt *.dpf

The 'NR==FNR' bit is what tells awk to apply the remembering action ('a[$1]=cnt++') only to the first file.

You could do:

awk -F'\t' 'FILENAME~".txt$"{a[$1]=cnt++; next}{if($12 in a){printf "%05s\n", $2 > $1".pull";} }' /aiw/aiw1/pullfiles/*.txt *.dpf

Which instructs awk to apply the first action to *.txt files.
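One small aside on the FILENAME~".txt$" test: the dot is a regex metacharacter, so that pattern would also match a name like data_txt; escaping it as /\.txt$/ matches only a literal .txt suffix. A quick check:

```shell
# Unescaped dot matches any character before "txt":
echo data_txt | awk '$0 ~ /.txt$/  { print "match" }'   # -> match
# Escaped dot requires a literal ".txt" suffix:
echo data_txt | awk '$0 ~ /\.txt$/ { print "match" }'   # (no output)
echo data.txt | awk '$0 ~ /\.txt$/ { print "match" }'   # -> match
```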

That does not give me an output file.

Yes it did. Sorry.

Thanks!

This pads the file name with 0's. Sorry I did not post this before.

ls *.pull | awk -F'.pull' '{ printf "%s %014s.pull%s\n", $0, $1, $2; }' | xargs -n 2 mv

So this worked until today.

What happened is that when the field has no data in it, the command still pulls the data. I do not need it to do that.

Here is the line

awk -F'\t' 'FILENAME~".txt$"{a[$1]=cnt++; next}{if(($12 in a) || ($13 in a)){printf "%05s\n", $2 > $8 ".pull"} }'

Here is some sample data that is complete, with no empty fields in question.

5122927 and 0 are the two fields I look at: 12 and 13.

10129953 155 1 155 1 386618 1346 10129953.1 155 00:00.0 85248 5122927 0 FirstName LastName 111 Blah DRIVE  SUN LAKES AZ 85248  9700485     155 1 1 155

Here is a line where the 13th field (the 0) is empty.

10129953 156 1 156 1 387964 1255 10129953.1 156 00:00.0 5478 5122928  FirstName LastName 111 Blah MAIN ST APT 4  SAN ALBANS VT 5478  9700486     156 1 1 156

I would like the script to just ignore the blank field.

Thanks in advance!

---------- Post updated at 10:17 PM ---------- Previous update was at 09:40 PM ----------

I would not be opposed to adding a 0 in a blank $13 field.
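That last idea can be done in awk itself: default the empty field to 0 before any test that uses it. A sketch with made-up four-field data standing in for the real tab-delimited records (the same `if ($13 == "") $13 = 0` guard would go in front of the membership test in the thread's command):

```shell
# If a tab-delimited field is empty, substitute 0 before using it.
printf 'a\tb\t\td\n'  | awk -F'\t' '{ if ($3 == "") $3 = 0; print $3 }'   # -> 0
printf 'a\tb\tc\td\n' | awk -F'\t' '{ if ($3 == "") $3 = 0; print $3 }'   # -> c
```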