Problem reading data into an array

patrick87 · December 11, 2009, 4:10am

My input is a long list of records, each starting with "#":

#read_1
123456898787987

#read_2
54645646540646406

#read_3
4654564654316
.
.

I am a bit confused about how to load all of this into an array first. Then, when I run a program named "statistic_program", it should take the array elements one at a time and process them one after another.
I thought of using a "while read ... do ... done" loop, but I am not sure how to make "statistic_program" process the input file's records one by one.
Below is the Perl script that I just tried.

#!/usr/bin/perl
use strict;
use warnings;
open(FILE,"<INPUT_FILE.txt");
my($file)=<FILE>;
my@lines=split('#',$file);
foreach my $data (@lines){
system "/statistic_program $data >> output.txt"}
exit 0;

Unfortunately, the Perl script does not work properly.
Thanks for any suggestions and advice.

frans · December 11, 2009, 5:35am

Can't the "statistic_program" directly read the input file?

patrick87 · December 11, 2009, 5:41am

Hi frans,
The "statistic_program" can only read one sequence at a time and gives one output for it.
For example, if my input file has 20 reads, I need to run it 20 times, which generates 20 output files. That is very inconvenient.
I am trying to write a script that makes "statistic_program" process each sequence in turn and produce a single output file at the end instead of 20.
Unfortunately, I am having some problems writing the script.
So I am asking for your advice and suggestions.
Thanks in advance.

frans · December 11, 2009, 5:45am

Could you give an example of the desired output (is it only the numbers under the #read_N statement? should the 'N' be the index in the array?)

patrick87 · December 11, 2009, 5:51am

After "statistic_program" runs, it generates a list of statistics about each query sequence. So I hope to run every record in the input file through "statistic_program".
All I can do is write a script that feeds the sequences to "statistic_program" one by one, because I can't edit statistic_program itself: it is a binary.
Thanks for your advice.

frans · December 11, 2009, 5:57am

That's not what I meant. What input format does that binary program need?

patrick87 · December 11, 2009, 6:01am

The input format for the binary program is just like the input file sequence that I showed:

#read_1
123456898787987

But so far the program can only process one sequence at a time. So I need a script that treats the input file's data like an array and runs through it automatically.
Every query sequence begins with "#".
Thanks.

frans · December 11, 2009, 6:08am

Something like:

#!/bin/bash
while read L1
do
    if [ "${L1:0:1}" = "#" ]
    then
        read L2
        /statistic_program $L1 $L2
done < input.txt

patrick87 · December 11, 2009, 6:11am

Thanks frans,
Can I ask what (L1:0:1) means and what L1 and L2 represent in your code?
Thanks.

frans · December 11, 2009, 6:18am

#!/bin/bash
while read L1 # reads the current line and stores it in L1
do
    if [ "${L1:0:1}" = "#" ] # tests whether the first character of the current line is "#"
    then
    read L2 # reads the next line and stores it in L2
    /statistic_program $L1 $L2 # runs the program with the values of L1 and L2 as parameters
done < input.txt
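
For reference, ${L1:0:1} is bash substring expansion: it takes the value of L1 starting at offset 0 with length 1, i.e. its first character. A minimal sketch, using a sample header line from the input shown earlier (the names first_char and rest are only for illustration):

```bash
#!/bin/bash
# Illustration only: L1 holds a sample header line from the thread's input.
L1="#read_1"

first_char="${L1:0:1}"   # substring from offset 0, length 1
rest="${L1:1}"           # substring from offset 1 to the end

echo "$first_char"       # prints: #
echo "$rest"             # prints: read_1
```

This syntax is a bash feature, so the script needs to run under bash rather than a plain POSIX sh.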

patrick87 · December 11, 2009, 6:19am

Hi frans,
I just tried your code.
But it doesn't work.
This message appears:
syntax error near unexpected token 'done'

frans · December 11, 2009, 6:26am

Apologies, I forgot the fi that closes the if/then statement!

#!/bin/bash
while read L1
do
    if [ "${L1:0:1}" = "#" ]
    then
        read L2
        /statistic_program $L1 $L2
    fi
done < input.txt

patrick87 · December 11, 2009, 6:36am

Hi frans,
Thanks for your help and advice.
I really appreciate it.
Unfortunately, the code you suggested still doesn't feed the query sequences to the program one by one.
In general, statistic_program takes this format as its input file:

#read_1
123456898787987

"#" marks the sequence header, while "123456898787987" is the content of #read_1, which "statistic_program" processes on each run.
Sorry if my problem is confusing.
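
Given that input format, one possible approach is to write each header and sequence pair to a temporary file, run the program on that file, and append every result to a single output file. This is only a sketch: it assumes the program takes a file name as its argument and prints its result to standard output, which nothing in this thread confirms. run_per_record and STAT_CMD are hypothetical names, and cat merely stands in for the real binary.

```bash
#!/bin/bash
# Sketch: feed one "#header + sequence" record at a time to a program and
# collect all results in a single output file.
# ASSUMPTION: the program takes a file name as its only argument and prints
# to standard output. "cat" is a stand-in; set STAT_CMD to the real binary.
STAT_CMD="${STAT_CMD:-cat}"

run_per_record() {
    local input="$1" output="$2" tmp header seq
    tmp="$(mktemp)" || return 1
    : > "$output"                        # start with an empty output file
    while IFS= read -r header; do
        if [ "${header:0:1}" = "#" ]; then
            IFS= read -r seq             # the line after the header is the sequence
            printf '%s\n%s\n' "$header" "$seq" > "$tmp"
            # </dev/null keeps the program from consuming the loop's input
            "$STAT_CMD" "$tmp" < /dev/null >> "$output"
        fi
    done < "$input"
    rm -f "$tmp"
}
```

It could then be called as, for example, run_per_record INPUT_FILE.txt output.txt after setting STAT_CMD=/statistic_program.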

frans · December 11, 2009, 6:44am

So, you don't need the header?

#!/bin/bash
I=0
while read L1
do
    if [ "${L1:0:1}" = "#" ]
    then
        read L2
        DATA[$I]=$L2
        (( I++ ))
    fi
done < input.txt
/statistic_program ${DATA[@]}

patrick87 · December 11, 2009, 6:51am

Hi frans,
This time the code reported:
Wrong argument: L2
Sorry to trouble you.

frans · December 11, 2009, 6:57am

I went too fast. I've corrected my previous post.

patrick87 · December 11, 2009, 6:59am

Hi frans,
As I understand statistic_program, it looks for "#" first, then processes that record's details and generates an output file.
So I am now trying to write a script that lets statistic_program run over a long list of records beginning with "#" and process each one.
I hope that makes my question clearer.

frans · December 11, 2009, 7:02am

That's why I asked what data format the program would accept.
Is it something like:

# 1256958 1564 2356554 5464

patrick87 · December 11, 2009, 7:02am

I just tried the edited code as well.
I don't know why it ends with this error:
Argument list too long
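
"Argument list too long" is the message the system gives when the combined size of the arguments and environment handed to a single command exceeds the system limit (getconf ARG_MAX shows it). Passing every sequence at once in ${DATA[@]} could plausibly hit that limit with a long input. A sketch of the alternative, calling the command once per array element; it assumes the program accepts one sequence as its argument, and echo is only a stand-in for statistic_program:

```bash
#!/bin/bash
# Sketch: run a command once per array element instead of once with all of them.
# ASSUMPTION: the program accepts a single sequence as its argument.
# "echo" is a stand-in; set STAT_CMD to the real binary.
STAT_CMD="${STAT_CMD:-echo}"

# Collect the line that follows each "#" header into the DATA array.
DATA=()
while IFS= read -r line; do
    if [ "${line:0:1}" = "#" ]; then
        IFS= read -r seq
        DATA+=("$seq")
    fi
done <<'EOF'
#read_1
123456898787987

#read_2
54645646540646406
EOF

# One invocation per sequence keeps each argument list small.
results="$(for seq in "${DATA[@]}"; do "$STAT_CMD" "$seq" < /dev/null; done)"
echo "$results"
```

Quoting "${DATA[@]}" keeps each sequence as a single word even if it contains spaces.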

frans · December 11, 2009, 7:04am

How do you call the script from the command line?