Using array in script?

emily · February 1, 2013, 2:12pm

Dear all,

I have the following code [1], which I want to modify.
In the given code, PATH535 accepts only a single path.
I would like to turn it into an array so that I can pass several
paths together.

For example:

PATH535[1]=/pathA/log/A/
PATH535[2]=/pathA/log/B/
PATH535[3]=/pathA/log/C/

The code should then work for all the files kept at the locations listed in the PATH535[] array.

Please help me,
thanks in advance -emily

[1]======================

 
PATH535=/pathA/log/A

doCheck() {
    for FileNameIndx in "${PATH535[@]}"
      do
      if [[ ! -e "dest_path/$FileNameIndx" ]]; then
          ls -ltr "$FileNameIndx" | grep root | awk -F_ '{print $3,$0}' OFS=\t | sort -n | cut -f2- >> $File0"_0"
          #ls -ltr "$FileNameIndx" | grep root | awk '{print string path $9}' string="$CONSTANT" path="$FileNameIndx"  >> "$File0"                    
          sort -nrk5 < $File0"_0" | awk -F_ '!x[$3]++' >> $File0"_1"
          grep -in "vg" $File0"_1" | awk '{print path string $9}' string="/" path="$FileNameIndx" >> $FileName
          echo "$FileNameIndx is copied"
      else
          echo "Check the FileName in ${PATHNAME[@]}"
      fi
      echo "--------"
}

ctsgnb · February 1, 2013, 3:01pm

Instead of modifying the function and adding a loop inside it,
maybe you could just write a loop that calls that function.

But if it is only a performance concern that makes you want to modify the code, maybe you should consider redesigning it completely.

I haven't tried to understand exactly what it does, but at first look I would say it is far from optimized.
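
The suggestion to loop around the function can be sketched as follows. This is only a minimal illustration, assuming bash: checkOne is a hypothetical stand-in for the real per-directory work, and the three paths are the placeholders from the question.

```shell
#!/usr/bin/env bash
# Placeholder directories taken from the question; substitute real ones.
PATH535=(/pathA/log/A /pathA/log/B /pathA/log/C)

# Hypothetical stand-in for the real per-directory work.
checkOne() {
    local dir=$1
    if [[ -d $dir ]]; then
        echo "processing $dir"
    else
        echo "skipping $dir (not a directory)"
    fi
}

# Call the function once per array element; the quotes keep each path intact.
for dir in "${PATH535[@]}"; do
    checkOne "$dir"
done
```

Assigning the elements one by one as PATH535[1]=..., PATH535[2]=... also works in bash, because "${PATH535[@]}" expands only the elements that are set.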

Don_Cragun · February 1, 2013, 8:08pm

I partially agree with ctsgnb, and partly disagree. With all of the things that are not shown to us, I don't know if this script is optimized or extremely inefficient.

Here is your script again (after setting up PATH535 to be the array of 3 pathnames as you indicated you wanted to do earlier in your problem description) with line numbers added for discussion purposes:

 1 PATH535[1]=/pathA/log/A/
 2 PATH535[2]=/pathA/log/B/
 3 PATH535[3]=/pathA/log/C/
 4
 5 doCheck() {
 6     for FileNameIndx in "${PATH535[@]}"
 7       do
 8       if [[ ! -e "dest_path/$FileNameIndx" ]]; then
 9           ls -ltr "$FileNameIndx" | grep root | awk -F_ '{print $3,$0}' OFS=\t | sort -n | cut -f2- >> $File0"_0"
10           #ls -ltr "$FileNameIndx" | grep root | awk '{print string path $9}' string="$CONSTANT" path="$FileNameIndx"  >> "$File0"
11           sort -nrk5 < $File0"_0" | awk -F_ '!x[$3]++' >> $File0"_1"
12           grep -in "vg" $File0"_1" | awk '{print path string $9}' string="/" path="$FileNameIndx" >> $FileName
13           echo "$FileNameIndx is copied"
14       else
15           echo "Check the FileName in ${PATHNAME[@]}"
16       fi
17       echo "--------"
18 }

Since I have no idea what is in your log files, nor how the variables CONSTANT, File0, and FileName have been initialized before you call your doCheck() function, I can't make any educated guess as to what you're trying to do, but a few things look strange:

I would expect the variable name FileNameIndx to be an index into an array named FileName. But starting on line 6, you use FileNameIndx as the name of a file (of type directory) from the array PATH535.
On line 8 you check to see if a directory exists in a sub-directory of your current working directory. If it doesn't exist, you perform several operations, but none of them create the directory that you were looking for.
On line 9, OFS=\t sets the output field separator for this awk command to t; not to a <tab> character. (If you want a <tab> character, change it to OFS='\t'.)
Line 15 seems a bit strange. Why say:

    Check the FileName in list of directory names

when what it really means is:

    Processing skipped: dest_path/$FileNameIndx already exists

The do on line 7 doesn't have a matching done. Presumably the missing done should be added before or after line 17.
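
The OFS point can be checked in isolation. The snippet below is a small sketch that feeds one sample file name (taken from later in the thread) through the same awk program twice, once with the unquoted and once with the quoted assignment; the variable names are made up for the demonstration.

```shell
#!/usr/bin/env bash
line='vgtree_187_1_cda.root'

# Unquoted: the shell removes the backslash, so awk receives OFS=t.
unquoted=$(printf '%s\n' "$line" | awk -F_ '{print $3,$0}' OFS=\t)

# Quoted: awk receives the two characters \t and turns them into a tab.
quoted=$(printf '%s\n' "$line" | awk -F_ '{print $3,$0}' OFS='\t')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

With the unquoted form the letter t ends up between the two printed fields, so the later cut -f2- finds no tab to split on.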

emily · February 2, 2013, 7:37am

Hi Don Cragun,
I appreciate your reply and the time you spent on that piece of code.
I completely agree with you and ctsgnb that the code is far from optimized.

Being a novice at shell scripting, I was happy with the idea that I could make a working script which looks nasty but does my task pretty well. The sole reason for posting the question here was to try to optimize it. That also helps me learn more about shell scripting.

Now, let me explain the task of this piece of the script.
I have several files kept at some locations, like:

$ ls /path/path_goesTo/A/
vgtree_132_1_vsd.root  vgtree_182_1_asq.root  vgtree_231_1_FWO.root  vgtree_281_1_RV7.root  vgtree_39_1_HCC.root   vgtree_89_1_oaM.root
vgtree_133_1_lDx.root  vgtree_183_1_1lF.root  vgtree_23_1_q4e.root   vgtree_28_1_OkE.root   vgtree_40_1_ROj.root   vgtree_90_1_nfF.root
vgtree_134_1_rtQ.root  vgtree_184_1_JDe.root  vgtree_232_1_rNt.root  vgtree_282_1_sK6.root  vgtree_41_1_1JA.root   vgtree_91_1_9r8.root
vgtree_135_1_U24.root  vgtree_185_1_S7u.root  vgtree_233_1_KrH.root  vgtree_283_1_3Wj.root  vgtree_4_1_YrR.root    vgtree_9_1_NME.root
vgtree_136_1_TrA.root  vgtree_186_1_HwX.root  vgtree_234_1_3pj.root  vgtree_284_1_N3Z.root  vgtree_42_1_DcY.root   vgtree_92_1_JQY.root
vgtree_137_1_yVj.root  vgtree_187_1_cda.root  vgtree_235_1_XrJ.root  vgtree_285_1_z8g.root  vgtree_43_1_msk.root   vgtree_93_1_oOe.root
vgtree_138_1_zOd.root  vgtree_188_1_N2U.root  vgtree_236_1_wpb.root  vgtree_286_1_CLY.root  vgtree_44_1_yDV.root   vgtree_94_1_1YG.root
vgtree_139_1_Gmf.root  vgtree_189_1_ohb.root  vgtree_237_1_PF2.root  vgtree_287_1_3oO.root  vgtree_45_1_ZWL.root   vgtree_95_1_SQx.root

My purpose is to copy them into one common file named $FileName. It is important to make sure that if any file is repeated, only the file with the bigger size is kept.

Repeating means the following: in the given example the deciding criterion is the matching number '187', and if there are three different files with that number, the file to copy is vgtree_187_1_cda.root, the largest one:

-rw-r--r-- 1 pooja04 us_cms 38939100 Dec 19 21:38 vgtree_187_1_cda.root
-rw-r--r-- 1 pooja04 us_cms 8900 Dec 19 21:38 vgtree_187_2_jdy.root
-rw-r--r-- 1 pooja04 us_cms 939100 Dec 19 21:38 vgtree_187_3_chj.root

Following this criterion, I want a final file $DataFile with content such as:

/path/path_goesTo/A/vgtree_95_1_SQx.root
/path/path_goesTo/A/vgtree_96_1_TGx.root
/path/path_goesTo/A/vgtree_97_2_YKx.root
/path/path_goesTo/A/vgtree_98_1_RKx.root

What I explained for a single $PATH would in fact be repeated for several paths, with millions of root files.

I hope I have made clear what I want to do.

Thanks a lot
pooja

ctsgnb · February 3, 2013, 4:05pm

ls -l /path/path_goesTo/A/ | sort -r -k 5n -t_ -k 2n | awk '{split($9,x,".");split(x[1],a,"_");i=a[1]"_"a[2]}!F[i]++{print $9}' | xargs -i cat {} >> yourbigfile

or

...| sort -r -k 5n -t_ -k 2n | awk '{print $NF}' | awk -F"[._]" '{i=$1"_"$2}!F[i]++{print $0}' | xargs -i cat {} >> bigoutput

Depending on the implementation, the "sort" command may need some tweaking so that it lists the files by number
and, if there are several indexes (e.g. _1 _2 _3) for the same file number (187), lists the biggest one first.

As far as I tested, the output given by the sort command works fine on my Ubuntu.
Once the files are sorted that way, the awk code builds the base name for a file number (stored in i), prints only the first one it meets, and that file is appended to your big file.
Appending to the big file could also be done within awk using the "system" function (if so, remember to close each file after concatenating it into the big one).
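
A variant that does not depend on the sort order is sketched below. It is not the pipeline above: it keeps the largest size seen for each file number inside awk and prints full paths, as in the desired $DataFile content. The function name largest_per_number and the throwaway demo files are assumptions made for illustration, and parsing ls -l this way assumes file names without whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for the directory in $1, print the full path of the
# largest *.root file for each file number (the second "_"-separated field).
largest_per_number() {
    local dir=${1%/}                       # drop one trailing slash, if present
    LC_ALL=C ls -l "$dir" | awk -v dir="$dir" '
        $9 ~ /\.root$/ {
            split($9, part, "_")           # vgtree_187_1_cda.root -> part[2] is 187
            num = part[2]
            if (!(num in size) || $5 + 0 > size[num]) {
                size[num] = $5 + 0         # field 5 of "ls -l" is the size in bytes
                name[num] = $9
            }
        }
        END { for (num in name) print num "\t" dir "/" name[num] }
    ' | sort -n | cut -f2-                 # order by file number, then drop it
}

# Demo on throwaway files whose names follow the pattern from the thread.
demo=$(mktemp -d)
printf 'xxxxxxxxxx' > "$demo/vgtree_187_1_cda.root"   # 10 bytes, largest for 187
printf 'x'          > "$demo/vgtree_187_2_jdy.root"   # 1 byte
printf 'xxxxx'      > "$demo/vgtree_187_3_chj.root"   # 5 bytes
printf 'xx'         > "$demo/vgtree_95_1_SQx.root"    # only file for 95
result=$(largest_per_number "$demo")
printf '%s\n' "$result"
rm -rf "$demo"
```

Redirecting the function's output with >> "$FileName" would collect the selected paths in one file.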

emily · February 9, 2013, 7:16am

Thanks, ctsgnb, for the detailed reply.
I have a small question: I still need to loop over many nested directories using an
array.
Can you help me with that?

Thanks again,
emily
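
For the nested-directory case, one possible approach (not taken from the thread) is to let find discover the directories and store them in a bash array before looping. The sketch below builds a throwaway tree so that it can be run as is; the names top, dirs, and count are assumptions, and the real top-level path would replace the temporary one.

```shell
#!/usr/bin/env bash
# Throwaway tree standing in for the real log area.
top=$(mktemp -d)
mkdir -p "$top/A/run1" "$top/A/run2" "$top/B"

# Collect every directory under $top into an array, one element per directory.
dirs=()
while IFS= read -r d; do
    dirs+=("$d")
done < <(find "$top" -type d | sort)

# Loop over the array; real per-directory work would replace the echo.
for d in "${dirs[@]}"; do
    echo "would process: ${d#"$top"}/"
done
count=${#dirs[@]}
rm -rf "$top"
```

Reading the find output line by line assumes directory names that contain no newline characters.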