Running program and output files in specific directories

kristinu · February 16, 2018, 10:23pm

I have been running the program mseed2sac with the following commands:

cd IV
find . -type f -exec /swadmin/mseed2sac '{}' \;

The problem is that I end up with a lot of files in directory IV.
Instead, I would like to select only the HHZ designator, create a
directory IV.SAC, and have all the files output by mseed2sac
stored there.

The two directories IV and IV.SAC would be at the same level

IV.SAC
 PTCC
     HHZ.D
        IV.PTCC..HHZ.D.2016.001.SAC
        IV.PTCC..HHZ.D.2016.002.SAC
        IV.PTCC..HHZ.D.2016.003.SAC
        IV.PTCC..HHZ.D.2016.004.SAC
        IV.PTCC..HHZ.D.2016.005.SAC
        IV.PTCC..HHZ.D.2016.006.SAC
 RAFF
     HHZ.D
        IV.RAFF..HHZ.D.2016.001.SAC
        IV.RAFF..HHZ.D.2016.002.SAC
        IV.RAFF..HHZ.D.2016.003.SAC
        IV.RAFF..HHZ.D.2016.004.SAC
        IV.RAFF..HHZ.D.2016.005.SAC
        IV.RAFF..HHZ.D.2016.006.SAC

The original files would be in directory IV

 IV
    PTCC
       BHE.D
       BHN.D
       BHZ.D
       HHE.D
       HHN.D
       HHZ.D
           IV.PTCC..HHZ.D.2016.001
           IV.PTCC..HHZ.D.2016.002
           IV.PTCC..HHZ.D.2016.003
           IV.PTCC..HHZ.D.2016.004
           IV.PTCC..HHZ.D.2016.005
           IV.PTCC..HHZ.D.2016.006
       LHE.D
       LHN.D
       LHZ.D
    RAFF
       BHE.D
       BHN.D
       BHZ.D
       HHE.D
       HHN.D
       HHZ.D
           IV.RAFF..HHZ.D.2016.001
           IV.RAFF..HHZ.D.2016.002
           IV.RAFF..HHZ.D.2016.003
           IV.RAFF..HHZ.D.2016.004
           IV.RAFF..HHZ.D.2016.005
           IV.RAFF..HHZ.D.2016.006
    RESU
       HHZ.D
           IV.RESU..HHZ.D.2016.001
           IV.RESU..HHZ.D.2016.002
           IV.RESU..HHZ.D.2016.003
           IV.RESU..HHZ.D.2016.004
           IV.RESU..HHZ.D.2016.005
           IV.RESU..HHZ.D.2016.006

Don_Cragun · February 17, 2018, 12:10am

As always:

What operating system are you using?
What shell are you using?
What have you tried to solve this problem on your own?

One might guess that you will need to modify /swadmin/mseed2sac :

Why haven't you shown us the man page for this utility?
Why haven't you shown us the source code for this utility?

kristinu · February 17, 2018, 9:37am

I am using Trisquel and writing a script in bash. I do not want to modify the program mseed2sac, as I think that would be too much work, so I would rather arrange the file structure in a bash script.

For example when I do

 mseed2sac favr/hhz.d/iv.favr..hhz.d.2016.226
Wrote 8639767 samples to IV.FAVR..HHZ.D.2016.226.000003.SAC

I get the file IV.FAVR..HHZ.D.2016.226.000003.SAC in the directory from which I ran mseed2sac.

/home/hagbard/swadmin/mseed2sac/mseed2sac -H
mseed2sac version: 2.2

Convert miniSEED data to SAC

Usage: mseed2sac [options] input1.mseed [input2.mseed ...]

 ## Options ##
 -V             Report program version
 -h             Show this usage message
 -H             Print an extended usage message
 -v             Be more verbose, multiple flags can be used
 -O             Overwrite existing output files, default creates new file names

 -k lat/lon     Specify station coordinates as 'Latitude/Longitude' in degrees
 -m metafile    File containing channel metadata (coordinates and more)
 -M metaline    Channel metadata, same format as lines in metafile
 -msi           Convert component inclination/dip from SEED to SAC convention
 -E event       Specify event parameters as 'Time[/Lat][/Lon][/Depth][/Name]'
                  e.g. '2006,123,15:27:08.7/-20.33/-174.03/65.5/Tonga'
 -l selectfile  Read a list of selections from file, used for subsetting

 -f format      Specify SAC file format (default is 2:binary):
                  1=alpha, 2=binary (host byte order),
                  3=binary (little-endian), 4=binary (big-endian)

 More options are available, to see their description use the -H option

 -N network     Specify the network code, overrides any value in the SEED
 -S station     Specify the station code, overrides any value in the SEED
 -L location    Specify the location code, overrides any value in the SEED
 -C channel     Specify the channel code, overrides any value in the SEED
 -r bytes       Specify SEED record length in bytes, autodetected by default
 -i             Process each input file individually instead of merged
 -ic            Process each channel individually, data should be well ordered
 -dr            Use the sampling rate derived from the time stamps instead
                  of the sample rate denoted in the input data
 -z zipfile     Write all SAC files to a ZIP archive, use '-' for stdout
 -z0 zipfile    Same as -z but do not compress archive entries

---------- Post updated at 09:18 AM ---------- Previous update was at 05:23 AM ----------

I have been trying to create a string with the new directory name
using rename, but it is not working very well.

./read-seed.sh iv/resu/hhz.d/iv.resu..hhz.d.2016.008
regex: 's/hhz/hhz.sac/g'
dir: iv/resu/hhz.d
odir:

stn="HHZ"
f="$1"
dir=$(dirname "${f}")

regex="'s/${stn}/${stn}.sac/g'"

odir=`echo "$dir" | rename $regex`

---------- Post updated at 09:37 AM ---------- Previous update was at 09:18 AM ----------

Now I have fixed it using sed

r=`echo "iv/resu/hhz.d/" | sed -e 's#hhz.d#hhz.d.sac#'`
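For reference, the same substitution can also be written with bash parameter expansion instead of sed. The sketch below assumes bash and reuses the path from the example above; the variable names r_sed and r_bash are made up for illustration. As a side note, rename acts on file names on disk rather than printing a transformed string, which is likely why the earlier attempt left odir empty.

```shell
#!/usr/bin/env bash
# Illustrative input path, taken from the example above.
dir="iv/resu/hhz.d/"

# sed version: the dot is escaped so that it matches a literal '.' only.
r_sed=$(printf '%s\n' "$dir" | sed -e 's#hhz\.d#hhz.d.sac#')

# bash-only version: ${var/pattern/replacement} replaces the first match.
# The pattern is a glob, so '.' is already literal here.
r_bash=${dir/hhz.d/hhz.d.sac}

printf '%s\n%s\n' "$r_sed" "$r_bash"
```

Both lines print iv/resu/hhz.d.sac/. The expansion form avoids starting an extra process for every path.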

Don_Cragun · February 17, 2018, 10:05pm

I'm very glad that you solved your problem.

We would have loved to have been able to help you, but with the information you provided, we weren't able to do much. I asked for the man page for your mseed2sac. Instead of that, you supplied us with a help message from that utility. Unfortunately, the help message says absolutely nothing about what the purpose of the utility is, what it will do with the input files, what, if any, output files it will produce, ...

And, then you told us that you wanted to invoke it using the format:

mseed2sac file1 file2 file2

which seemed very strange since there is no indication that feeding one file ( file2 in this case) to the utility twice in one invocation would serve any useful purpose. But, you did later change this line to again just pass one pathname to your utility (without explaining why you made that change).

kristinu · February 18, 2018, 1:07pm

You pass it a list of files in seed format (Standard for the Exchange of Earthquake Data)
and it outputs a set of files in sac format (a format used by the Seismic Analysis Code, SAC).

For example, running mseed2sac as follows
mseed2sac iv/cagr/hhz.d/iv.cagr..hhz.d.2016.208

The program then creates a number of files (in the directory where mseed2sac is run),
as shown in the list below. I then want to move them to a new directory, for example
`iv.sac/cagr/hhz.d/`

.d means day. So 2016.208 means the file for day 208.

iv.cagr..hhz.d.2016.208.000001.sac  iv.cagr..hhz.d.2016.208.043041.sac  iv.cagr..hhz.d.2016.208.070027.sac  iv.cagr..hhz.d.2016.208.121238.sac
iv.cagr..hhz.d.2016.208.131218.sac  iv.cagr..hhz.d.2016.208.132855.sac  iv.cagr..hhz.d.2016.208.132938.sac  iv.cagr..hhz.d.2016.208.133025.sac
iv.cagr..hhz.d.2016.208.133143.sac  iv.cagr..hhz.d.2016.208.133225.sac  iv.cagr..hhz.d.2016.208.133324.sac  iv.cagr..hhz.d.2016.208.133329.sac
iv.cagr..hhz.d.2016.208.133416.sac  iv.cagr..hhz.d.2016.208.133444.sac  iv.cagr..hhz.d.2016.208.133516.sac  iv.cagr..hhz.d.2016.208.133554.sac
iv.cagr..hhz.d.2016.208.133558.sac  iv.cagr..hhz.d.2016.208.133735.sac  iv.cagr..hhz.d.2016.208.133908.sac  iv.cagr..hhz.d.2016.208.133923.sac
iv.cagr..hhz.d.2016.208.134135.sac  iv.cagr..hhz.d.2016.208.134323.sac  iv.cagr..hhz.d.2016.208.134343.sac  iv.cagr..hhz.d.2016.208.134437.sac
iv.cagr..hhz.d.2016.208.134514.sac  iv.cagr..hhz.d.2016.208.134536.sac  iv.cagr..hhz.d.2016.208.134716.sac  iv.cagr..hhz.d.2016.208.134943.sac
iv.cagr..hhz.d.2016.208.135034.sac  iv.cagr..hhz.d.2016.208.135144.sac  iv.cagr..hhz.d.2016.208.135243.sac  iv.cagr..hhz.d.2016.208.135323.sac
iv.cagr..hhz.d.2016.208.135423.sac  iv.cagr..hhz.d.2016.208.135443.sac  iv.cagr..hhz.d.2016.208.135523.sac  iv.cagr..hhz.d.2016.208.135536.sac
iv.cagr..hhz.d.2016.208.135543.sac  iv.cagr..hhz.d.2016.208.135643.sac  iv.cagr..hhz.d.2016.208.135723.sac  iv.cagr..hhz.d.2016.208.135834.sac
iv.cagr..hhz.d.2016.208.135916.sac  iv.cagr..hhz.d.2016.208.140033.sac  iv.cagr..hhz.d.2016.208.140136.sac  iv.cagr..hhz.d.2016.208.140237.sac
iv.cagr..hhz.d.2016.208.140305.sac  iv.cagr..hhz.d.2016.208.140337.sac  iv.cagr..hhz.d.2016.208.140356.sac  iv.cagr..hhz.d.2016.208.140456.sac
iv.cagr..hhz.d.2016.208.140525.sac  iv.cagr..hhz.d.2016.208.140703.sac  iv.cagr..hhz.d.2016.208.140727.sac  iv.cagr..hhz.d.2016.208.140813.sac
iv.cagr..hhz.d.2016.208.140924.sac  iv.cagr..hhz.d.2016.208.140943.sac  iv.cagr..hhz.d.2016.208.141043.sac  iv.cagr..hhz.d.2016.208.141106.sac
iv.cagr..hhz.d.2016.208.141123.sac  iv.cagr..hhz.d.2016.208.141157.sac  iv.cagr..hhz.d.2016.208.141233.sac  iv.cagr..hhz.d.2016.208.141344.sac
iv.cagr..hhz.d.2016.208.141737.sac  iv.cagr..hhz.d.2016.208.141743.sac  iv.cagr..hhz.d.2016.208.141756.sac  iv.cagr..hhz.d.2016.208.141933.sac
iv.cagr..hhz.d.2016.208.141944.sac  iv.cagr..hhz.d.2016.208.142038.sac  iv.cagr..hhz.d.2016.208.142135.sac  iv.cagr..hhz.d.2016.208.142138.sac
iv.cagr..hhz.d.2016.208.142223.sac  iv.cagr..hhz.d.2016.208.142457.sac  iv.cagr..hhz.d.2016.208.142517.sac  iv.cagr..hhz.d.2016.208.142537.sac
iv.cagr..hhz.d.2016.208.142543.sac  iv.cagr..hhz.d.2016.208.142637.sac  iv.cagr..hhz.d.2016.208.142716.sac  iv.cagr..hhz.d.2016.208.142736.sac
iv.cagr..hhz.d.2016.208.142838.sac  iv.cagr..hhz.d.2016.208.142938.sac  iv.cagr..hhz.d.2016.208.142944.sac  iv.cagr..hhz.d.2016.208.143035.sac
iv.cagr..hhz.d.2016.208.143137.sac  iv.cagr..hhz.d.2016.208.143224.sac  iv.cagr..hhz.d.2016.208.144159.sac  iv.cagr..hhz.d.2016.208.144215.sac
iv.cagr..hhz.d.2016.208.144324.sac  iv.cagr..hhz.d.2016.208.144337.sac  iv.cagr..hhz.d.2016.208.144355.sac  iv.cagr..hhz.d.2016.208.144446.sac
iv.cagr..hhz.d.2016.208.144537.sac  iv.cagr..hhz.d.2016.208.144743.sac  iv.cagr..hhz.d.2016.208.144845.sac  iv.cagr..hhz.d.2016.208.144956.sac
iv.cagr..hhz.d.2016.208.145137.sac  iv.cagr..hhz.d.2016.208.145337.sac  iv.cagr..hhz.d.2016.208.145355.sac  iv.cagr..hhz.d.2016.208.145443.sac
iv.cagr..hhz.d.2016.208.145537.sac  iv.cagr..hhz.d.2016.208.145553.sac  iv.cagr..hhz.d.2016.208.145715.sac  iv.cagr..hhz.d.2016.208.145757.sac
iv.cagr..hhz.d.2016.208.150043.sac  iv.cagr..hhz.d.2016.208.150143.sac  iv.cagr..hhz.d.2016.208.150236.sac  iv.cagr..hhz.d.2016.208.150344.sac
iv.cagr..hhz.d.2016.208.150454.sac  iv.cagr..hhz.d.2016.208.150544.sac  iv.cagr..hhz.d.2016.208.150736.sac  iv.cagr..hhz.d.2016.208.150826.sac
iv.cagr..hhz.d.2016.208.151114.sac  iv.cagr..hhz.d.2016.208.151206.sac  iv.cagr..hhz.d.2016.208.151354.sac  iv.cagr..hhz.d.2016.208.151445.sac
iv.cagr..hhz.d.2016.208.151605.sac  iv.cagr..hhz.d.2016.208.151623.sac  iv.cagr..hhz.d.2016.208.151837.sac  iv.cagr..hhz.d.2016.208.152444.sac
iv.cagr..hhz.d.2016.208.153233.sac  iv.cagr..hhz.d.2016.208.153738.sac  iv.cagr..hhz.d.2016.208.154623.sac  iv.cagr..hhz.d.2016.208.155053.sac
iv.cagr..hhz.d.2016.208.155145.sac  iv.cagr..hhz.d.2016.208.155733.sac  iv.cagr..hhz.d.2016.208.155745.sac  iv.cagr..hhz.d.2016.208.155853.sac
iv.cagr..hhz.d.2016.208.155937.sac  iv.cagr..hhz.d.2016.208.161544.sac  iv.cagr..hhz.d.2016.208.164058.sac  iv.cagr..hhz.d.2016.208.164637.sac
iv.cagr..hhz.d.2016.208.164705.sac  iv.cagr..hhz.d.2016.208.164726.sac  iv.cagr..hhz.d.2016.208.165706.sac  iv.cagr..hhz.d.2016.208.171016.sac
iv.cagr..hhz.d.2016.208.171135.sac  iv.cagr..hhz.d.2016.208.171513.sac  iv.cagr..hhz.d.2016.208.171556.sac  iv.cagr..hhz.d.2016.208.171723.sac
iv.cagr..hhz.d.2016.208.172005.sac  iv.cagr..hhz.d.2016.208.172044.sac  iv.cagr..hhz.d.2016.208.172438.sac  iv.cagr..hhz.d.2016.208.180648.sac
iv.cagr..hhz.d.2016.208.192841.sac

---------- Post updated at 01:07 PM ---------- Previous update was at 11:26 AM ----------

I have now updated the code as follows, so that it transfers the generated files
to the correct directory. Any improvements or comments on possible problems with my
code would be very welcome.

# Counts the number of files to process
totfcn=$(find . -type f | tee /tmp/wrk | wc -l)

i=0; j=0
while read fn; do    
  printf -v XXX "%0*d" $((60 * ++i / totfcn))
  printf "\r[%-60s]" "${XXX//0/*}"
  printf "%4d%%" $((100 * ++j / totfcn))
  #${dir_mseed2sac}/mseed2sac $fn
done < /tmp/wrk
printf "\n"

while read fn; do    
  #${dir_mseed2sac}/mseed2sac $fn
  dir=$(dirname "$fn")  # Gets directory path
  fnm=$(basename "$fn") # Gets filename excl. path
  rgx_nwk="s/${nwk}/${nwk}.sac/g"
  odir_nwk=$(echo "$dir" | sed -e "$rgx_nwk")
  ofl_nwk="${odir_nwk}/${fnm}"

  echo "fn: $fn"
  if [ -d "$odir_nwk" ]; then
    echo "Directory already exists: $odir_nwk"
  else
    echo "+ dir: $dir"
    echo "+ fnm: $fnm"
    echo "+ mkdir -p $odir_nwk"
  fi
  echo "+ mv ${fnm}.* ${odir_nwk}/"
done < /tmp/wrk

RudiC · February 18, 2018, 1:46pm

Why don't you move each result file right after having created it in the first loop?

Please be aware that the nwk variable used in 11 places is not defined anywhere...

kristinu · February 18, 2018, 2:24pm

Yes, I had that defined for testing. The latest version is below. Naturally, I will remove all the
printed values and just keep the progress bar.

nwk="iv"
incl_nm="*hhz*"
# Counts the number of files to process
totfcn=$(find . -type f -name "$incl_nm" | tee /tmp/wrk | wc -l)

i=0; j=0
while read fn; do    

  dir=$(dirname "$fn")  # Gets directory path
  fnm=$(basename "$fn") # Gets filename excl. path
  rgx_nwk="s/${nwk}/${nwk}.sac/g"
  odir_nwk=$(echo "$dir" | sed -e "$rgx_nwk")
  ofl_nwk="${odir_nwk}/${fnm}"

  echo -e "\n\nfn: $fn"
  if [ -d "$odir_nwk" ]; then
    echo "Directory already exists: $odir_nwk"
  else
    echo "+ dir: $dir"
    echo "+ fnm: $fnm"
    echo "+ mkdir -p $odir_nwk"
  fi

  echo "+ mseed2sac $fn"
  #echo "${dir_mseed2sac}/mseed2sac $fn"
  echo "+ mv ${fnm}.* ${odir_nwk}/"

  printf -v XXX "%0*d" $((60 * ++i / totfcn))
  printf "\r[%-60s]" "${XXX//0/*}"
  printf "%4d%%" $((100 * ++j / totfcn))

done < /tmp/wrk
printf "\n"

Don_Cragun · February 18, 2018, 2:41pm

Have you considered using an absolute pathname on the find (i.e. find "$PWD" ... instead of find . ... ) and moving to the directory in which you want the files to be created before invoking mseed2sac instead of moving all of the files mseed2sac creates after it is done?

Your script will run faster and use fewer system resources if you change:

  dir=$(dirname "$fn")  # Gets directory path
  fnm=$(basename "$fn") # Gets filename excl. path

to:

  dir=${fn%/*}  # Gets directory path
  fnm=${fn##*/} # Gets filename excl. path

And, I assume that you will also actually invoke mseed2sac and mkdir instead of just including them in comments and echo statements. (This also applies to mv if you decide to ignore my first suggestion above.)
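As a small illustration of the two expansions suggested above, here is a sketch assuming bash and a sample path of the form that find . prints for this tree. The variable plain is made up to show one caveat: when the path contains no slash, the expansion and dirname give different results.

```shell
#!/usr/bin/env bash
fn="./iv/ptcc/hhz.d/iv.ptcc..hhz.d.2016.141"   # sample path (illustrative)

dir=${fn%/*}     # remove the shortest suffix matching '/*' -> directory part
fnm=${fn##*/}    # remove the longest prefix matching '*/'  -> file name

printf 'dir: %s\nfnm: %s\n' "$dir" "$fnm"

# Caveat: without a slash in the path the two approaches differ.
plain="file.mseed"
printf '%s vs %s\n' "${plain%/*}" "$(dirname "$plain")"
```

Paths produced by find . or find "$PWD" always contain a slash, so the caveat does not apply inside the loop discussed here.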

kristinu · February 18, 2018, 5:55pm

That's correct. I am just using echo to check that the commands created are correct, so
when I am happy with how things are set up, I can then invoke them properly.

---------- Post updated at 05:12 PM ---------- Previous update was at 04:24 PM ----------

Should I do `cd` to the directory so that I can run `mseed2sac`? I am not following
very well how to use `find $pwd`

Here is some of the output. There are a lot of stations.

ls iv/
agst  cavt  cmpr  dgi   esln  gib   hlni  lado  mesg  mmgo  mpnc  mtgr   plac  raff  soi  alja  
cgl   corl  ecnv  esml  gmb   hmdc  lpdg  meu   mno   mrlc  noci   plln  resu  solun  vent
cafe  clta  crac  emsg  favr  haga  hpac  ltrz  mfnl  mpaz  msfr  nov    psb1  scte  ssy
cagr  cmdo  crja  epzf  galf  havl  hvzn  mct   milz  mpg   msru  petra  ptcc  sers

ls iv/favr/
bhe.d/ bhn.d/ bhz.d/ hhe.d/ hhn.d/ hhz.d/ lhe.d/ lhn.d/ lhz.d/ vhe.d/ vhn.d/ vhz.d/

fn: ./iv/ptcc/hhz.d/iv.ptcc..hhz.d.2016.141
+ dir: ./iv/ptcc/hhz.d
+ fnm: iv.ptcc..hhz.d.2016.141
+ mkdir -p ./iv.sac/ptcc/hhz.d
cd ./iv/ptcc/hhz.d
+ /home/hagbard/swadmin/swbuild/mseed2sac/mseed2sac ./iv/ptcc/hhz.d/iv.ptcc..hhz.d.2016.141
+ mv iv.ptcc..hhz.d.2016.141.* ./iv.sac/ptcc/hhz.d/
[*                                                           ]   1%

fn: ./iv/ptcc/hhz.d/iv.ptcc..hhz.d.2016.021
+ dir: ./iv/ptcc/hhz.d
+ fnm: iv.ptcc..hhz.d.2016.021
+ mkdir -p ./iv.sac/ptcc/hhz.d
cd ./iv/ptcc/hhz.d
+ /home/hagbard/swadmin/swbuild/mseed2sac/mseed2sac ./iv/ptcc/hhz.d/iv.ptcc..hhz.d.2016.021
+ mv iv.ptcc..hhz.d.2016.021.* ./iv.sac/ptcc/hhz.d/
[*                                                           ]   1%

fn: ./iv/ptcc/hhz.d/iv.ptcc..hhz.d.2016.350
+ dir: ./iv/ptcc/hhz.d
+ fnm: iv.ptcc..hhz.d.2016.350
+ mkdir -p ./iv.sac/ptcc/hhz.d
cd ./iv/ptcc/hhz.d
+ /home/hagbard/swadmin/swbuild/mseed2sac/mseed2sac ./iv/ptcc/hhz.d/iv.ptcc..hhz.d.2016.350
+ mv iv.ptcc..hhz.d.2016.350.* ./iv.sac/ptcc/hhz.d/
[*                                                           ]   1%

---------- Post updated at 05:55 PM ---------- Previous update was at 05:12 PM ----------

I get the same results

dir=$(dirname "$fn")
fnm=$(basename "$fn")

as doing

dir=${fn%/*}
fnm=${fn##*/}

How is one faster than the other? How do `dirname` and `basename` work?

I did a test and variable substitution is faster as you say.

Don_Cragun · February 18, 2018, 6:18pm

First, I suggested using find $PWD ... ; not find $pwd ... . Your shell sets PWD to an absolute pathname of your current working directory when it starts up and updates it every time you successfully execute a cd command. (When you successfully execute cd , your shell also sets OLDPWD to an absolute pathname of the directory you were in before you executed cd .)

There is no reason to think that the variable pwd will have any value assigned to it unless your script does that. And $pwd will not be updated by your shell when you execute cd !
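The difference between PWD, OLDPWD, and a lowercase pwd can be checked with a few lines. This is only a sketch, assuming bash; /tmp is an arbitrary example target and start is a helper variable introduced here.

```shell
#!/usr/bin/env bash
start=$PWD            # absolute path of the directory the script started in
cd /tmp || exit 1     # /tmp is only an example target

echo "PWD:    $PWD"       # updated by the shell after a successful cd
echo "OLDPWD: $OLDPWD"    # the directory we were in before the cd
echo "pwd:    '${pwd-}'"  # lowercase pwd is an ordinary variable, unset here
```

Unless the script or the environment happens to define pwd, the last line prints an empty pair of quotes.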

Your code does the following:

reads a relative pathname of an input file to be processed from /tmp/wrk ,
runs mseed2sac ,
calculates the directory where the output file(s) should be located,
creates that directory (if it doesn't already exist), and
moves the output file(s) to that directory.

My suggestion is to change that to:

read an absolute pathname of an input file to be processed from /tmp/wrk ,
calculate the directory where the output file(s) should be located,
create that directory (if it doesn't already exist),
cd to that directory, and
run mseed2sac .

Note that no mv is included in the above suggestion.
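The steps suggested above could be sketched roughly as follows, assuming bash. The function name convert_tree and the variable MSEED2SAC are made up for this example; MSEED2SAC is assumed to name the converter and defaults to mseed2sac on the PATH. The sketch relies on mseed2sac writing its output to the current directory, as described earlier in the thread, and it does not handle file names containing newlines.

```shell
#!/usr/bin/env bash
# Sketch only: convert_tree and MSEED2SAC are hypothetical names.

convert_tree() {
  local src dst fn rel outdir
  src=$(cd "$1" && pwd) || return 1   # absolute path of the input tree, e.g. .../iv
  mkdir -p "$2" || return 1
  dst=$(cd "$2" && pwd) || return 1   # absolute path of the output tree, e.g. .../iv.sac

  # Absolute input paths stay valid after cd; only *hhz* files are selected.
  find "$src" -type f -name '*hhz*' | while IFS= read -r fn; do
    rel=${fn#"$src"/}                 # e.g. ptcc/hhz.d/iv.ptcc..hhz.d.2016.141
    case $rel in
      */*) outdir=$dst/${rel%/*} ;;   # mirror the station/channel directories
      *)   outdir=$dst ;;             # file sits directly in the input tree
    esac
    mkdir -p "$outdir" || continue
    # Run the converter in a subshell so the cd does not leak into the loop.
    ( cd "$outdir" && "${MSEED2SAC:-mseed2sac}" "$fn" )
  done
}

# Example call (commented out so that sourcing this file has no side effects):
# convert_tree iv iv.sac
```

Because the converter is started from inside the output directory, no mv step is needed, and mkdir -p makes the separate "directory already exists" test unnecessary.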

Don_Cragun · February 18, 2018, 6:31pm

Your code uses command substitution (i.e., $(command arguments) ). That involves forking a shell, executing command, waiting for command to finish, reading the results from command, and assigning them to a variable.

My code uses variable expansion which is done entirely in the shell. The fork and exec done by command substitution is a very slow shell operation. The string manipulations done by variable expansions are much faster.
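The cost difference described above can be observed with the bash time keyword. This is a rough sketch, assuming bash; the sample path and the repeat count n are arbitrary, and the measured times will vary from system to system.

```shell
#!/usr/bin/env bash
fn="./iv/ptcc/hhz.d/iv.ptcc..hhz.d.2016.141"   # sample path (illustrative)
n=200                                          # arbitrary repeat count

echo "command substitution (forks and runs dirname each time):"
time for ((k = 0; k < n; k++)); do d1=$(dirname "$fn"); done

echo "parameter expansion (done inside the shell):"
time for ((k = 0; k < n; k++)); do d2=${fn%/*}; done

[ "$d1" = "$d2" ] && echo "both give: $d1"
```

The timing reports go to standard error. Both loops produce the same directory string, so only the elapsed time differs.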

kristinu · February 19, 2018, 6:51am

Right, `dirname` and `basename` involve executing a system program.

---------- Post updated 02-19-18 at 06:51 AM ---------- Previous update was 02-18-18 at 06:57 PM ----------

Have made the changes and understand now what you meant when using `PWD` and `OLDPWD`.