Gunzip and edit many files

Experts - I have a requirement to gunzip and edit many files in a pair of directories.

I have two scripts that work great when run separately, but I'm having problems
combining the two.

The goal is to gunzip the files found by the first script, pipe them through the
bash/sed script, and write the output to a different directory.

Here is the 1st script:

#!/bin/sh

cd "$HOME" || exit 1

find . -type f -mmin -40 -name '*.gz'  -print  | ( while IFS= read -r i
do
        FILE=$(basename "${i}" .gz)
        gunzip -c "${i}" | $HOME/insert.sh > tmp/"${FILE}"
done
)

Here is the ($HOME/insert.sh) bash/sed script:

#!/bin/bash

startTime="$(grep "<Setup" $1)</Setup>"
dn=$(grep -m1 "DN" $1)

sed  -e "/<Target/a\ $dn" -e "/<Target/a \ \ \ \ \ \ $startTime" $1 > $1.chg

Anyone have any ideas on this?

Thanks

Not sure I fully understand what you're after, but two things jump out at me:

  • the second script seems to require data on stdin for grep to work upon, but the first script redirects its stdout to a file which then never appears again.
  • the second script seems to require a parameter, which the first one doesn't supply when calling it.
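
Because `$1` is unquoted in insert.sh, calling it with no argument actually makes `grep "<Setup" $1` fall back to reading stdin. The catch is that a pipe can be read only once. The minimal sketch below (the function name `two_greps` and the sample lines are made up for illustration) shows two `grep` commands sharing one stdin: the first reads the pipe to the end, so the second sees no data at all. insert.sh reads its input three times (two greps and a sed), so it cannot work from a single pipe as written.

```shell
#!/bin/sh
# Hypothetical demo: two greps share one stdin (a pipe).
# The first grep reads the pipe to end-of-file, so the second sees no data.
two_greps() {
    {
        grep -c 'Setup' || true   # counts matching lines, consuming all of stdin
        grep -c 'DN' || true      # stdin is already exhausted here
    }
}

printf '<Setup>\nDN=example\n<Target>\n' | two_greps   # prints 1, then 0
```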

Thanks for the reply.
Sorry if I wasn't very clear.

I'm trying to gunzip many files and run them through a sed script for further processing before they get written to a file.

The 1st script attempts to do 2 things:

  1. Find .gz files modified within the last 40 minutes
  2. gunzip the files found in step 1 and run them through the sed script before they get written to a file
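
Strictly speaking, `-mmin -40` selects files modified *less than* 40 minutes ago, not files that are exactly 40 minutes old. A small sketch to check this, assuming GNU `touch` (the `-d '2 hours ago'` syntax is not portable to BSD); `demo_mmin` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical demo of find's -mmin test; assumes GNU touch for -d '2 hours ago'.
demo_mmin() {
    dir=$(mktemp -d) || return 1
    touch "$dir/recent.gz"                  # modified just now
    touch -d '2 hours ago' "$dir/old.gz"    # modified two hours ago
    # -mmin -40: modified LESS than 40 minutes ago
    (cd "$dir" && find . -type f -mmin -40 -name '*.gz' -print)
    rm -rf "$dir"
}

demo_mmin    # prints ./recent.gz
```

Using `-mmin +40` instead would select files older than 40 minutes.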

I worked on writing 2 scripts that did this.
Now I want to combine the two scripts so that the files are gunzipped and "edited" before they are written to a file.

Currently I have to run the first script to gunzip all these files and then run the second script on those files to edit and rewrite them.

I'm trying to write the files only once by combining the two scripts.

Does this make sense?

Thanks

OK, we have understood that. Still, what RudiC has said holds:

gunzip -c "${i}" | $HOME/insert.sh > tmp/"${FILE}"

Here the script $HOME/insert.sh is called without any parameter, but here:

startTime="$(grep "<Setup" $1)</Setup>"
dn=$(grep -m1 "DN" $1)

It seems that this script requires one parameter, no? So the line in script one should look like this:

gunzip -c "${i}" | $HOME/insert.sh "some-param-here" > tmp/"${FILE}"

and whatever you put into "some-param-here" will end up where you use "$1" in the insert.sh script.
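
A quick way to see what a script actually receives: the hypothetical function `show_args` below stands in for insert.sh and reports `$#` and `$1`. Called the way script one calls insert.sh, it gets no argument at all. Separately, insert.sh as posted sends sed's output to "$1.chg" (just ".chg" when no argument is given) rather than to stdout, so the `> tmp/"${FILE}"` redirection in script one captures nothing either way.

```shell
#!/bin/sh
# Hypothetical stand-in for insert.sh: report what arrives as arguments.
show_args() {
    printf 'number of arguments: %s\n' "$#"
    printf 'first argument ($1): %s\n' "${1-unset}"
}

show_args                      # like script one today: no argument
show_args "some-param-here"    # like the suggested call
```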

Another thing is that obviously insert.sh expects the parameter to be a file(name), because otherwise this line:

sed  -e "/<Target/a\ $dn" -e "/<Target/a \ \ \ \ \ \ $startTime" $1 > $1.chg

wouldn't make any sense at all. But you do not pass a filename to insert.sh; instead you flood its stdin with input. insert.sh has no method to deal with input from stdin, though. It is as if you were writing letters to someone who only expects telephone calls: because he watches the phone all day but never checks his mailbox, you can write as many letters as you want and he won't react.
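
One way to let a script accept both "letters" and "phone calls" is to take a filename if one is given and otherwise buffer stdin into a temporary file, so the data can still be read more than once. A sketch under that assumption: `two_passes` is a hypothetical name, and its two `grep -c` calls merely stand in for insert.sh's real passes.

```shell
#!/bin/sh
# Hypothetical helper: read a file named in $1, or buffer stdin when no
# argument is given, so the data can be read more than once.
two_passes() {
    if [ $# -ge 1 ]; then
        src=$1
    else
        src=$(mktemp) || return 1
        cat > "$src"                  # save the piped data once
    fi
    grep -c 'Setup' "$src" || true    # first pass over the data
    grep -c 'DN' "$src" || true       # second pass still sees all of it
    [ $# -ge 1 ] || rm -f "$src"      # remove the buffer only if we created it
}

printf '<Setup>\nDN=example\n<Target>\n' | two_passes    # prints 1, then 1
```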

I hope this helps.

bakunin

I had a misconception in my post: of course insert.sh's stdin receives data, and its stdout is redirected to the temp file. Sorry!

I think I really overthought this and wasn't very clear about what my requirements were.

I originally had two scripts that worked great, but they needed two cron jobs and used more disk space, because I gunzipped the files into one location and then edited them with another script into yet another directory.

My goal was to gunzip and edit together to only one directory, combining the two scripts I was using previously.

I got stuck on passing a shell variable from one script into the sed script.

This is what I came up with; I would appreciate hearing from anyone who has a better way to do this.

#!/bin/sh

cd "$HOME" || exit 1

find . -type f -mmin -40 -name '*.gz'  -print  | ( while IFS= read -r i
do
        FILE=$(basename "${i}" .gz)
        startTime="$(zgrep "<Setup" "${i}")</Setup>"
        dn=$(zgrep -m1 "DN" "${i}")
        gunzip -c "${i}" | sed  -e "/<Target/a\ $dn" -e "/<Target/a \ \ \ \ \ \ $startTime"  > tmp/"${FILE}"
done
)

Thanks again for looking into this!

If you told people what you want to achieve - not what you coded and how - solutions ("better ways") could be thought up. With your above script, you unzip every .gz file THREE times! And if there is more than one occurrence of e.g. "DN" in a file, the variable and thus the replacement string might not be what you want. And if "DN" is missing in a file, the last one's will be used.
If you could make 100% sure "setup..." and "DN" will always precede "<Target" in your files, a very simple awk script could be offered.
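
To avoid decompressing each file three times while keeping the grep-then-edit structure, each archive could be unzipped once into a temporary file. A sketch under these assumptions: `edit_gz_once <file.gz> <output>` is a hypothetical helper; it keeps the `grep -m1` (first DN wins) and `</Setup>` behaviour of the script above, but inserts the lines with `awk -v` instead of sed. That is a swapped-in technique, not the poster's. Note that `awk -v` interprets backslash escapes in the assigned values.

```shell
#!/bin/sh
# Hypothetical helper: edit_gz_once <file.gz> <output-file>
# Decompresses the archive ONCE into a temporary file, then makes the
# separate passes (two greps plus the edit) over that plain copy.
edit_gz_once() {
    gz=$1 out=$2
    plain=$(mktemp) || return 1
    gunzip -c "$gz" > "$plain" || { rm -f "$plain"; return 1; }

    startTime="$(grep '<Setup' "$plain")</Setup>"   # as in the sed version
    dn=$(grep -m1 'DN' "$plain")                     # first DN line only

    # awk -v instead of sed: append the DN and Setup lines after each <Target line.
    awk -v dn="$dn" -v st="$startTime" '
        { print }
        /<Target/ { print " " dn; print "      " st }
    ' "$plain" > "$out"
    status=$?
    rm -f "$plain"
    return "$status"
}
```

Inside the loop it would be called as `edit_gz_once "${i}" tmp/"${FILE}"`.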

If you could make 100% sure "setup..." and "DN" will always precede "<Target" in your files, a very simple awk script could be offered.

I can confirm the above:
Setup and DN will always precede '<Target'.

I posted my solution in the hope that it might help someone else in the future.
I didn't realize how inefficient the code was.

Thanks for pointing that out.

No problem. In fact, these forums appreciate people publishing their attempts so these can be analysed and discussed, so the learning curve for everyone becomes steeper.
Please get into the habit of providing decent context information about your problem.
It is always helpful to support a request with system info like OS and shell, related environment (variables, options), preferred tools, adequate (representative) sample input and desired output data and the logic connecting the two, and, if any, system (error) messages verbatim, to avoid ambiguities and keep people from guessing.

Knowing nothing about your input file or the desired output, and just looking at your code, I came up with:

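gunzip -c "${i}" | awk '
/<Setup/        {StartTime = "      " $0          # remember the latest <Setup line, indented 6 spaces
                }
/DN/            {dn        = " " $0               # remember the latest DN line, indented 1 space
                }
/<Target/       {$0 = $0 ORS dn ORS StartTime     # append both lines after the <Target line
                }
1                                                 # print every (possibly extended) line
' > /tmp/"${FILE}"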
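
For completeness, here is how the awk filter above might look inside the find loop from the first script. The following are assumptions: `process_recent_gz` is a hypothetical name, the directory is an optional argument that defaults to $HOME, and results go to its tmp/ subdirectory as in the original scripts, rather than to /tmp. Unlike the sed version, this filter does not append `</Setup>`. If that suffix is still needed, add `"</Setup>"` to the StartTime assignment. Also, because `/DN/` keeps overwriting `dn`, the DN line used is the most recent one before `<Target`, not the first one as with `grep -m1`.

```shell
#!/bin/sh
# Hypothetical wrapper: process_recent_gz [directory]
# Runs the awk filter inside the find loop from the first script.
# Assumptions: the directory defaults to $HOME and results go to its tmp/
# subdirectory, as in the original scripts.
process_recent_gz() (
    cd "${1:-$HOME}" || exit 1      # subshell body: the caller's cwd is unchanged
    mkdir -p tmp || exit 1
    find . -type f -mmin -40 -name '*.gz' -print | while IFS= read -r i
    do
        FILE=$(basename "$i" .gz)
        # One decompression per file; awk makes a single pass.
        gunzip -c "$i" | awk '
            /<Setup/  { StartTime = "      " $0 }
            /DN/      { dn = " " $0 }
            /<Target/ { $0 = $0 ORS dn ORS StartTime }
            1
        ' > tmp/"$FILE"
    done
)
```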

Test and come back with results.

Yup, the above works great.
Thanks for the input and suggestions.
