Replacing a character string in a file

rjsha1 · December 7, 2005, 10:48am

Hi there,
I have a parameter file that looks like this :-

IRL|07122005|27389|VTIEpay|email address|5|200

When my program finishes I want to replace the seventh field. The existing code looks like this:

cat <<-EOF | ed -s $PARFILE
1,$ g/^$ICO/s/$prvdate/$TODAY/
1,$ g/^$ICO/s/$prvsize/$cursize/
1,$ g/^$ICO/s/$prvamt/$payin/
w $PARFILE
q
EOF

The problem I'm having is that each substitution replaces the FIRST occurrence of the string on the line. $payin should replace $prvamt in the 7th field, but it doesn't: it replaces the matching string in the third field instead. I am tearing my hair out.

The file should look like

IRL|07122005|27389|VTIEpay|email address|5|2336590

Can anyone give me a clue what to do?

tmarikle · December 7, 2005, 11:55am

For starters, I wouldn't use ed; I would rather use awk and utilize its considerable power to make changes to this file. You can pass your variables, search for patterns, and manipulate fields rather than whole lines.

For example:

nawk -F\| -v ICO=IRL -v prvamt=200 -v payin=999 '
    BEGIN {OFS="|"}
    $1 == ICO { 
         if ($7 == prvamt) $7=payin
    }
    1
' $PARFILE

This will make your job much easier.

output:

IRL|07122005|27389|VTIEpay|email address|5|999
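
To see why comparing by field matters, here is a small sketch using the sample line from the first post. The variable names (line, by_regex, by_field) are made up for illustration, and plain awk is used on the assumption that it behaves like nawk for this script.

```shell
# Sample line from the first post.
line='IRL|07122005|27389|VTIEpay|email address|5|200'

# Unanchored substitution: the first "200" on the line sits inside the date field.
by_regex=$(printf '%s\n' "$line" | sed 's/200/999/')

# Field-wise comparison: only field 7 is tested and changed.
by_field=$(printf '%s\n' "$line" |
    awk -F'|' -v prvamt=200 -v payin=999 '
        BEGIN { OFS = "|" }
        $7 == prvamt { $7 = payin }
        { print }')

printf '%s\n' "$by_regex" "$by_field"
```

The first result has a damaged date field (07129995) and an untouched field 7; the second changes only field 7.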

rjsha1 · December 8, 2005, 4:54am

Hi there,
That's great - if I wanted to replace fields 2 and 3 as well, would the code look like this?

$1=ICO {
if ($7 == prvamt) $7=payin,$2=cursize,$3=chksum)
}
1

What is the 1 for?

Thanks for your help

bakunin · December 8, 2005, 6:59am

Using awk instead of sed is like using theft to earn money instead of work: perhaps easier to do but coming with a price. In the case of awk the price is performance and size: awk takes a substantially longer time to load compared to sed or ed and does its job at a considerably slower pace.

Both these considerations may or may not be relevant to your problem at hand. If you call awk or sed once you won't notice the difference; if you call it in the middle of a deeply nested loop, using sed instead of awk might save a considerable amount of time. Similarly, if your input file is some 100 lines long you won't notice the difference in speed; if it is database output with millions of lines to process, you might perceive a considerable difference.

Having said this: the real distinguishing point between sed and awk as text processors is that awk is able to work with a persistent context, whereas sed's capabilities in this area are limited to non-existent. If, for instance, you had to sum one field to a total, you would do it with awk (it would be possible with sed, but it would be a nightmare: a poorly suited tool for the job). In your case it is just a matter of formulating correct regexps and nothing else. I will explain the following solution, which changes two fields (4 and 5), step by step so you can modify it to suit your needs:

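sed '/^'"$ICO"'|/ {
                    # skip 3 fields from the start of the line, then change field 4
                    s/^\(\([^|]*|\)\{3\}\)'"$oldvar1"'/\1'"$newvar1"'/
                    # skip 4 fields from the start of the line, then change field 5
                    s/^\(\([^|]*|\)\{4\}\)'"$oldvar2"'/\1'"$newvar2"'/
                  }' "$PARFILE"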

Your input is organized in fields separated by pipes, so a field is "some non-pipe characters followed by a pipe". The regexp to match such a string is: "[^|]*|". Then there is a construction to "multiply" regexps: "\{<n>\}", which means "the regexp before, <n> times". For instance "a\{5\}" is the same as "aaaaa". I combined these two by grouping the "field regexp" and then multiplying it to match exactly a specific number of fields: "\([^|]*|\)\{3\}" means: "3 fields of non-pipe chars, each followed by a pipe". I grouped this with another "\(...\)" to be able to use it in the replacement string as "\1". So, the search string is "three fields followed by the content of oldvar1", which will be replaced by "three fields followed by the content of newvar1". Notice that in order to change the n-th field we have to mention the first n-1 fields, followed by the search pattern.

This is repeated a second time for the fifth field in my example to show the way of changing multiple fields at once.

Finally, the surrounding construct: this limits the whole change process to lines whose first field is the content of the shell variable ICO.

One last observation: the seventh field you wanted to change is the last one in the sample line you provided. This *could* be matched by "[^|]*$", which means "any number of non-pipe characters followed by the end-of-line", but that would imply that your lines can only have seven fields. Using the expressions I supplied there is no such restriction and you can adjust the expression to match any field (save for the first, where the expression becomes simply "^").
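
As a sketch of the same pattern applied to the seventh field from the original question: six leading fields, then the old value. The values assigned to ICO, prvamt and payin, and the names line and updated, are illustrative only; the "^" anchor is assumed so that counting starts at the first field.

```shell
ICO=IRL
prvamt=200
payin=2336590
line='IRL|07122005|27389|VTIEpay|email address|5|200'

# Skip six "field plus pipe" groups from the start of the line, then
# replace the old amount at the beginning of field 7.
updated=$(printf '%s\n' "$line" |
    sed '/^'"$ICO"'|/ s/^\(\([^|]*|\)\{6\}\)'"$prvamt"'/\1'"$payin"'/')

printf '%s\n' "$updated"
```

Note that this sketch does not check where the field ends, so an old value of 200 would also match the start of a field containing 2000.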

Hope this helps.

bakunin

rjsha1 · December 8, 2005, 7:01am

Fantastic post thanks very much for taking the time to write it.

I'm digesting your post now.

thanks
bob...

tmarikle · December 8, 2005, 12:53pm

How funny is that? Using theft over legitimate work as an analogy for awk vs. sed. Let's extend that analogy and suggest that he use machine code over C or, better, that he use a custom C application rather than a shell script for performance's sake.

[philosophical rambling]
While I agree that unnecessary clock cycles are generally bad when multiplied by being deeply nested in a loop, in my opinion awk's simplicity coupled with its considerable power simply carries more weight, especially considering that the OP clearly has to do two things: (a) wade through learning counterintuitive UNIX utilities and (b) solve a real business need in light of (a).

Both awk and sed are difficult to grasp at first, but I think that awk offers your average UNIX user a simpler path to solving a greater number of text processing problems (whether persistence is necessary or not), for the simple reason that awk can be used and understood by a greater number of people, especially those maintaining established code. Constructing moderately complex regular expressions is easier than reading them and remembering what they were supposed to match (at least for me); awk can sometimes insulate you (or a maintainer) from struggling to remember your code. How many people do you know who use sed beyond sed 's/search/replace/' or awk beyond awk '{print $4}'? I still find people with years of UNIX experience creating the ugliest grep filter pipes imaginable on a per-line basis. In all of my years working in many different shops, I haven't found many people who go very far beyond these examples.
[/philosophical rambling]

The answer to your first question is "that depends." If $2 and $3 should be changed based on $7's evaluation, then yes. awk's syntax is like C, so your statement would be written like this (notice the curly braces):

if ($7 == prvamt) {
    $7=payin
    $2=cursize
    $3=chksum
}

The answer to your second question is more complicated. An awk script is made up of autonomous procedures, each a pattern followed by an action in curly braces, that are applied to every line of input. My example consists of two such procedures. The first tests whether $1 matches ICO and, if so, whether field seven ($7) matches the variable "prvamt" that I passed in. The second (the 1 by itself) is a pattern with no action; it is misleading and I should not have used it. A pattern evaluates to zero or non-zero, and when it is non-zero and no action is given, awk prints the current line. My constant 1 is always non-zero, so it effectively tells awk to print every line, including the one modified by the first procedure. I should have written it like this:


nawk -F\| -v ICO=IRL -v prvamt=200 -v payin=999 '
    BEGIN {OFS="|"}
    $1 == ICO { 
         if ($7 == prvamt) {
             $7=payin
         }
    }
    { print }
' $PARFILE
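
A quick way to convince yourself that the bare 1 and { print } do the same thing. This is only a sketch with made-up two-line input, and awk is assumed to behave like nawk here:

```shell
input='IRL|1
EGR|2'

# A pattern with no action: every line for which the pattern is true is printed.
with_one=$(printf '%s\n' "$input" | awk '1')

# An action with no pattern: the action runs for every line.
with_print=$(printf '%s\n' "$input" | awk '{ print }')

[ "$with_one" = "$with_print" ] && echo "same output"
```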

I hope this helps and, by all means, write it in machine code if you want performance gains over shell scripting.

Seriously though, bakunin's comments have merit and there are applications for both; you'll want to learn both utilities. Certainly there are many ways to achieve your goal, and sed and awk are not your only choices. Many people prefer perl to solve these kinds of problems and some even prefer ruby, which seems less intuitive. I don't write many perl scripts since I haven't been limited by sed or awk as yet.

rjsha1 · December 9, 2005, 3:32am

Hi tmarikle,
thanks for your help

rjsha1 · December 12, 2005, 6:46am

hi chaps,
Where am I going wrong? Below is my code after your comments :-

nawk -F\| -v CTY=$ICO \
-v oldamt=$prvamt \
-v newamt=$payin \
-v size=$cursize \
-v newday=$TODAY
'BEGIN {OFS="|"}
if ($1 == CTY) {
if ($7 == oldamt) {
$7 == newamt
$2 == newday
$3 == size
}
}
{print} ' $PARFILE

But it still does not change the file below when $1 = IRL. Any ideas what I'm doing wrong?

IRL|08122005|50935|VTIEpay|xxxxx|5|2005331
EGR|01012003|3333|EEDEpay|xxxxxxx|7|900
BEL|21072004|720981|VTBEpay|xxxxxxxx|8|2000
EEA|22077994|200|EEATpay|xxxxxxx|9|500

tmarikle · December 12, 2005, 11:49am

You have some minor syntax errors. The changes are explained in the comments; in addition, the line -v newday=$TODAY was missing its trailing line continuation (\):


nawk -F\| \
    -v CTY=$ICO \
    -v oldamt=$prvamt \
    -v newamt=$payin \
    -v size=$cursize \
    -v newday=$TODAY \
    'BEGIN {OFS="|"}
     # A bare pattern starts an autonomous procedure; the "if ()" is implied.
     $1 == CTY {
         if ($7 == oldamt) {
             # "=" is the assignment operator; "==" is the test operator.
             $7 = newamt
             $2 = newday
             $3 = size
         }
     }
     {print} ' $PARFILE

rjsha1 · December 14, 2005, 9:43am

Hi there,
I have changed my code to this :-
nawk -F\| \ -v CTY=$ICO \
-v oldamt=$prvamt \
-v newamt=$payin \
-v size=$cursize \
-v newday=$TODAY \
'BEGIN {OFS="|"}
$1 == CTY {
if ($7 == oldamt) {
$7 = newamt
$2 = newday
$3 = newsum
}
}
{print} ' $PARFILE

I now get the errors :-

nawk: can't open file source line number 1
/u01/TEST1/app/geneva/bin/autoAccpay.sh[610]: -v: not found

Sorry to be a pain chaps - what have I missed? I have checked and PARFILE does exist where it should be.

tmarikle · December 14, 2005, 11:58am

This backslash is a problem:

nawk -F\| \ -v CTY=$ICO \
...

rjsha1 · December 14, 2005, 12:36pm

Ok,
Got rid of the slash - now have two error messages left
/u01/TEST1/app/geneva/bin/autoAccpay.sh[610]: -v: not found
/u01/TEST1/app/geneva/bin/autoAccpay.sh[613]: -v: not found
this relates to these lines of code

nawk -F\| -v CTY=$ICO \
-v oldamt=$prvamt \ -v newamt=$payin \
-v size=$cursize \
-v newday=$TODAY \ 'BEGIN {OFS="|"}
$1 == CTY {
if ($7 == oldamt) {
$7 = newamt
$2 = newday
$3 = newsum
}
}
{print} ' $PARFILE

I'm pulling my hair out - is it an obvious problem? It is still not updating the $PARFILE.

Sorry for being a pain

tmarikle · December 14, 2005, 12:56pm

Sorry, I only caught the first mistake. Two of your other lines have the same problem.

-v oldamt=$prvamt \ -v newamt=$payin \
-v size=$cursize \
-v newday=$TODAY \ 'BEGIN {OFS="|"}

There isn't any need for these backslashes unless you are moving the text that follows to a new line.

-v oldamt=$prvamt \
-v newamt=$payin \
-v size=$cursize \
-v newday=$TODAY \
'BEGIN {OFS="|"}
...

They are really here in your script to make your command more readable (backslashes are also used to "escape" special characters that would otherwise be misinterpreted, as is the case with your -F parameter). One more point: don't put anything, not even a space, after the backslash when it's used to continue a command on a new line. Otherwise the shell escapes that character instead of the newline, the command ends at the end of the line, and the next line is run as a separate command. That is the kind of mistake that produces errors like "-v: not found".

Note: The -F parameter uses a backslash to change how the vertical bar is understood by the shell. Normally the shell treats it as a pipe; escaped, it is passed through to awk, which uses it as the field separator.
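
One thing not covered so far: awk only prints the changed lines to standard output, so $PARFILE itself is never touched by the command. Below is a minimal sketch of writing the result back, assuming a scratch file next to the original is acceptable. The demo file contents, the sample values and the ".new" suffix are placeholders of mine, and awk stands in for nawk.

```shell
# Demo setup only: in the real script PARFILE, ICO and friends are already set.
PARFILE=$(mktemp)
printf '%s\n' 'IRL|08122005|50935|VTIEpay|xxxxx|5|2005331' \
              'EGR|01012003|3333|EEDEpay|xxxxxxx|7|900' > "$PARFILE"
ICO=IRL prvamt=2005331 payin=2336590 cursize=27389 TODAY=14122005

# Write the edited lines to a scratch file and replace the original
# only if awk finished without error.
awk -F'|' \
    -v CTY="$ICO" \
    -v oldamt="$prvamt" \
    -v newamt="$payin" \
    -v size="$cursize" \
    -v newday="$TODAY" \
    'BEGIN { OFS = "|" }
     $1 == CTY && $7 == oldamt {
         $7 = newamt
         $2 = newday
         $3 = size
     }
     { print }' "$PARFILE" > "$PARFILE.new" &&
    mv "$PARFILE.new" "$PARFILE"

cat "$PARFILE"
```

Also check that the names used inside the program match the ones passed with -v: the posted code passes size but assigns newsum, which is never set, so field 3 would come out empty.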

ayan153 · April 11, 2009, 1:07pm

Really appreciate your good work. Keep it up.