Replace multi-line strings or numbers

Hi

I have no experience with Unix, so any help would be appreciated.

I have the following text:

235543
123
45654
199
225
578
45654
199
225

I need to find this sequence in file A:

45654
199
225

and replace it with the following in file B:

45654
258

so that the new file B will be:

235543
123
45654
258
578
45654
258

any help?

Here is a solution using awk:

awk '
        {
                A[++c] = $1
        }
        END {
                for ( i = 1; i <= c; i++ )
                {
                        if ( A[i] == 45654 && A[i+1] == 199 && A[i+2] == 225 )
                        {
                                A[i+1] = 258
                                A[i+2] = 0
                        }
                        if ( A[i] )
                                print A[i]
                }
        }
' file

Thanks Yoda

But what if I want to search for a variable sequence instead of a known one? For example,

"variable number" 
199
225

will be

"variable number" 
258

Thanks

I didn't quite understand what you mean by variable sequence.

The program that I posted replaces 45654 199 225 with 45654 258.

You just have to modify it as per your requirement.


Thanx again Yoda

Sorry for not being clear.

What I meant was:

What if I want to find the sequence 199, 225 when it comes right after some number XXX:

XXX
199
225

and replace it with

XXX
258

XXX could be any number from 1 to 260.

So every time, replace every such sequence that follows XXX.

Thank you

So I assume that you are going to define starting sequence in a variable.

In that case you can pass whatever variable to awk, assign it and use it.

You can code something like:

SEQ=XXX

awk -v S="$SEQ" '
        {
                A[++c] = $1
        }
        END {
                for ( i = 1; i <= c; i++ )
                {
                        if ( A[i] == S && A[i+1] == 199 && A[i+2] == 225 )
                        {
                                A[i+1] = 258
                                A[i+2] = 0
                        }
                        if ( A[i] )
                                print A[i]
                }
        }
' file

I used the shell variable SEQ; replace the value XXX with the number of your choice. I hope this helps.

Khaled79,
Check this out:

# v=45654 perl -0777  -pe 's/$ENV{v}\n199\n225/$ENV{v}\n258/igs' file
235543
123
45654
258
578
45654
258

rveri,

It won't work!

 
# v=45654;perl -0777  -pe 's/$ENV{v}\n199\n225/$ENV{v}\n258/igs' file


Yoda

For small files awk works with no problem,
but with large files awk shows this error:

awk: cmd. line:3: (FILENAME=a.txt FNR=18498251) fatal: more_nodes: nextfree: can't allocate 4000 bytes of memory (Cannot allocate memory)

and this error is also printed in the Cygwin terminal:

 
line 3: 7488 Aborted (core dumped) awk -v S="$SEQ" '
{
A[++c] = $1
}
END {
for ( i = 1; i <= c; i++ )
{
if ( A[i] == S && A[i+1] == 199 && A[i+2] == 225 )
{
A[i+1] = 258
A[i+2] = 0
}
if ( A[i] )
print A[i]
}
}
' ascii.txt >pre.txt

Any help with this?

Thanks a lot

How about this awk code?

SEQ=45654

awk -v S="$SEQ" '
        $0 == S {
                V = $0
                getline
                if ( $0 == 199 )
                {
                        getline
                        if ( $0 == 225 )
                        {
                                print V RS "258"
                                next
                        }
                        else
                        {
                                print V RS "199" RS $0
                                next
                        }
                }
                else
                {
                        print V RS $0
                        next
                }
        }
        $0 != S {
                print $0
        }
' file

Dear Yoda

I will test it now on the large files I have.

Thanks


Thanks Yoda

It works well for both large and small files.

Thank you very much.

Could you please tell me what the difference is between the two programs?

Khaled

The first awk program that I posted loads every record of your file into an indexed array (A[++c] = $1) and performs the required operation only at the end, in the END block.

This is what caused the Cannot allocate memory error for large files: the whole file has to fit in memory.

The second awk program does not load the records into any array. Instead it checks the input record by record and keeps at most a few lines in memory at a time.

Hence it is not memory intensive and works for large files.
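
The record-by-record idea can also be written as a sliding window of at most two held-back lines. This avoids getline and also copes with a starting number that appears twice in a row or on the last line of the file. What follows is only a minimal sketch, assuming the same S variable and the 199, 225 to 258 rule from the posts above; the file names sample.txt and out.txt and the variable names p1, p2 and n are made up for illustration.

```shell
# Small sample input; the file names are hypothetical.
printf '%s\n' 235543 123 45654 199 225 578 45654 199 225 > sample.txt

SEQ=45654
awk -v S="$SEQ" '
    {
        # p2 and p1 are the (up to) two lines still held back; n is how many.
        if ( n == 2 && p2 == S && p1 == 199 && $0 == 225 ) {
            print p2        # keep the starting number
            print 258       # 199 and 225 collapse into 258
            n = 0
            next
        }
        if ( n == 2 )
            print p2        # oldest held line cannot start a match any more
        if ( n >= 1 )
            p2 = p1
        p1 = $0
        if ( n < 2 )
            n++
    }
    END {
        # Flush whatever is still held back at end of input.
        if ( n == 2 ) print p2
        if ( n >= 1 ) print p1
    }
' sample.txt > out.txt

cat out.txt
```

Memory use stays constant regardless of file size, because only two lines are ever held back before being printed.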


Yoda

Can I make it search for any of two or more numbers and do the replacement if one of them is found? For example:

 
SEQ=45654 | 234567  |57899 

awk -v S="$SEQ" '
        $0 == S {
                V = $0
                getline
                if ( $0 == 199 )
                {
                        getline
                        if ( $0 == 225 )
                        {
                                print V RS "258"
                                next
                        }
                        else
                        {
                                print V RS "199" RS $0
                                next
                        }
                }
                else
                {
                        print V RS $0
                        next
                }
        }
        $0 != S {
                print $0
        }
' file

Thanks a lot

Khaled

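Yes, you can. To implement this change, use the regular expression comparison operators ~ and !~ instead. Note that ~ matches anywhere in the line, so the pattern is anchored with ^( ... )$ to match whole lines only:

SEQ='^(45654|234567|57899)$'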

awk -v S="$SEQ" '
        $0 ~ S {
                V = $0
                getline
                if ( $0 == 199 )
                {
                        getline
                        if ( $0 == 225 )
                        {
                                print V RS 258
                                next
                        }
                        else
                        {
                                print V RS "199" RS $0
                                next
                        }
                }
                else
                {
                        print V RS $0
                        next
                }
        }
        $0 !~ S {
                print $0
        }
' file
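
To see why whole-line anchoring matters with ~, here is a small sketch comparing an unanchored and an anchored pattern. It assumes that only lines consisting of exactly one of the numbers should count as a starting number; the sample values and the file names nums.txt, loose.txt and exact.txt are made up.

```shell
# Made-up sample: only the second line is exactly one of the wanted numbers.
printf '%s\n' 145654 45654 2345678 > nums.txt

# Unanchored: matches every line that merely contains one of the numbers.
awk -v S='45654|234567|57899' '$0 ~ S' nums.txt > loose.txt

# Anchored with ^( ... )$: matches only lines that are exactly one of them.
awk -v S='^(45654|234567|57899)$' '$0 ~ S' nums.txt > exact.txt

cat loose.txt   # all three lines
cat exact.txt   # only 45654
```

The single quotes around the pattern keep the shell from touching the $ at the end.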

Yoda

I have used this code to print the numbers that come before and after a specific value:

 
SEQ=200
awk -v S="$SEQ" '
        {
                A[++c] = $1
        }
        END {
                for ( i = 1; i <= c; i++ )
                {
                        if ( A[i] == S )
                        {
                                print " the letter is " A[i]
                                print " followed by " A[i+1]
                                print " comes after " A[i-1]
                        }
                }
        }
' file

The result looked like this:

 
the letter is 200
 followed by 202
 comes after 211
 the letter is 200
 followed by 223
 comes after 212
 the letter is 200
 followed by 202
 comes after 211

I need it to print a count of how many times each combination occurs instead of printing it repeatedly, so the result should be something like this:

 
2 times 
the letter is 200
 followed by 202
 comes after 211
 
1 times 
the letter is 200
 followed by 223
 comes after 212
 

How can I do it?

Thanks a lot
Khaled

You can code something like:

SEQ=200
awk -v S="$SEQ" '
        {
                A[++c] = $1
        }
        END {
                for ( i = 1; i <= c; i++ )
                {
                        if ( A[i] == S )
                        {
                                C[A[i],A[i+1],A[i-1]]++
                                V[A[i],A[i+1],A[i-1]] = "the letter is " A[i] RS "followed by " A[i+1] RS "comes after " A[i-1]
                        }
                }
                for ( k in V )
                {
                        print C[k] " times"
                        print V[k]
                }
        }
' file

Thanks Yoda

You are awesome!

How can I sort the output in descending order?

Thanks

By default, the order in which a for (i in array) loop scans an array is not defined; it is generally based upon the internal implementation of arrays inside awk.

So you have to use an indexed array to help preserve the order. Try this modified code:

SEQ=200
awk -v S="$SEQ" '
        {
                A[++c] = $1
        }
        END {
                for ( i = 1; i <= c; i++ )
                {
                        if ( A[i] == S )
                        {
                                # SUBSEP keeps keys such as 200,20,21 and 200,202,1 distinct
                                k = A[i] SUBSEP A[i+1] SUBSEP A[i-1]
                                if ( !(k in V) )
                                        T[++j] = k
                                C[k]++
                                V[k] = "the letter is " A[i] RS "followed by " A[i+1] RS "comes after " A[i-1]
                        }
                }
                for ( i = 1; i <= j; i++ )
                {
                        print C[T[i]] " times"
                        print V[T[i]]
                }
        }
' file

The result wasn't sorted in descending order of the counts:

 
25 times
the letter is 200
followed by 202
comes after 211
36 times
the letter is 200
followed by 223
comes after 212

It should print the 36 times result followed by the 25 times result.

Thanks

I'm sorry, I misread your requirement. I thought you wanted to print the records in the order in which they appear in your input file.

If you have gawk, then you can use the code below to print in descending order:

SEQ=200
gawk -v S="$SEQ" '
        {
                A[++c] = $1
        }
        END {
                for ( i = 1; i <= c; i++ )
                {
                        if ( A[i] == S )
                        {
                                C[A[i],A[i+1],A[i-1]]++
                                V[A[i],A[i+1],A[i-1]] = "the letter is " A[i] RS "followed by " A[i+1] RS "comes after " A[i-1]
                        }
                }
                for ( k in V )
                {
                        T[++j] = C[k]
                }
                n = asort(T)
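                for ( i = n; i >= 1; i-- )
                {
                        # Skip a count already handled, so ties are not printed twice
                        if ( i < n && T[i] == T[i+1] )
                                continue
                        for ( k in V )
                        {
                                if ( C[k] == T[i] )
                                {
                                        print C[k] " times"
                                        print V[k]
                                }
                        }
                }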
        }
' file
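
If gawk is not available, a similar result can probably be had by letting awk print one line per combination with the count in front and handing the ordering to sort -rn. The following is a rough sketch, assuming the same S variable; the sample numbers, the file names seq.txt and counts.txt, and the variable names prev and cur are invented for illustration, and the output is kept on one line per combination so that sort can reorder it.

```shell
# Invented sample: 200 appears twice between 211 and 202, once between 212 and 223.
printf '%s\n' 211 200 202 212 200 223 211 200 202 > seq.txt

SEQ=200
awk -v S="$SEQ" '
    # prev holds the previous line, cur the candidate line, $0 the line after it.
    NR > 2 && cur == S { C[cur " followed by " $0 " comes after " prev]++ }
    { prev = cur; cur = $0 }
    END { for ( k in C ) print C[k], "times: the letter is", k }
' seq.txt | sort -rn > counts.txt

cat counts.txt
```

This version also reads the input line by line instead of storing it in an array. Note that in this sketch an occurrence of S on the first or last line is skipped, because it has no neighbour on one side.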

Thanks Yoda
Suppose that I have texts in different languages.

How could I print the n most frequent words,

such as the 10 or 100 most frequent?

It is easy to do for A-Z, which is English,

but how can I do it regardless of the language of the input text?

Thanks