inserting and replacing lines with awk

Hello,

I need to insert varying lines (they are the output of another script) between lines that start with certain fields.
Here is an example to make it clearer.

This is the file into which I want to insert lines:
(save it as "input.txt")

ContrInMi_c_mir 	 2 	 10066	181014 	 200750
ContrMidl_y_cut 	 2 	 12345	201085 	 220988
ContrMiRi_c_mir 	 2 	 13206	221155 	 240891
ContrIdex_p_mir 	 1 	 765	241225 	 260962
ContrMidl_b_cut 	 1 	 2686	261296 	 281032

This is the file ("insert.txt") containing the lines that should be inserted:

texta: ContrInMi_c_mir
textb: ContrMidl_y_cut
textins: sec_buttonpress 	 3 	 15147
texta: ContrIdex_p_mir
textb: ContrMidl_b_cut
textins: sec_buttonpress 	 2 	 14605

And this is the script, which calls the awk file "dothis.awk" at the end:

#!/bin/bash
declare -a Arraytexta
declare -a Arraytextb
declare -a Arraytextins
nrLines=`awk 'END {print NR}' insert.txt`
nrLines=`expr ${nrLines} - 2`

for ((i=1; i<=$nrLines; i++)); 
do 
countb=`expr ${i} + 1`
countins=`expr ${i} + 2`
Arraytexta=`sed -n "${i} p" insert.txt | awk 'BEGIN{FS=":"}{print $2}'`
Arraytextb=`sed -n "${countb} p" insert.txt | awk 'BEGIN{FS=":"}{print $2}'`
Arraytextins=`sed -n "${countins} p" insert.txt | awk 'BEGIN{FS=":"}{print $2}'`
gawk -v textins="${Arraytextins}" -v texta="${Arraytexta}" -v textb="${Arraytextb}" -f dothis.awk input.txt > output.txt
done

Here is dothis.awk

BEGIN{
	flg=0
}
{
	if (flg==1 && $1==textb || $1=="sec_buttonpress"){
		if($1==textb  || $1=="sec_buttonpress"){
			print textins
		}
	}
	if ($1==texta){
		flg=1
	}
	print $0
}
END{
	
}

So, basically, what I am trying to do is this: I search for a line starting with "texta" and check whether the following line starts either with "textb" or, if a line has been inserted previously (which can happen), with "sec_buttonpress". If I find such lines, then "textins" should be inserted.
Actually, an extension to cover all needs would be to also check whether the second field in the file "input.txt" contains "1". If so, then the line that would be inserted (only its fields 2 and 3) should instead replace fields 2 and 3 of this line.

For some reason, "dothis.awk" is not working: at the end of the "input.txt" file it always inserts the same 2 fields, a 6-digit number and a "0".

What am I doing wrong?

Thanks a lot in advance for your help and advice!

tempestas
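
One likely contributor, judging only from the script as posted: splitting "texta: ContrInMi_c_mir" on ":" leaves the blank after the colon inside $2, so the value handed to gawk with -v starts with a space, while $1 of a data line never does. The sketch below compares the two; the variable names raw, trimmed and verdict are only illustrative.

```shell
#!/bin/sh
line='texta: ContrInMi_c_mir'

# Splitting on ":" alone keeps the blank that follows the colon.
raw=$(printf '%s\n' "$line" | awk 'BEGIN{FS=":"}{print $2}')

# Treating ":" plus any following blanks as the separator removes it.
trimmed=$(printf '%s\n' "$line" | awk 'BEGIN{FS=":[ \t]*"}{print $2}')

printf '[%s]\n[%s]\n' "$raw" "$trimmed"

# Compare both values against the first field of a data line,
# the same kind of test that dothis.awk performs with $1==texta.
verdict=$(printf 'ContrInMi_c_mir\t2\t10066\n' |
  awk -v raw="$raw" -v trimmed="$trimmed" '{
    print ($1 == raw     ? "raw matches"     : "raw differs")
    print ($1 == trimmed ? "trimmed matches" : "trimmed differs")
  }')

printf '%s\n' "$verdict"
```

Two further points can be read directly off the loop: it advances i by one although each group in insert.txt spans three lines, and every pass redirects with ">", so output.txt only ever holds the result of the last gawk call.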

Hi,

I don't know what is wrong, but:

1.- Which file is untitled3.txt?
2.- Try commenting your code a little so that we can see what you want to do.
3.- Try a single awk program instead of several 'sed' and 'awk' commands.
4.- The input file of 'dothis.awk' is 'input.txt', while the awk program
searches for the 'texta' and 'textb' values, which come from the 'insert.txt'
file. Is this correct?
5.- Paste the output.txt you are looking for.

Regards,
Birei
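
As a rough illustration of point 3, a single awk pass can collect each texta/textb/textins group from insert.txt without any sed calls or a shell loop. This is only a sketch: the scratch directory, the "|" separator and the variable name triples are assumptions, and the sample data uses single blanks instead of the original spacing.

```shell
#!/bin/sh
# Sketch only: recreate a small insert.txt in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' \
  'texta: ContrInMi_c_mir' \
  'textb: ContrMidl_y_cut' \
  'textins: sec_buttonpress 3 15147' \
  'texta: ContrIdex_p_mir' \
  'textb: ContrMidl_b_cut' \
  'textins: sec_buttonpress 2 14605' > "$dir/insert.txt"

# One awk pass: strip the "label: " prefix, remember texta and textb,
# and emit one record per group when the textins line arrives.
triples=$(awk '
  { val = $0; sub(/^[^:]*:[ \t]*/, "", val) }   # text after "label:"
  /^texta:/   { a = val }
  /^textb:/   { b = val }
  /^textins:/ { print a "|" b "|" val }
' "$dir/insert.txt")

printf '%s\n' "$triples"
rm -rf "$dir"
```

Each output record then carries everything one insertion needs, so the per-line sed extraction becomes unnecessary.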

Hi birei,

thanks a lot for your comments! Sorry, if it's not really clear what I meant. I'll try to improve that:

Sorry, my mistake: it should be called "insert.txt". I changed that in my first post.

Here is the commented code:

#!/bin/bash
### ok, this is only for creating the example files and try to make it look like the original. 
### My original code is much longer since I process also some other parts of the original input.
declare -a Arraytexta
declare -a Arraytextb
declare -a Arraytextins
nrLines=`awk 'END {print NR}' insert.txt`
nrLines=`expr ${nrLines} - 2`

for ((i=1; i<=$nrLines; i++)); 
do 
countb=`expr ${i} + 1`
countins=`expr ${i} + 2`
### Next, I create the arrays that contain the information needed for the "gawk" command:
### First, the pattern from the first line of the match.
Arraytexta=`sed -n "${i} p" insert.txt | awk 'BEGIN{FS=":"}{print $2}'`
### Second, the pattern with which the following line should start.
Arraytextb=`sed -n "${countb} p" insert.txt | awk 'BEGIN{FS=":"}{print $2}'`
### Third, the text that should be inserted.
Arraytextins=`sed -n "${countins} p" insert.txt | awk 'BEGIN{FS=":"}{print $2}'`
### Here, I try to pass the information from the arrays above on to the "dothis.awk" file.
gawk -v textins="${Arraytextins}" -v texta="${Arraytexta}" -v textb="${Arraytextb}" -f dothis.awk input.txt > output.txt
done

Sorry, but I am a beginner with awk, so at this point some things seem easier or more familiar to me with sed. :o

Yes, that's correct.

Here is an example for the "output.txt" file.

ContrInMi_c_mir 	 2 	 10066	181014 	 200750
sec_buttonpress 	 3 	 15147
ContrMidl_y_cut 	 2 	 12345	201085 	 220988
ContrMiRi_c_mir 	 2 	 13206	221155 	 240891
ContrIdex_p_mir 	 1 	 765	241225 	 260962
sec_buttonpress 	 2 	 14605
ContrMidl_b_cut 	 1 	 2686	261296 	 281032

It would be even better if lines that contain the number "1" in $2 could be replaced instead, i.e. only $2 and $3 should be replaced. Like in this case:

instead of:

ContrIdex_p_mir 	 1 	 765	241225 	 260962
sec_buttonpress 	 2 	 14605

rather:

ContrIdex_p_mir 	 2 	 14605	241225 	 260962

I hope it is now clearer what I would like to do.

Thanks a lot for help!

tempestas
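
A minimal sketch of one way to get the output described above in a single awk run, assuming tab-separated data and that a texta/textb pair only counts when the two lines are adjacent. The scratch directory, the function name emit_prev and all variable names are hypothetical, and the rebuilt lines use single tabs rather than the original spacing.

```shell
#!/bin/sh
# Sketch only: scratch directory and simplified sample data are assumptions.
dir=$(mktemp -d)
printf '%s\n' \
  'texta: ContrInMi_c_mir' 'textb: ContrMidl_y_cut' 'textins: sec_buttonpress 3 15147' \
  'texta: ContrIdex_p_mir' 'textb: ContrMidl_b_cut' 'textins: sec_buttonpress 2 14605' \
  > "$dir/insert.txt"
printf '%s\t%s\t%s\t%s\t%s\n' \
  ContrInMi_c_mir 2 10066 181014 200750 \
  ContrMidl_y_cut 2 12345 201085 220988 \
  ContrMiRi_c_mir 2 13206 221155 240891 \
  ContrIdex_p_mir 1 765 241225 260962 \
  ContrMidl_b_cut 1 2686 261296 281032 > "$dir/input.txt"

result=$(awk '
  # First file (insert.txt): collect one rule per texta/textb pair.
  NR == FNR {
    if ($1 == "texta:") a = $2
    else if ($1 == "textb:") b = $2
    else if ($1 == "textins:") ins[a, b] = $2 "\t" $3 "\t" $4
    next
  }
  # Second file (input.txt): the previous line is held back until the
  # current line shows whether the texta/textb pair is complete.
  have_prev { emit_prev($1) }
  { prev = $0; prev_key = $1; prev_f2 = $2; have_prev = 1 }
  END { if (have_prev) print prev }

  function emit_prev(next_key,   f, p) {
    if (!((prev_key, next_key) in ins)) { print prev; return }
    if (prev_f2 == 1) {
      # Replace fields 2 and 3 of the held line with those of textins.
      split(ins[prev_key, next_key], f, "\t")
      split(prev, p)
      print p[1] "\t" f[2] "\t" f[3] "\t" p[4] "\t" p[5]
    } else {
      print prev
      print ins[prev_key, next_key]
    }
  }
' "$dir/insert.txt" "$dir/input.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

Because everything happens in one pass, no previously inserted "sec_buttonpress" line can sit between texta and textb, so that extra check from dothis.awk is not needed here.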

The examples you posted are very long, and the lines do not correspond between the input file, the replacement file, and the output file (they look scrambled).
I mean, I don't see the logic that should be followed.

I think you should post a shorter and cleaner example of the input file, the insert file and the expected output, so that we can

1) quickly understand the formatting logic
2) give you a more accurate answer

Hi,

sorry, I thought the longer examples would help to understand the structure better, because the input files are a mess, but that is how they are given to me.
I shortened the examples in my earlier posts. I hope it is now clearer what I want to do.

Thanks a lot!

tempestas

$ cat f1
ContrInMi_c_mir          2       10066  181014   200750
ContrMidl_y_cut          2       12345  201085   220988
ContrMiRi_c_mir          2       13206  221155   240891
ContrIdex_p_mir          1       765    241225   260962
ContrMidl_b_cut          1       2686   261296   281032
$ cat f2
texta: ContrInMi_c_mir
textb: ContrMidl_y_cut
textins: sec_buttonpress         3       15147
texta: ContrIdex_p_mir
textb: ContrMidl_b_cut
textins: sec_buttonpress         2       14605
$ cat myawk
BEGIN{i=1}
NR==FNR{p[NR]=$1;a[NR]=$0;next}
/^texta/{x=$2}
x&&/^textins/{sub(".*"$3,$3);y=$0}!y{next}
{do {
        if(p[i]!=x){
                print a[i]
        }else{
                split(a[i],b)
                print p[i]" \t "y"\t"b[4]" \t "b[5]
                x=y=z;delete b;i++;next
        }
        delete a[i];i++
}while(length(a[i]))
}END{do{print a[i];i++}while(length(a[i]))}
$ awk -f myawk f1 f2
ContrInMi_c_mir          3       15147  181014   200750
ContrMidl_y_cut          2       12345  201085   220988
ContrMiRi_c_mir          2       13206  221155   240891
ContrIdex_p_mir          2       14605  241225   260962
ContrMidl_b_cut          1       2686   261296   281032
$
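
For readers unfamiliar with the second line of myawk: NR counts records across all files while FNR restarts at 1 for each file, so NR==FNR holds only while awk is reading the first file named on the command line. The sketch below just prints both counters; the file names first and second are made up for the illustration.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first"
printf 'c\nd\ne\n' > "$dir/second"

tagged=$(awk '
  NR == FNR { print "first:  NR=" NR " FNR=" FNR " " $0; next }
            { print "second: NR=" NR " FNR=" FNR " " $0 }
' "$dir/first" "$dir/second")

printf '%s\n' "$tagged"
rm -rf "$dir"
```

One caveat with this idiom: if the first file is empty, NR==FNR is also true for every line of the second file.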

Hi ctsgnb,

thanks a lot for your effort! The code seems quite complicated. Actually, I do not understand what it does. Can you explain it a little so that I learn something?
And one question: the reason why I explicitly marked not only "texta" but also the following line as "textb" is that both "texta" and "textb" can occur more than once in the original input file. But they will most likely not appear again in consecutive lines.
As far as I understand your code, it looks only for "texta" and ignores "textb". Is that correct?

Thanks a lot!!!

tempestas

Hi ctsgnb,

thanks a lot for your effort!
But at which point can I insert the condition that the next line has to start with textb? Sorry for that question, but your code seems so complex to me that I don't know how to adapt it.
Does it work with something like:

BEGIN{i=1;flag=0}
NR==FNR{p[NR]=$1;a[NR]=$0;next}
/^texta/{flag=1}
if(/^textb/&&flag==1)
/^texta/{x=$2}
x&&/^textins/{sub(".*"$3,$3);y=$0}!y{next}
{do {
        if(p[i]!=x){
                print a[i]
        }else{
                split(a[i],b)
                print p[i]" \t "y"\t"b[4]" \t "b[5]
                x=y=z;delete b;i++;next
        }
        delete a[i];i++
}while(length(a[i]))
}END{do{print a[i];i++;flag=0}while(length(a[i]))}

Thanks a lot for your patience!

tempestas

I have to do the task below using a script.

Please help.

Disable the "EXPN" and "VRFY" commands in your current version of the "sendmail" command. A
malicious user able to connect to a machine running sendmail may be able to acquire information
about user accounts on that system.

Hi ctsgnb,

I just tried your code on the entire input file: it doesn't work. For some reason it only works if I use the short example lines that you used. As soon as I add some more lines to the input and the insert files, the output file equals the input file. Nothing is replaced or added. How is that possible?

Thanks!

tempestas

Could you please:

1) post a short example (initial input, replacement file, and expected output) that demonstrates when and how textb should be considered.

2) upload your big files, both the input and the replacement file.

Thx

Give this a try:

# cat ext.awk
BEGIN{
FS="\t"
}
$3~/Response/{
if (!v&&!r) { if ($4==1) {r++;R[0]=$5} }
else {if ($4==2||$4==3) {K[(v":"r)]=$4;R[(v":"r)]=$5;r++}}
;next}
$3~/Video/{S[++s]=$1;r=1;v++;V[v]=$4;St[v]=$5;next}
$1~/Document/{ Et[++e]=$7-R[0];next}
END {
k=1;do{
t=1;do{x=k":"t
y=(y?y OFS:z) (K[x]?K[x]:0) OFS (R[x]?R[x]-St[k]:0)
} while ((k":"(++t)) in K)
print S[k],V[k],St[k]-R[0],Et[k],y;y=z} while (k++<v)
}

and then run

awk -f ext.awk OFS="\t\t" OrigFile.log

or choose any OFS (Output Field Separator) that fits your expectations instead of the double tab proposed above.
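
As a small illustration of what the OFS operand changes: it is used where print joins its arguments with commas, while strings concatenated explicitly are left alone. The sketch assumes a throwaway file named sample.log, which is not part of the thread.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'x y z\n' > "$dir/sample.log"

# The assignment operand is evaluated before the file that follows it is read.
out=$(awk '{ print $1, $2, $3; print $1 $2 $3 }' OFS='\t\t' "$dir/sample.log")

printf '%s\n' "$out"
rm -rf "$dir"
```

The first output line has two tabs between the fields; the second is plain "xyz" because no comma, and therefore no OFS, is involved.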