Awk, find and print

Ubuntu, Bash 4.3.48

Hi,

I have an input file with many columns separated by ":"

ARC=121:ERF=12244:IDE=2334:ADA=34 ....
ERF=124:ARC=123:IDE=2344:ADA=54 ....
ERF=16254:IDE=2434:ADA=78:ARC=134 ....

and I want this:

ARC=121:IDE=2334
ARC=123:IDE=2344
ARC=134:IDE=2434

I need to use awk because grep -oh "\w*ARC=\w*" and similar commands aren't flexible enough. Many thanks!

echo manolis

Hello echo manolis,

Could you please try the following and let me know if it helps.

awk '{match($0,/ARC=[0-9]+/);val1=substr($0,RSTART,RLENGTH);match($0,/IDE=[0-9]+/);print val1":"substr($0,RSTART,RLENGTH)}'  Input_file

Output will be as follows.

ARC=121:IDE=2334
ARC=123:IDE=2344
ARC=134:IDE=2434
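If the fields can come in any order and you would rather not write one regular expression per key, a field-based approach may be more flexible. Below is a minimal sketch, assuming every field has the form KEY=VALUE and each key appears at most once per line. The function name pick_keys is hypothetical. It splits the line on ":" with -F, stores each field in an array indexed by its key, then prints the wanted keys in a fixed order. A missing key simply prints as an empty value.

```shell
# Hypothetical helper name: pick_keys. Prints ARC and IDE (in that order)
# from ":"-separated KEY=VALUE fields, wherever they sit in the line.
pick_keys() {
  awk -F':' '
  {
    split("", kv)                  # portable way to empty kv before each line
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")          # position of the first "=" in this field
      if (eq > 0)
        kv[substr($i, 1, eq - 1)] = substr($i, eq + 1)
    }
    print "ARC=" kv["ARC"] ":IDE=" kv["IDE"]
  }' "$@"
}

# Usage on the sample lines from the question (trailing "...." left out)
printf '%s\n' 'ARC=121:ERF=12244:IDE=2334:ADA=34' \
              'ERF=124:ARC=123:IDE=2344:ADA=54' \
              'ERF=16254:IDE=2434:ADA=78:ARC=134' | pick_keys
```

Adding another key is then one more kv["..."] in the print statement rather than another match() call.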

Thanks,
R. Singh


OK, many thanks! It is perfect! One more question:

Input

ARC=3.215465:ERF=12244:IDE=122/43:ADA=34 ....
ERF=1235:ARC=5.244:IDE=14/4:ADA=54 ....
ERF=568:IDE=2/43:ADA=78:ARC=1.6254 ....

Output

ARC=3.2:IDE=122/43
ARC=5.2:IDE=14/4
ARC=1.6:IDE=2/43

I guess I can keep the same code and just add the %.1f format somewhere for val1 (ARC), and the pattern [0-9]\/[0-9] for val2 (IDE)?

Could you please help me again?

Starting code:

awk '{match($0,/ARC=[0-9]+/);val1=substr($0,RSTART,RLENGTH);match($0,/IDE=[0-9]+/);val2=substr($0,RSTART,RLENGTH);print val1":"val2}' Input_file

Are you just searching online for solutions to your problems? Or have you read the awk man page on your system and tried using it to construct code that meets your requirements?

Why would we use the above Starting code, which looks for strings that contain GT= and AF= followed by one or more decimal digits and has mismatched double quotes in the print statement, instead of starting with the code Ravinder suggested in post #2, which gives you output close to what you want instead of reporting a syntax error and printing nothing?

Working from Ravinder's code, you can extend the match() extended regular expressions to accept periods or slashes in addition to decimal digits, and use sprintf() to round the number after ARC= to one decimal place. You might want to try something more like:

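awk '
{	# find "ARC=" followed by digits and/or periods
	match($0, /ARC=[0-9.]+/)
	# skip the 4-character "ARC=" prefix and round the number to 1 decimal place
	val1 = sprintf("ARC=%.1f", substr($0, RSTART + 4, RLENGTH - 4))
	# find "IDE=" followed by digits and/or slashes (a string regex needs no "\/")
	match($0, "IDE=[0-9/]+")
	print val1":"substr($0, RSTART, RLENGTH)
}'  Input_file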
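The RSTART + 4 and RLENGTH - 4 arithmetic works because match() sets RSTART to the 1-based position where the match starts and RLENGTH to its length, so skipping 4 characters drops the "ARC=" prefix. A small sketch to see those values for one line (the function name show_arc_offsets is hypothetical):

```shell
# Hypothetical helper name: show_arc_offsets. For each line containing
# "ARC=...", print RSTART, RLENGTH, the number after the 4-character
# "ARC=" prefix, and that number rounded with sprintf("%.1f").
show_arc_offsets() {
  awk '
  match($0, /ARC=[0-9.]+/) {       # match() returns RSTART, which is 0 when nothing matches
    num = substr($0, RSTART + 4, RLENGTH - 4)
    printf "RSTART=%d RLENGTH=%d number=%s rounded=%s\n", RSTART, RLENGTH, num, sprintf("%.1f", num)
  }' "$@"
}

printf '%s\n' 'ERF=1235:ARC=5.244:IDE=14/4:ADA=54' | show_arc_offsets
# prints: RSTART=10 RLENGTH=9 number=5.244 rounded=5.2
```

Using match() as the pattern means lines without ARC= print nothing here, whereas the script above would print ARC=0.0 for them.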

Hi Don Cragun,

I mixed up the examples and the code, and I have corrected my last post. Anyway, thank you very much!

Best,
echo manolis

Hello echo manolis,

Could you please try the following and let me know if it helps.

awk '{match($0,/ARC=[0-9]+\.[0-9]+|ARC=[0-9]+/);val1=substr($0,RSTART,RLENGTH);match($0,/IDE=[0-9]+\/[0-9]+|IDE=[0-9]+/);print val1 ":" substr($0,RSTART,RLENGTH)}'   Input_file

Output will be as follows.

ARC=3.215465:IDE=122/43
ARC=5.244:IDE=14/4
ARC=1.6254:IDE=2/43
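This output keeps every decimal of ARC, while the question asked for one decimal place (ARC=3.2 and so on). A minimal sketch that keeps these same two match() patterns and only adds sprintf("%.1f") for ARC is shown below. The function name arc_ide_rounded is hypothetical. Note that an integer value such as ARC=121 would come out as ARC=121.0.

```shell
# Hypothetical helper name: arc_ide_rounded. Same patterns as above,
# plus sprintf("%.1f") so the ARC value is rounded to one decimal place.
arc_ide_rounded() {
  awk '
  {
    match($0, /ARC=[0-9]+\.[0-9]+|ARC=[0-9]+/)
    # drop the 4-character "ARC=" prefix, then round what is left
    val1 = sprintf("ARC=%.1f", substr($0, RSTART + 4, RLENGTH - 4))
    match($0, /IDE=[0-9]+\/[0-9]+|IDE=[0-9]+/)
    print val1 ":" substr($0, RSTART, RLENGTH)
  }' "$@"
}

printf '%s\n' 'ARC=3.215465:ERF=12244:IDE=122/43:ADA=34' \
              'ERF=1235:ARC=5.244:IDE=14/4:ADA=54' \
              'ERF=568:IDE=2/43:ADA=78:ARC=1.6254' | arc_ide_rounded
```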

Thanks,
R. Singh