Remove space with sed

arm · July 27, 2019, 7:25am

Hello folks,
myfile contains 1000000 records like the following:

logver=56 idseq=63256 itime=1111 devid=TG-40 devname=PUI-C2 vd=USER date=2019_01_10 time=18:39:49 logid="000013" type="traffic" subtype="forward" level="notice" eventtime=134 srcip=1.1.1.1 srcport=1 srcintf="XYX-CORE.01" srcintfrole="undefined" dstip=2.2.2.2 dstport=17 dstintf="W-XY.100" dstintfrole="undefined" poluuid="e7a88496-b510-51e7-dbcd-384d6bbc0805" sessionid=24343 proto=17 action="accept" policyid=1 policytype="policy" service="udp/17000" dstcountry="United States" srccountry="Reserved" trandisp="snat" transip=120.1.1.1 transport=23136 duration=90 sentbyte=33 rcvdbyte=36 sentpkt=1 rcvdpkt=1 appcat="unscanned"

cat myfile | sed 's/[a-z] * //g' | awk '{for(i=1;i<=NF;i++){if($i~/dstcountry/) printf "|%s",$i};printf "\n" }' | \
sed 's/^|//g;s/[a-z]*.=//g;s/"//g'

After running the above script I got:

UniteStates

I need it to be

UnitedStates

I think the problem is with my first sed. I need to know how to remove the space between two words inside double quotes, as in "String1 space String2".
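
Read exactly as it is displayed above, the first expression `[a-z] * ` matches a lowercase letter followed by one or more spaces and deletes the letter together with the spaces, which would account for the missing "d". The following is a minimal sketch of that behaviour and of one possible correction that captures the letter and writes it back. The shortened sample line and the function names strip_as_posted and strip_keep_letter are illustrative assumptions, not part of the original post.

```shell
# Shortened sample record (illustrative; only a few fields are kept).
line='service="udp/17000" dstcountry="United States" srccountry="Reserved"'

# Expression as displayed in the post: the letter before the space is deleted too.
strip_as_posted() {
    printf '%s\n' "$line" | sed 's/[a-z] * //g'
}

# Capture the letter and write it back, so only the spaces are removed.
strip_keep_letter() {
    printf '%s\n' "$line" | sed 's/\([a-z]\)  */\1/g'
}

strip_as_posted      # dstcountry comes out as "UniteStates"
strip_keep_letter    # dstcountry comes out as "UnitedStates"
```

Like the posted expression, the corrected one only touches spaces that directly follow a lowercase letter, so in the full record the separators after digits, quotes and uppercase letters are left alone.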

Neo · July 27, 2019, 7:59am

Hello arm!

In case you forgot to read the forum rules, here are two rules you are not following here:

Please follow posting rules.

Thanks

Neo

rdrtx1 · July 29, 2019, 9:32am

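awk '
# Find each quoted value that contains at least one space (key="two words").
{while (match($0, /[^ ]*="[^"]*  *[^"]*"/)) {
   s=t=substr($0, RSTART, RLENGTH);
   # Strip the spaces and the double quotes from the copy in s.
   gsub("[ \"]*", "", s);
   # Put the cleaned copy back into the record; t is used as a regex here.
   sub(t, s);
 }
 # The record is now split cleanly on spaces; print the dstcountry value.
 for (i=1; i<=NF; i++) if ($i ~ /^dstcountry=/) {sub(".*=", "", $i);  print $i;}
}
' myfile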

Don_Cragun · July 29, 2019, 6:41pm

Hi arm,
Please be very careful when posting your code. When I tried to run your script with the data you supplied, the output I got was:

United

(Note that there is a space at the end of that output.) The BRE that you supplied to your first sed has two spaces after the asterisk, and there aren't any occurrences of two adjacent spaces in your sample input, so I don't see how the code you showed us could have produced the output you showed us.

Assuming that there is no more than one occurrence of that field in an input line, one might also try:

awk '
match($0, /dstcountry="[^"]*"/) {
        # dstcountry=" is 12 characters; RLENGTH - 13 also drops the closing quote.
        $0 = substr($0, RSTART + 12, RLENGTH - 13)
        gsub(" ", "")
        print
}' myfile

which might be a little bit faster since it doesn't have to process dozens of fields that don't match.

Note also that using cat and sed as input and output filters for awk is almost always a waste of system resources that will slow down your script and may also delay other things running on your system while your script is running.
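
For comparison, since the thread title asks about sed, here is a minimal sketch of the same extraction done by one sed process that reads the file itself, with no cat and no awk. It assumes at most one dstcountry field per line and no embedded double quotes in the value. The function name extract_dstcountry and the two sample records are illustrative assumptions, not from the original posts.

```shell
# Print the dstcountry value of every line that has one, with spaces removed.
# With no file argument the function reads standard input.
extract_dstcountry() {
    sed -n '/dstcountry="/{
        s/.*dstcountry="\([^"]*\)".*/\1/
        s/ //g
        p
    }' "$@"
}

# Illustrative input: the second record has no dstcountry field and is skipped.
sample=$(mktemp)
printf '%s\n' \
    'proto=17 action="accept" dstcountry="United States" srccountry="Reserved" duration=90' \
    'proto=6 action="deny" srccountry="Reserved" duration=5' > "$sample"

extract_dstcountry "$sample"
rm -f "$sample"
```

On the real data this would be called as extract_dstcountry myfile. Whether it is faster than the awk version above would have to be measured on the actual file.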