Help With String Manipulation

Samingla · August 9, 2011, 3:03pm

Hi Gurus,
I need some help with data manipulation using shell scripting. I know how to replace a whole string, but not part of one.
The value after aa= should be replaced with the value of the mail attribute, leaving ,OU=111,OU=222,DC=333 as is. Below are the input and the expected output.

Input:
dn: aa=XYZ,OU=111,OU=222,DC=333
mail: xyz@123.com
uid: xyz@123.com
sdf: aaa
 
dn: aa=abc,OU=111,OU=222,DC=333
mail: npr@123.com
uid: npr@123.com
sdf: www
 
dn: aa=def,OU=111,OU=222,DC=333
mail: def@123.com
uid: def@123.com
sdf: eee
 
dn: aa=ram,OU=111,OU=222,DC=333
mail: med@123.com
uid: med@123.com
sdf: qqq

output:
dn: aa=xyz@123.com,OU=111,OU=222,DC=333
mail: xyz@123.com
uid: xyz@123.com
sdf: aaa
 
dn: aa=npr@123.com,OU=111,OU=222,DC=333
mail: npr@123.com
uid: npr@123.com
sdf: www
 
dn: aa=def@123.com,OU=111,OU=222,DC=333
mail: def@123.com
uid: def@123.com
sdf: eee
 
dn: aa=med@123.com,OU=111,OU=222,DC=333
mail: med@123.com
uid: med@123.com
sdf: qqq

Thanks in advance,
Sam

Scrutinizer · August 9, 2011, 3:29pm

Hi, see if this works for you:

awk '{sub(/aa=[^,]*/,"aa="$4)}1' RS= ORS='\n\n' infile

(on Solaris use /usr/xpg4/bin/awk)

DGPickett · August 9, 2011, 3:34pm

You can read each block into the buffer in sed and move strings around:

sed '
  :loop
  $b sub
  /\n$/b sub
  N
  b loop
  :sub
  s/^dn: aa=[^@]*\(@.*\nmail: \([^@]*\)@\)/dn: aa=\2\1/
 ' infile >outfile

Samingla · August 9, 2011, 4:28pm

Hi scrutinizer,

Thanks for the quick response. It works fine for the sample input, but the blocks in my input file do not always have the same sequence. The mail: parameter can be anywhere in the block, and I want its value to be copied.

Thanks,
Sam

DGPickett · August 9, 2011, 4:51pm

I edited above to make it more specific.

Samingla · August 9, 2011, 8:15pm

Hi DGPickett,
It is not working for all the scenarios. The code works only if the aa value is already in email format. I want the value from mail to be substituted after aa=, leaving OU=111,OU=222,DC=333 as is. The mail parameter can be anywhere in the section. Below are all the input scenarios that I have, followed by the expected output.

Input:
dn: aa=XYZ,OU=111,OU=222,DC=333
uid: xyz@123.com
sdf: aaa
mail: xyz@123.com
 
dn: aa=abc,OU=111,OU=222,DC=333
uid: npr@123.com
mail: npr@123.com
sdf: www
 
dn: aa=qqq@231.com,OU=111,OU=222,DC=333
mail: def@123.com
uid: def@123.com
sdf: eee
 
dn: aa=ram,OU=111,OU=222,DC=333
mail: med@123.com
uid: med@123.com
sdf: qqq

output:
dn: aa=xyz@123.com,OU=111,OU=222,DC=333
uid: xyz@123.com
sdf: aaa
mail: xyz@123.com
 
dn: aa=npr@123.com,OU=111,OU=222,DC=333
uid: npr@123.com
mail: npr@123.com
sdf: www
 
dn: aa=def@123.com,OU=111,OU=222,DC=333
mail: def@123.com
uid: def@123.com
sdf: eee
 
dn: aa=med@123.com,OU=111,OU=222,DC=333
mail: med@123.com
uid: med@123.com
sdf: qqq

Thanks for the help
Sam

---------- Post updated at 07:15 PM ---------- Previous update was at 04:22 PM ----------

Hi DGPickett,

The code below is partly working, but the string "OU=111,OU=222,DC=333" after aa=med@123.com is getting deleted for every entry that the code processes. How can we retain OU=111,OU=222,DC=333 for all the processed entries? Below is the code.

sed '
  :loop
  $b sub
  /\n$/b sub
  N
  b loop
  :sub
  s/^dn: aa=[^@]*\(@.*: \([^@]*\)@\)/dn: aa=\2\1/
 ' input >output

Below is the output that I am getting for one of the entries:
dn: aa=med@123.com
mail: med@123.com
uid: med@123.com
sdf: qqq

Thanks,
Sam
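
A possible explanation, offered as a sketch rather than a definitive fix: in the substitution above, `[^@]*` runs from `aa=` up to the first `@` in the block, so it swallows `,OU=111,OU=222,DC=333` along with the old value, and nothing in the replacement puts it back. The variant below captures the suffix instead. It assumes each block starts with the dn: line, has a mail: line somewhere after it, that the mail value contains no whitespace, and that blocks are separated by a single blank or whitespace-only line. The function name fix_dn_sed is made up for illustration.

```shell
# fix_dn_sed is a hypothetical helper name, not from the thread.
# Reads blocks on stdin; a block ends at a blank or whitespace-only line.
fix_dn_sed() {
  sed '
    :loop
    # last input line: the block is complete
    $b sub
    # pattern space ends in a blank or whitespace-only line: block is complete
    /\n[[:space:]]*$/b sub
    N
    b loop
    :sub
    # \1 = "dn: aa=", \2 = from the first comma through "mail: ",
    # \3 = the mail value; the old aa value ([^,]*) is dropped
    s/^\(dn: aa=\)[^,]*\(,.*\nmail: \)\([^[:space:]]*\)/\1\3\2\3/
  '
}
```

Used as `fix_dn_sed < input > output`, this should leave the separator lines and every other attribute untouched.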

danmero · August 9, 2011, 10:01pm

awk -F'[ =,]' '/^dn:/{x=$3;y=$0;next}/^mail:/{sub(x,$NF,y);$0=y ORS$0}1' file

michaelrozar17 · August 10, 2011, 5:31am

Alternate sed:

# Below sed requires last line of the file to be an empty line
sed -n 'H;/^  */{x;s/\(.*aa=\)[^,]*\(,.*\n\)\(mail: \)\([^\n]*\)\(\n.*\)\n/\1\4\2\3\4\5/p}' inputfile

# This variant does not have that requirement
sed -n 'H;/^  */{x;s/\(.*aa=\)[^,]*\(,.*\n\)\(mail: \)\([^\n]*\)\(\n.*\)\n/\1\4\2\3\4\5/p};
${x;s/\(.*aa=\)[^,]*\(,.*\n\)\(mail: \)\([^\n]*\)\(\n.*\)/\1\4\2\3\4\5/p}' inputfile

Scrutinizer · August 10, 2011, 7:03am

This should work, irrespective of the order of labels:

awk '{for(i=1;i<NF;i++)if($i=="mail:")sub(/aa=[^,]*/,"aa="$(i+1))}1' RS= ORS='\n\n' infile
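
For readers who find the one-liner dense, here is the same logic spread out with comments. This is only a sketch: the wrapper name fix_dn_awk is made up, and it reads standard input instead of a named file. It assumes blocks are separated by truly empty lines and that the mail value contains no `&`, which is special in the replacement string of sub().

```shell
# fix_dn_awk is a hypothetical wrapper name; the awk body is the
# one-liner from the post above, spread out with comments.
fix_dn_awk() {
  awk -v RS= -v ORS='\n\n' '
    {
      # An empty RS turns on paragraph mode: each block separated by
      # truly empty lines is one record, and newlines also split fields.
      for (i = 1; i < NF; i++) {
        if ($i == "mail:") {
          # The field after "mail:" is the address; put it after aa=,
          # replacing everything up to (not including) the first comma.
          sub(/aa=[^,]*/, "aa=" $(i + 1))
        }
      }
    }
    1   # print the (possibly modified) record
  '
}
```

Because the loop scans every field for the literal label mail:, the position of that line within the block does not matter.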

dude2cool · August 10, 2011, 9:04am

Scrut, it doesn't look like it is returning the desired output

:~$ awk '{for(i=1;i<NF;i++)if($i=="mail:")sub(/aa=[^,]*/,"aa="$(i+1))}1' RS= ORS='\n\n' /tmp/t1

Here is the output from the above:

dn: aa=med@123.com,OU=111,OU=222,DC=333
uid: xyz@123.com
sdf: aaa
mail: abc@123.com

dn: aa=abc,OU=111,OU=222,DC=333
uid: npr@123.com
mail: npr@123.com
sdf: www

dn: aa=qqq@231.com,OU=111,OU=222,DC=333
mail: def@123.com
uid: def@123.com
sdf: eee

dn: aa=ram,OU=111,OU=222,DC=333
mail: med@123.com
uid: med@123.com
sdf: qqq

Scrutinizer · August 10, 2011, 9:22am

@dude2cool.
That has to do with the fact that the sample posted here, for some reason, contains a space on the "empty" lines. The RS= construction then does not work, since it needs two consecutive newlines (a truly empty line) to separate records. If you test with a sample that has real empty lines, it should work.
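
If the real input also has whitespace on the separator lines, one way around it is to strip those lines down to truly empty ones before awk sees them. A minimal sketch, assuming it is acceptable for the output separators to become empty lines; the helper names normalize_blank_lines and fix_dn_tolerant are made up for illustration:

```shell
# normalize_blank_lines is a hypothetical helper name.
# Turns whitespace-only lines into truly empty ones so that awk
# paragraph mode (empty RS) sees them as record separators.
normalize_blank_lines() {
  sed 's/^[[:space:]]*$//'
}

# fix_dn_tolerant (also hypothetical) feeds the normalized stream to the
# field-scanning awk from this thread.
fix_dn_tolerant() {
  normalize_blank_lines |
    awk -v RS= -v ORS='\n\n' \
      '{for(i=1;i<NF;i++)if($i=="mail:")sub(/aa=[^,]*/,"aa="$(i+1))}1'
}
```

Usage would be along the lines of `fix_dn_tolerant < infile > outfile`.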

DGPickett · August 10, 2011, 9:26am

Yes, invisible requirements are hard to fulfill!

dude2cool · August 10, 2011, 9:32am

@scrutinizer - thanks for the explanation.

@DGPickett - lol.

DGPickett · August 10, 2011, 10:03am

Actually, there are extended regex features to deal with possible white space, similar to this:

Regex Tutorial - \b Word Boundaries

One friend says the Perl guys dominated POSIX and expanded regex in a Perl-flavored way, away from the traditional meanings, which is somewhat of a departure from most UNIX things being forward compatible. Goodbye '\<' and '\>'!

Scrutinizer · August 10, 2011, 10:11am

Hi DGPickett, \b is not part of POSIX regex, and \< is GNU-only, isn't it?

DGPickett · August 10, 2011, 11:24am

I think it varies by command, library, and options. I can get and lose sed \< by changing the library path. It was there originally, but later it disappeared (for no good reason, after decades) and \b appeared. GNU means "GNU's Not Unix", and I guess that covers this.
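
Where neither \b nor \< and \> can be relied on, a word boundary can be approximated with plain POSIX ERE constructs: require the start of the line or a non-word character before the word, and a non-word character or the end of the line after it. A rough sketch, with has_word as a made-up helper name and the assumption that the word itself contains no regex metacharacters:

```shell
# has_word is a hypothetical helper: succeeds if WORD occurs in TEXT as a
# whole word. WORD is assumed to contain no regex metacharacters.
has_word() {
  word=$1
  text=$2
  printf '%s\n' "$text" |
    grep -E -q "(^|[^[:alnum:]_])${word}([^[:alnum:]_]|\$)"
}
```

For example, `has_word mail 'mail: xyz@123.com'` should succeed, while `has_word mail 'email: xyz@123.com'` should not.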

alister · August 10, 2011, 11:59am

Because it's not a party until the BSDs arrive, from the {Open,Net,Free}BSD re_format(7) manual page:

Regards,
Alister

DGPickett · August 10, 2011, 12:09pm

I guess even Perl, C++ and Java may vary if I can vary sed by library path. Be warned!