remove chunks of text from file

All,

So, I have an ldif file that contains about 6500 users' worth of data. Some users have a block of text I'd like to remove, while some don't.

Example (block of text in question is the block starting with "authAuthority: ;Kerberosv5"):

User with text block:

# username, users, example.com
dn: uid=username,cn=users,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: apple-user
objectClass: extensibleObject
objectClass: organizationalPerson
objectClass: top
objectClass: person
apple-generateduid: 53CA02D7-B116-4461-B220-E3FC0B15964A
apple-mcxflags:: PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUW
 VBFIHBsaXN0IFBVQkxJQyAiLS8vQXBwbGUgQ29tcHV0ZXIvL0RURCBQTElTVCAxLjAvL0VOIiAiaH
 R0cDovL3d3dy5hcHBsZS5jb20vRFREcy9Qcm9wZXJ0eUxpc3QtMS4wLmR0ZCI+CjxwbGlzdCB2ZXJ
 zaW9uPSIxLjAiPgo8ZGljdD4KCTxrZXk+c2ltdWx0YW5lb3VzX2xvZ2luX2VuYWJsZWQ8L2tleT4K
 CTx0cnVlLz4KPC9kaWN0Pgo8L3BsaXN0Pgo=
loginShell: /bin/bash
uidNumber: 20192
authAuthority: ;ApplePasswordServer;0x470bb9eb325f31c3000040ee00002257,1024 35
  1423486873699801821345071757674738484067280188359389504392445041998105914670
 84867869429532763785664902803450035110236201552277202539905523086333992178101
 54867353409493808376385021788117196022631658234104675864712197939496802664455
 87225827331332464303631278838001920713257416459820742251056515142078124405645
 79 root@example.com:123.456.789.111
authAuthority: ;Kerberosv5;0x470bb9eb325f31c3000040ee00002257;username@EXAMPLE.C
 OM;EXAMPLE.COM;1024 35 142348687369980182134507175767473848406728
 01883593895043924450419981059146708486786942953276378566490280345003511023620
 15522772025399055230863339921781015486735340949380837638502178811719602263165
 82341046758647121979394968026644558722582733133246430363127883800192071325741
 645982074225105651514207812440564579 root@example.com:123.456.789.111
userPassword:: KioqKioqKio=
uid: username
cn: Firstname Lastname
gidNumber: 1029
givenName: Firstname
sn: Lastname
apple-user-homeurl:: PGhvbWVfZGlyPjx1cmw+YWZwOi8vamRhdGExLnVvcmVnb24uZWR1L1VzZ
 XJzPC91cmw+PHBhdGg+c3R1cmNvPC9wYXRoPjwvaG9tZV9kaXI+
homeDirectory: /Network/Servers/example.com/Users/username
apple-user-homequota: 4294967296
mail: username@example.com

Now, one problem is, ldapsearch/ldapdump break up attributes at 76 characters. So, the block in question should be one line.

So I'm curious whether there's an easy way to either A. remove the line breaks within the blocks of text (any line that starts with a " " should have that space removed and be joined to the line above; note that one line starts with two spaces, and only one of them should be removed before it is joined to the previous line), or B. just nuke the whole block of text that starts with "authAuthority: ;Kerberosv5" and ends with "example.com:123.456.789.111".

Anyone have any ideas? (btw, I realize that the line breaks aren't at exactly 76 anymore, since I had to sterilize the text for any personal info).

Can you post the input and the desired output as well? That way, you'll get a response quickly.

So, the output would look like this...

Don't much care how we get there.

# username, users, example.com
dn: uid=username,cn=users,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: apple-user
objectClass: extensibleObject
objectClass: organizationalPerson
objectClass: top
objectClass: person
apple-generateduid: 53CA02D7-B116-4461-B220-E3FC0B15964A
apple-mcxflags:: PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUW
 VBFIHBsaXN0IFBVQkxJQyAiLS8vQXBwbGUgQ29tcHV0ZXIvL0RURCBQTElTVCAxLjAvL0VOIiAiaH
 R0cDovL3d3dy5hcHBsZS5jb20vRFREcy9Qcm9wZXJ0eUxpc3QtMS4wLmR0ZCI+CjxwbGlzdCB2ZXJ
 zaW9uPSIxLjAiPgo8ZGljdD4KCTxrZXk+c2ltdWx0YW5lb3VzX2xvZ2luX2VuYWJsZWQ8L2tleT4K
 CTx0cnVlLz4KPC9kaWN0Pgo8L3BsaXN0Pgo=
loginShell: /bin/bash
uidNumber: 20192
authAuthority: ;ApplePasswordServer;0x470bb9eb325f31c3000040ee00002257,1024 35
  1423486873699801821345071757674738484067280188359389504392445041998105914670
 84867869429532763785664902803450035110236201552277202539905523086333992178101
 54867353409493808376385021788117196022631658234104675864712197939496802664455
 87225827331332464303631278838001920713257416459820742251056515142078124405645
 79 root@example.com:123.456.789.111
userPassword:: KioqKioqKio=
uid: username
cn: Firstname Lastname
gidNumber: 1029
givenName: Firstname
sn: Lastname
apple-user-homeurl:: PGhvbWVfZGlyPjx1cmw+YWZwOi8vamRhdGExLnVvcmVnb24uZWR1L1VzZ
 XJzPC91cmw+PHBhdGg+c3R1cmNvPC9wYXRoPjwvaG9tZV9kaXI+
homeDirectory: /Network/Servers/example.com/Users/username
apple-user-homequota: 4294967296
mail: username@example.com

Notice the "authAuthority: ;Kerberosv5" section is gone.

Thanks!

This will remove the one leading " " and join each such line to the previous line:

sed -ne 'H
${
x
s/\n //g
p
}' filename
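
A variant of the same idea, as a minimal sketch: the version above collects the whole file in sed's hold space before substituting, and because H puts a newline in front of every line it appends, the output starts with one empty line. The loop below joins continuation lines as it goes instead. The function name unfold_ldif and the sample data are made up for illustration.

```shell
# unfold_ldif: join LDIF continuation lines (lines that start with a space)
# onto the line before them. Reads stdin, writes stdout.
unfold_ldif() {
    # :a      loop label
    # $!N     append the next input line, unless this is the last one
    # s/\n // drop the line break plus exactly one leading space
    # ta      if that substitution happened, loop and try the next line too
    # P;D     otherwise print the finished first line and keep the rest
    sed -e ':a' -e '$!N' -e 's/\n //' -e 'ta' -e 'P' -e 'D'
}

# Made-up sample: the second line starts with two spaces, so one space
# survives the join, as asked for in the original question.
printf 'cn: First\n  Last\nuid: user\n name\nsn: Last\n' | unfold_ldif
# prints:
# cn: First Last
# uid: username
# sn: Last
```

On the real file this would be run as `unfold_ldif < filename > unfolded.ldif`, where unfolded.ldif is just a placeholder name.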

This will remove that chunk:

sed '/authAuthority: ;Kerberosv5/,/ root@example.com/d' filename
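
The range above relies on " root@example.com" landing on a single physical line of the folded file. A sketch that does not depend on where the fold falls, assuming awk is available: drop the line that starts the attribute and every continuation line (leading space) that follows it. The function name drop_kerberos and the shortened sample values are made up for illustration, and 192.0.2.1 is a documentation address.

```shell
# drop_kerberos: remove every "authAuthority: ;Kerberosv5" attribute,
# including any folded continuation lines that belong to it.
drop_kerberos() {
    awk '
        /^authAuthority: ;Kerberosv5/ { skip = 1; next }   # attribute starts here
        skip && /^ /                  { next }             # still inside the attribute
                                      { skip = 0; print }  # everything else is kept
    '
}

# Made-up, shortened sample of one folded attribute between two others.
printf '%s\n' \
    'uidNumber: 20192' \
    'authAuthority: ;Kerberosv5;0x47;username@EXAMPLE.C' \
    ' OM;EXAMPLE.COM;1024 35 1423' \
    ' 79 root@example.com:192.0.2.1' \
    'userPassword:: KioqKioqKio=' | drop_kerberos
# prints:
# uidNumber: 20192
# userPassword:: KioqKioqKio=
```

The same filter also works after the line breaks have been removed, since an unfolded attribute is simply a start line with no continuation lines.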

Wow, great.

So, the first one to remove the line breaks works great. The second one, which removes the chunk, doesn't seem to work. The resulting output is significantly truncated (it seems to be removing a very large portion of the file).

Before the second command, the file is 181095 lines. After the sed command, it's 2652. Since there are 6411 users in the ldif file, theoretically, that sed command should only be removing one line per user, so the output should be about 175k lines.

sed should stop its pattern match at the first "stop" it sees. Could that not be working?

For example...

Here's the output after taking my "input" and running it through the "de-space/de-return sed"

# username, users, example.com
dn: uid=username,cn=users,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: apple-user
objectClass: extensibleObject
objectClass: organizationalPerson
objectClass: top
objectClass: person
apple-generateduid: 53CA02D7-B116-4461-B220-E3FC0B15964A
apple-mcxflags:: PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIHBsaXN0IFBVQkxJQyAiLS8vQXBwbGUgQ29tcHV0ZXIvL0RURCBQTElTVCAxLjAvL0VOIiAiaHR0cDovL3d3dy5hcHBsZS5jb20vRFREcy9Qcm9wZXJ0eUxpc3QtMS4wLmR0ZCI+CjxwbGlzdCB2ZXJzaW9uPSIxLjAiPgo8ZGljdD4KCTxrZXk+c2ltdWx0YW5lb3VzX2xvZ2luX2VuYWJsZWQ8L2tleT4KCTx0cnVlLz4KPC9kaWN0Pgo8L3BsaXN0Pgo=
loginShell: /bin/bash
uidNumber: 20192
authAuthority: ;ApplePasswordServer;0x470bb9eb325f31c3000040ee00002257,1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@example.com:123.456.789.111
authAuthority: ;Kerberosv5;0x470bb9eb325f31c3000040ee00002257;username@EXAMPLE.COM;EXAMPLE.COM;1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@example.com:123.456.789.111
userPassword:: KioqKioqKio=
uid: username
cn: Firstname Lastname
gidNumber: 1029
givenName: Firstname
sn: Lastname
apple-user-homeurl:: PGhvbWVfZGlyPjx1cmw+YWZwOi8vamRhdGExLnVvcmVnb24uZWR1L1VzZXJzPC91cmw+PHBhdGg+c3R1cmNvPC9wYXRoPjwvaG9tZV9kaXI+
homeDirectory: /Network/Servers/example.com/Users/username
apple-user-homequota: 4294967296
mail: username@example.com

Now here's what comes out after the sed to remove the kerberos block (`sed '/authAuthority: ;Kerberosv5/,/ root@example.com/d' test2.ldif`):

# username, users, example.com
dn: uid=username,cn=users,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: apple-user
objectClass: extensibleObject
objectClass: organizationalPerson
objectClass: top
objectClass: person
apple-generateduid: 53CA02D7-B116-4461-B220-E3FC0B15964A
apple-mcxflags:: PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIHBsaXN0IFBVQkxJQyAiLS8vQXBwbGUgQ29tcHV0ZXIvL0RURCBQTElTVCAxLjAvL0VOIiAiaHR0cDovL3d3dy5hcHBsZS5jb20vRFREcy9Qcm9wZXJ0eUxpc3QtMS4wLmR0ZCI+CjxwbGlzdCB2ZXJzaW9uPSIxLjAiPgo8ZGljdD4KCTxrZXk+c2ltdWx0YW5lb3VzX2xvZ2luX2VuYWJsZWQ8L2tleT4KCTx0cnVlLz4KPC9kaWN0Pgo8L3BsaXN0Pgo=
loginShell: /bin/bash
uidNumber: 20192
authAuthority: ;ApplePasswordServer;0x470bb9eb325f31c3000040ee00002257,1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@example.com:123.456.789.111

So obviously, something is not quite right...

Thanks!

---------- Post updated 07-30-09 at 10:59 AM ---------- Previous update was 07-29-09 at 07:43 PM ----------

ah ha... okay, so after removing line breaks, the command should be:

sed '/authAuthority: ;Kerberosv5/d' filename

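Obviously (now), because each attribute is all on one line: sed only starts looking for the range's end pattern on the line after the one that opened the range, so the range ran on into the following entries.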
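
The difference between the two commands on unfolded data can be reproduced with a couple of made-up entries. This is only a sketch: alice, bob, the shortened values and the 192.0.2.x addresses are placeholders, not data from the real file.

```shell
# sample: two unfolded, made-up entries separated by a blank line.
sample() {
    printf '%s\n' \
        'uid: alice' \
        'authAuthority: ;ApplePasswordServer;0x01,1024 35 111 root@example.com:192.0.2.1' \
        'authAuthority: ;Kerberosv5;0x01;alice@EXAMPLE.COM;EXAMPLE.COM;1024 35 111 root@example.com:192.0.2.1' \
        'mail: alice@example.com' \
        '' \
        'uid: bob' \
        'authAuthority: ;ApplePasswordServer;0x02,1024 35 222 root@example.com:192.0.2.2' \
        'authAuthority: ;Kerberosv5;0x02;bob@EXAMPLE.COM;EXAMPLE.COM;1024 35 222 root@example.com:192.0.2.2' \
        'mail: bob@example.com'
}

# Two-address form: the range opens on alice's Kerberos line and closes on
# the NEXT line containing " root@example.com", which is bob's
# ApplePasswordServer line. bob's Kerberos line then opens a second range
# that never closes, so it runs to the end of the input.
range_delete() { sed '/authAuthority: ;Kerberosv5/,/ root@example.com/d'; }

# One-address form: deletes only the matching lines.
line_delete() { sed '/^authAuthority: ;Kerberosv5/d'; }

sample | range_delete | wc -l   # 2 of the 9 lines survive
sample | line_delete | wc -l    # 7 of the 9 lines survive
```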

So, now that that's working, here's another one.

Assuming I have "username", and "REALM", can anyone think of a good way to turn this:

 ;ApplePasswordServer;0x49e8c2c668dbcb0200004a090000342a,1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@ldap.example.com:123.456.789.111

Into this:

 ;Kerberosv5;0x49e8c2c668dbcb0200004a090000342a;username@REALM.EXAMPLE.COM;REALM.EXAMPLE.COM;1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@ldap.example.com:123.456.789.111

You'll notice that the hex values following ;ApplePasswordServer; and ;Kerberosv5; are the same, as are the sections beginning with 1024 and ending with root@ldap.example.com:123.456.789.111. So basically, I need to swap out ApplePasswordServer for Kerberosv5 and add the username and REALM info.

I can think of a few ways to do this with php, but I'm not particularly good with regex, and it would be nice to do this all in (ba)sh.

Thanks!

Instead of us doing your work for you, why not give it a shot yourself, run it, debug it, and come to us with specific questions?

Regards

okay, whatever.

figured I might learn some regex in the process, but I'll just do it the PHP way and be done with it.

Thanks, I suppose.

---------- Post updated at 12:10 PM ---------- Previous update was at 11:40 AM ----------

How about this then...

Can anyone think of a more streamlined version of this (mainly the multiple pipes to sed):

#!/bin/sh

INPUT=';ApplePasswordServer;0x49e8c2c668dbcb0200004a090000342a,1024 35 142348687369980182134507175767473848406728018835938950439244504199810591467084867869429532763785664902803450035110236201552277202539905523086333992178101548673534094938083763850217881171960226316582341046758647121979394968026644558722582733133246430363127883800192071325741645982074225105651514207812440564579 root@ldap.example.com:123.456.789.111'

OUT1=`echo "$INPUT" | sed 's/ApplePasswordServer/Kerberosv5/' | sed 's/\,/;username@REALM.EXAMPLE.COM;REALM.EXAMPLE.COM;/'`

echo "$OUT1"

Otherwise, this'll work.

Thanks.
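
One way to collapse the two sed processes into a single substitution, as a sketch rather than a definitive answer: capture the hex ID between ";ApplePasswordServer;" and the first comma, then rebuild the value around it. The function name to_kerberos and its two positional parameters are made up for illustration, and the long key is shortened in the sample. It assumes the username and realm contain no "/", "&" or backslash, since those are special in a sed replacement.

```shell
# to_kerberos USERNAME REALM
# Reads ;ApplePasswordServer; values on stdin and writes the matching
# ;Kerberosv5; values on stdout. Lines without ;ApplePasswordServer; pass
# through unchanged.
to_kerberos() {
    # \([^,]*\) captures the hex ID up to the first comma; \1 puts it back,
    # followed by user@REALM;REALM; in place of that comma.
    sed "s/;ApplePasswordServer;\([^,]*\),/;Kerberosv5;\1;$1@$2;$2;/"
}

INPUT=';ApplePasswordServer;0x49e8c2c668dbcb0200004a090000342a,1024 35 1423 root@ldap.example.com:123.456.789.111'

printf '%s\n' "$INPUT" | to_kerberos username REALM.EXAMPLE.COM
# prints:
# ;Kerberosv5;0x49e8c2c668dbcb0200004a090000342a;username@REALM.EXAMPLE.COM;REALM.EXAMPLE.COM;1024 35 1423 root@ldap.example.com:123.456.789.111
```

Because the pattern names ";ApplePasswordServer;" explicitly, commas on other lines are left alone, which the plain 's/,/.../' form would not guarantee if it were run over a whole file.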