Txt to csv convert

john_prince · September 14, 2009, 2:04pm

Hi,
I was trying a split command to pull values like "uid=abc,ou=INTERNAL,ou=PEOPLE" out into a CSV file. However, the rows occur erratically, and that stopped me. Could someone help me with this? Does anyone have a one-liner for it?

The text file contains a pattern like this:

dn: uid=abc,ou=INTERNAL,ou=PEOPLE
uid: abc
epsNotesid: CN=Ai 
 
dn: uid=xyz,ou=Internal,ou=Disabled
uid: xyz
 
dn: uid=ade,ou=Internal,ou=Disabled
uid: ade
 
dn: uid=mng,ou=INTERNAL,ou=PEOPLE
uid: mng
epsNotesid: CN=Ri

Thanks, Prince

vidyadhar85 · September 14, 2009, 2:28pm

Do you mean like this?

/home/>sed -n '/,*,*,/p' filename
dn: uid=abc,ou=INTERNAL,ou=PEOPLE
dn: uid=xyz,ou=Internal,ou=Disabled
dn: uid=ade,ou=Internal,ou=Disabled
dn: uid=mng,ou=INTERNAL,ou=PEOPLE

john_prince · September 14, 2009, 2:35pm

Thanks for the prompt reply.

Let me simplify this a bit:

Text file (note: each pattern is separated by a blank line):

dn: uid=abc,ou=INTERNAL,ou=PEOPLE
uid: abc
epsNotesid: CN=Ai 
 
dn: uid=xyz,ou=Internal,ou=Disabled
uid: xyz
 
dn: uid=ade,ou=Internal,ou=Disabled
uid: ade
 
dn: uid=mng,ou=INTERNAL,ou=PEOPLE
uid: mng
epsNotesid: CN=Ri

Desired CSV file:

"uid=abc,ou=INTERNAL,ou=PEOPLE","abc","CN=Ai"
"uid=xyz,ou=Internal,ou=Disabled","xyz" 
"uid=ade,ou=Internal,ou=Disabled","ade" 
"uid=mng,ou=INTERNAL,ou=PEOPLE","mng","CN=Ri"

Hope this makes things clearer.

Thanks, Prince

vgersh99 · September 14, 2009, 2:50pm

nawk -f john.awk myFile

john.awk:

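BEGIN {
  RS=""                      # paragraph mode: blank lines separate records
  FS="\n"                    # each line of a record is one field
  OFS=","
  qq="\""                    # double-quote character
}
{
  # print the text after the first space of each line, quoted and comma-separated
  for(i=1;i<=NF;i++)
     printf("%s%s%s%s", qq, substr($i, index($i," ")+1), qq, (i==NF)?ORS:OFS)
}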

john_prince · September 14, 2009, 3:05pm

Thanks a lot.

However, the output comes out on a single line, something like this:

bash-2.05$ nawk -f john.awk abc.txt
"uid=abc,ou=INTERNAL,ou=PEOPLE","abc","CN=Ai ","","uid=xyz,ou=Internal,ou=Disabled","xyz","","uid=ade,ou=Internal,ou=Disabled","ade","","uid=mng,ou=INTERNAL,ou=PEOPLE","mng","CN=Ri"

I expected the output on multiple lines.

Thanks, Prince

vgersh99 · September 14, 2009, 3:07pm

given your sample input, I get:

"uid=abc,ou=INTERNAL,ou=PEOPLE","abc","CN=Ai"
"uid=xyz,ou=Internal,ou=Disabled","xyz"
"uid=ade,ou=Internal,ou=Disabled","ade"
"uid=mng,ou=INTERNAL,ou=PEOPLE","mng","CN=Ri"

what OS are you on? Works fine on Solaris.
you can try this:

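BEGIN {
  RS=""                      # paragraph mode: blank lines separate records
  FS="\n"                    # each line of a record is one field
  OFS=","
  ORS="\n"                   # set the output record separator explicitly
  qq="\""                    # double-quote character
}
{
  # print the text after the first space of each line, quoted and comma-separated
  for(i=1;i<=NF;i++)
     printf("%s%s%s%s", qq, substr($i, index($i," ")+1), qq, (i==NF)?ORS:OFS)
}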

john_prince · September 14, 2009, 3:29pm

My OS is Solaris.

Again the same result. I'm wondering whether the newline is getting quoted, which would put the output on a single line.

Thanks

vgersh99 · September 14, 2009, 3:42pm

given myFile:

dn: uid=abc,ou=INTERNAL,ou=PEOPLE
uid: abc
epsNotesid: CN=Ai

dn: uid=xyz,ou=Internal,ou=Disabled
uid: xyz

dn: uid=ade,ou=Internal,ou=Disabled
uid: ade

dn: uid=mng,ou=INTERNAL,ou=PEOPLE
uid: mng
epsNotesid: CN=Ri

I get:

"uid=abc,ou=INTERNAL,ou=PEOPLE","abc","CN=Ai "
"uid=xyz,ou=Internal,ou=Disabled","xyz"
"uid=ade,ou=Internal,ou=Disabled","ade"
"uid=mng,ou=INTERNAL,ou=PEOPLE","mng","CN=Ri"

Make sure that your 'blank' lines are really blank and contain nothing but a newline.
It looks like your 'blank' lines contain a <space> followed by a newline.
Please post the output of 'cat -vet myFile' using code tags.
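
Lines that look blank but hold only spaces or tabs can also be searched for directly. This is a minimal sketch, assuming a grep that supports POSIX character classes; the helper name find_blank_looking_lines and the sample data are made up for illustration:

```shell
# Hypothetical helper: list the lines that contain only whitespace
# (at least one space or tab), prefixed with their line numbers.
find_blank_looking_lines() {
  grep -n '^[[:space:]][[:space:]]*$' "$1"
}

# Sample: line 2 holds a single space, line 4 is truly empty.
sample=$(mktemp)
printf 'uid: abc\n \nuid: xyz\n\nuid: ade\n' > "$sample"

find_blank_looking_lines "$sample"   # reports line 2 only
rm -f "$sample"
```

A truly empty line is not reported, so any hit marks a separator line that awk's paragraph mode (RS="") will not treat as blank.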

ripat · September 14, 2009, 3:47pm

Another approach:

 awk -F" |\n" -v RS="" '{for (i=2;i<=NF;i+=2) printf "%s\042%s\042",i==2?"":",",$i;print ""}' file

john_prince · September 14, 2009, 3:47pm

Here is the output:

bash-2.05$ cat -vet abc.ldif

dn: uid=abc,ou=INTERNAL,ou=PEOPLE$
uid: abc$
epsNotesid: CN=Ai $
 $
dn: uid=xyz,ou=Internal,ou=Disabled$
uid: xyz$
 $
dn: uid=ade,ou=Internal,ou=Disabled$
uid: ade$
 $
dn: uid=mng,ou=INTERNAL,ou=PEOPLE$
uid: mng$
epsNotesid: CN=Ri$

vgersh99 · September 14, 2009, 4:20pm

This tells me that your 'blank' lines are actually a <space> followed by a newline.

john_prince · September 14, 2009, 4:21pm

It gives me the following errors; I tried with both awk and nawk:

bash-2.05$ awk -F" |\n" -v RS="" '{for (i=2;i<=NF;i+=2) printf "%s\042%s\042",i==2?"":",",$i;print ""}' abc.ldif
awk: syntax error near line 1
awk: bailing out near line 1
bash-2.05$ nawk -F" |\n" -v RS="" '{for (i=2;i<=NF;i+=2) printf "%s\042%s\042",i==2?"":",",$i;print ""}' abc.ldif
nawk: syntax error at source line 1
context is
{for (i=2;i<=NF;i+=2) printf >>> "%s\042%s\042",i== <<<
nawk: illegal statement at source line 1
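
The ">>> ... <<<" context points at the i== comparison inside the bare printf argument list. Some awk implementations do not accept an unparenthesised comparison there, so one thing worth trying is wrapping the conditional in parentheses. This is a sketch under that assumption, not verified on Solaris; the wrapper name to_csv is hypothetical, and the sample data uses truly empty separator lines and values without embedded or trailing spaces:

```shell
# Same idea as the one-liner above, with the conditional parenthesised so
# the comparison is no longer a bare item in the printf argument list.
to_csv() {
  awk -F' |\n' -v RS='' '{
    for (i = 2; i <= NF; i += 2)
      printf "%s\042%s\042", (i == 2 ? "" : ","), $i
    print ""
  }' "$1"
}

sample=$(mktemp)
printf 'dn: uid=abc,ou=INTERNAL,ou=PEOPLE\nuid: abc\nepsNotesid: CN=Ai\n\ndn: uid=xyz,ou=Internal,ou=Disabled\nuid: xyz\n' > "$sample"
to_csv "$sample"
rm -f "$sample"
```

Because this variant picks every second field, a value with a trailing or embedded space shifts the field positions, and whitespace-only separator lines still need to be cleaned up first.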

durden_tyler · September 14, 2009, 6:08pm

One way to do it in Perl:

$ 
$ cat f1
dn: uid=abc,ou=INTERNAL,ou=PEOPLE
uid: abc
epsNotesid: CN=Ai 
 
dn: uid=xyz,ou=Internal,ou=Disabled
uid: xyz
 
dn: uid=ade,ou=Internal,ou=Disabled
uid: ade
 
dn: uid=mng,ou=INTERNAL,ou=PEOPLE
uid: mng
epsNotesid: CN=Ri
$ 
$ 
$ perl -lne 'if(/^dn: (.*)/){printf("\"%s\"",$1)} elsif(/: (.*)/){printf(",\"%s\"",$1)} else{print}END{print}' f1
"uid=abc,ou=INTERNAL,ou=PEOPLE","abc","CN=Ai " 
"uid=xyz,ou=Internal,ou=Disabled","xyz" 
"uid=ade,ou=Internal,ou=Disabled","ade" 
"uid=mng,ou=INTERNAL,ou=PEOPLE","mng","CN=Ri"
$ 
$

tyler_durden

vgersh99 · September 14, 2009, 6:11pm

If your 'empty' lines actually contain a <space> followed by a newline, you can try this:

sed 's/^ *$//' myFile | nawk -f john.awk
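
The same clean-up can also be folded into awk itself by reading line by line instead of relying on paragraph mode. The following is only a sketch, not the script used in this thread: it treats any line made up solely of spaces or tabs as a record separator and also trims trailing whitespace from values, so "CN=Ai " comes out as "CN=Ai". The function name ldif_to_csv is made up for illustration.

```shell
ldif_to_csv() {
  awk '
    # A line holding nothing but spaces or tabs (or nothing at all)
    # ends the current record.
    /^[ \t]*$/ {
      if (n) { print rec; rec = ""; n = 0 }
      next
    }
    {
      sub(/[ \t]+$/, "")                        # drop trailing whitespace
      val = substr($0, index($0, " ") + 1)      # text after the first space
      rec = rec (n++ ? "," : "") "\"" val "\""  # append the quoted value
    }
    END { if (n) print rec }                    # flush the last record
  ' "$1"
}

# Usage: ldif_to_csv myFile
```

Working line by line avoids depending on how a particular awk handles RS="" when the separator lines are not truly empty.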

john_prince · September 15, 2009, 11:42am

Thanks a lot. This works.