LDAP data in CSV format

tfm217 · May 3, 2016, 9:07pm

Hi all,

I am new here, please don't eat me alive..

I am trying to find a good community to learn and participate in unix / linux discussions to help me improve in my current job.

That being said, I have a problem which I didn't expect to challenge me, but I can't seem to find a viable solution..

I am trying to do one of two things:

Take an LDIF export from an LDAP directory and convert it to a CSV format where the first line contains attributes and the preceding lines contain values, one line per object.
Find a way to export directly from the LDAP directory into a CSV format in the above configuration.

I am not good at writing scripts, so I've searched around for a pre-existing solution. Perhaps a program which takes an LDIF and a few attribute names as an input and outputs a CSV. Or maybe something which can search an LDAP directory and export the contents directly into CSV format.

Can this be done?

This is Redhat Linux, the process is IBM Security Directory Server which is an LDAP directory.

RudiC · May 4, 2016, 3:10am

As much as I'd like to help, not knowing how "an LDIF export from an LDAP directory" looks like stops me. Posting input and desired output samples will definitely help.

tfm217 · May 4, 2016, 8:55am

Sure, I can do that. Below is the current format of the LDIF output file. It can be modified in terms of the attributes but overall it's the same as the below.

Each object is separated by an empty line. The first line is the full DN of the object and each preceding line is in the form of "attribute = value". Here is an example of how two user objects would be represented. The LDIF only contains user objects.

As you can see, not every attribute is represented, for example, user2 has no 'employeenumber' and so the attribute is not listed.

I am trying to find a way to convert automatically into standard CSV format.

uid=user1,cn=users,dc=domain,dc=com
givenname=User
salesorg=2010
sn=One
telephonenumber=+15555555555
l=Melville
employeetype=E
mail=user1@domain.com 
uid=user1
c=United States
postalcode=11747
cn=User One
preferredlanguage=en_US
employeenumber=01000527
st=NY

uid=user2,cn=users,dc=domain,dc=com
givenname=User
salesorg=2010
sn=Two
employeetype=R
mail=user2@domain.com
uid=user2
c=United States
preferredlanguage=en_US
cn=User Two

RudiC · May 4, 2016, 9:28am

And now: the desired output sample. (What be a "standard CSV format"?)

tfm217 · May 5, 2016, 9:24am

Here's a description and an example:

The first line should contain all possible attributes (comma separated), the preceding lines containing the values, one continuous comma-separated line for each user object. Any values already containing commas would need to be surrounded with quotes. For example:

 uid,givenname,salesorg,sn,telephonenumber,l,employeetype,mail,uid,c,postalcode,cn,preferredlanguage,employeenumber,st
 "user1,cn=users,dc=domain,dc=com",User,2010,One,+155555555555,Melville,E,user1@domain.com,user1,United States,11747,User One,en_US,01000527,NY

Aia · May 5, 2016, 10:52pm

tfm217:

Here's a description and an example:

The first line should contain all possible attributes (comma separated), the preceding lines containing the values, one continuous comma-separated line for each user object. Any values already containing commas would need to be surrounded with quotes. For example:
 uid,givenname,salesorg,sn,telephonenumber,l,employeetype,mail,uid,c,postalcode,cn,preferredlanguage,employeenumber,st
 "user1,cn=users,dc=domain,dc=com",User,2010,One,+155555555555,Melville,E,user1@domain.com,user1,United States,11747,User One,en_US,01000527,NY

Save as ldap2csv.pl
Run as perl ldap2csv.pl ldapfile

#!/usr/bin/env perl
use strict;
use warnings;

# Bail out if no file is given as an argument.
my $ldap_file = shift || usage();

# To save output with extension .csv.
my $csv_file = "$ldap_file.csv";

# Preserve order of header labels.
my @header = qw(uid0 givenname salesorg sn telephonenumber l employeetype mail
                    uid c postalcode cn preferredlanguage employeenumber st);

# Boiler plate to open input and output files handles.
open my $in, '<', $ldap_file or die;
open my $out, '>', $csv_file or die;

# Write the header labels.
print  $out "uid,", join (',', @header[1..$#header]), "\n";

# Work engine.
{
    # Record separator is a paragraph representation.
    local $/="\n\n";

    # Process one record at a time.
    while(my $record = <$in>) {
        chomp $record; # Remove the record separator.
        my %data;
        @data{@header} = ();  # To label the data.
        my @entries = split '\n', $record; # Create data entries.

        # Save first entry uid or it will be over written by second instance.
        $data{'uid0'} = qq{"$entries[0]"};

        # Work with each entry.
        for my $entry (@entries){
            my ($key, $value) = split "=", $entry, 2; # split only by first equal.
            $value =~ s/\s+$//; # Clean trailing spaces.
            $value = qq{"$value"} if $value =~ /,/; # Surround with double quotes any entries with comas.
            $data{$key} = $value;
        }
        # Separate entries with comas.
        my $row = join ',', (map{ $data{$_}?$data{$_}:""} @header);

        # Write to output file.
        print $out "$row\n";
    }
}
# Dismiss file handles.
close $in;
close $out;

# Feed back to user.
print "$csv_file has been saved in the current directory\n";

sub usage {
    print "Usage: $0 ldapfilename\n";
    exit 1;
}

tfm217 · May 6, 2016, 12:41pm

Aia, very awesome, thanks so much. This works 100% with what I am trying to do.

I appreciate you putting this together for me.

tfm217 · May 9, 2016, 2:32pm

If you are up for another challenge... I am also separately trying to pull some more data on "group" objects.

Basically every group is an object "cn=Groupname" and the attributes are "cn" and "uniquemember". The "uniquemember" attribute is repeated as many times as there are users in the group.

Ex:

 cn=Director,cn=groups,DC=MARCHON,DC=COM
cn=Director
uniquemember=uid=skressman,cn=users,DC=MARCHON,DC=COM
uniquemember=uid=rf1,cn=users,DC=MARCHON,DC=COM
uniquemember=uid=dingino,cn=users,dc=marchon,dc=com
uniquemember=uid=lgittler,cn=users,dc=marchon,dc=com
uniquemember=uid=junger,cn=users,dc=marchon,dc=com

There can be any number of uniquemember's and there is no way to tell when the last one is being read, unless you consider the next thing read in will be a new object (cn=...)

Ideally the output would be cn,cn,uniquemember,uniquemember,uniquemember,...(multiplied by the current # of entries)

Don_Cragun · May 9, 2016, 4:23pm

Moderator comments were removed during original forum migration.