Sorting group information for accounts

I have an input file that contains the primary and secondary groups a user should have based on a pre-defined role. The input file looks like this:

<user_login>|<comment_field>|<role>

After I execute my script to do some grepping I have the following user file where the secondary groups are defined by the role:

<user_login>|<dir_path_set_by_role>|<shell_set_by_role>|<primary_GID>|<secondary_GID>|<secondary_GID>|<secondary_GID>|...

The problem I have is that the GIDs must be resolved to their group names, and the corresponding users then added to an upload file that looks like a typical group entry:

<group_name>:x:<GID>:<user_1>,<user_2>,<user_3>,...

I need to sort and uniq the GID file, grep each GID in the user file to see which users require it, and then append those users to that group's upload entry. I don't know if this is necessary, but I'm using a few temp files to store output, and I'm at the point where I have all the sorted and uniq'd groups in a temp file. What I need to do is grep each GID through the user file and then append every matching user to the upload file, so that the matches fill in user_1, user_2, and so on. So, if my user file looked like this:

jon|/home/managers|/bin/ksh|1573|1893|4907|6135|
mike|/home/managers|/bin/ksh|1573|4907|1530||
marsha|/home/employees|/bin/ksh|739|1893||

my upload file would look like this (the GID's are translated to group names):

managers:x:1573:jon,mike
folder_1:x:1893:jon,marsha
folder_2:x:4907:jon,mike
folder_3:x:6135:jon
folder_4:x:1530:mike
employee_folder_1:x:739:marsha
employee_folder_2:x:1839:marsha,jon

Can you help me set up the group upload file?

How are the GIDs translated? Does the output file already exist and need to be amended, or is it a separate input file? Is it the same format, just without any users?

For the GID translation I simply run "getent group | cut -d: -f1,3 > groups" and then grep the GID in the groups file. It has to be this way due to circumstances beyond my control.
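One thing to watch with a plain grep on the groups file: grepping for 739 also matches 1739 or 7390. A minimal sketch of an exact lookup, assuming the groups file has the name:gid layout produced by the cut above (gid_to_name is a made-up helper name, not something from the original script):

```shell
# Hypothetical helper: print the group name for one GID.
# $1 = GID to look up, $2 = file of name:gid lines (assumed layout)
gid_to_name() {
    # Compare the whole second field, so 739 does not match 1739.
    awk -F: -v gid="$1" '$2 == gid { print $1; exit }' "$2"
}
```

Called as gid_to_name 739 groups, it prints only the name of the group whose GID is exactly 739.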

The output file could possibly be amended with the entries of the group upload file (sorry for the "file" terminologies). I don't see any reason why that would be a problem but if it was I could easily find a delimiter and cut them back into separate files. I think it would be ideal for them to be separate files because the system accepting group modifications in bulk will only accept input in the manner I have specified (group:x:gid:user1,user2,user3,<etc>). It would be the same format without any users however I don't believe any grep of the GID's in the user file would not return a result. But it's ok if there's an entry that's just: group:x:gid:

#!/usr/bin/awk -f

#first bring in this system's group entries
BEGIN {
        FS=OFS=":"
        cmd="getent group"
        while ((cmd | getline) > 0) {
                g_idx[g_idx_max++]=$3   #preserve order
                #create new line,excluding any users-- group:x:gid:
                group[$3]=$1 OFS $2 OFS $3 OFS
                g_cnt[$3]=0
        }
        FS="|"
}

{
        for (i=4;i<=NF;i++) {
                if (!length($i)) continue
                if (!($i in group)) {
                        printf("%s to be added to gid %d, but it doesn't exist!\n", $1, $i) > "/dev/stderr"
                        continue
                }
                group[$i]=group[$i] (g_cnt[$i]++ ? "," : "") $1
        }
}

#all calculated up, print
END {
        for (i=0;i<g_idx_max;i++)
                if (g_cnt[g_idx[i]] > 0) #only print ones we added to
                        print group[g_idx[i]]
}

Thanks, scott. I'm still examining this before I run it. I'm confused on where it is accepting input. I think it is the while ((cmd | getline) but I'm not familiar with getline and "getent group" will simply list all groups in the system, not the specific ones I'm determining. Still reading through it but wanted to give my thanks in advance.

OK, it is a lot. If you put it in a separate file and make it executable, you run ./script users > upload-file. Or you can put all of it in single quotes and use it inside an existing shell script:

awk '
BEGIN{...}
...code...
}' "$users_file" > "$upload_file"

So it gets the user file as a parameter. It first runs getent itself, once, instead of running it many times and grepping the result. In the end it only prints the groups it added users to.

awk is quick at text manipulation. The rest of your script would probably incorporate well into it.

So far this works for me - still testing. I'm not familiar w/ using arrays or built-in awk variables but I've been googling and will eventually practise to become more knowledgeable. I've made a few tweaks:

The secondary groups actually start in field 11, so after a bit of tinkering I changed "i" to 11 instead of 4. For some reason the output is always preceded by an "it doesn't exist!" error, but the user is also properly included in the respective groups output by the script.

The groups and their GIDs conflict slightly between servers (maybe 5% of the GIDs conflict across all servers), so what I typically do is pull them all into one big file, then sort and uniq them. So I replaced "getent" with "cat <group_file>" and it seems to work. I just need to make sure the tool that accepts the group upload file will report any conflicts rather than wipe anything out, which I will check.

I still have a lot to learn about this script but does that sound right - replacing i with 11 instead of 4? Everything appears to check out so far.

Edit: Darn - I also have to sort GID upload files by server so I upload them to the proper one :( It just gets better and better!

Sounds like a user management nightmare. Maybe they should use some sort of central authentication server.

The modification you described for changing 4 to 11, and getent to cat group are both correct. But in the case of cat group, you could just use while ((getline < "group_file") > 0)

The "it doesn't exist!" message appears because a user has a GID that is not in the group_file. That GID is skipped; the groups that are in the file are still written out correctly.
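Here is a rough sketch of the whole thing with the getline-from-file form, wrapped in a shell function so it can sit inside an existing script. The function name build_upload, the argument order and the "first" parameter are made up for illustration, and it assumes group_file holds full name:x:gid: lines (the same layout getent group prints). Treat it as a starting point, not a tested tool:

```shell
# Hypothetical wrapper around the awk logic discussed above.
# $1 = group file (name:x:gid:... lines, assumed layout)
# $2 = user file  (pipe-delimited)
# $3 = first field holding a GID (4 in the sample, 11 in the real file)
build_upload() {
    awk -v group_file="$1" -v first="${3:-4}" '
    BEGIN {
        FS = OFS = ":"
        # Read the group file directly instead of piping it through cat.
        while ((getline < group_file) > 0) {
            order[n++] = $3                    # remember file order
            entry[$3] = $1 OFS $2 OFS $3 OFS   # group:x:gid:
            count[$3] = 0
        }
        close(group_file)
        FS = "|"                               # user file delimiter
    }
    {
        for (i = first + 0; i <= NF; i++) {
            if ($i == "") continue             # skip empty trailing fields
            if (!($i in entry)) {
                printf("%s needs gid %s, which is not in %s\n", $1, $i, group_file) > "/dev/stderr"
                continue
            }
            entry[$i] = entry[$i] (count[$i]++ ? "," : "") $1
        }
    }
    END {
        # Only print the groups that had at least one user added.
        for (k = 0; k < n; k++)
            if (count[order[k]] > 0) print entry[order[k]]
    }' "$2"
}
```

For the real data that would be something like build_upload group_file users 11 > upload-file, with warnings about unknown GIDs going to stderr rather than into the upload file.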

Just a quick note - it is a central management system called KeON but it's split into security domains: development, testing, production, and I dunno how to describe the fourth one. All the development servers communicate within the development domain and so on. Within each domain they have their own ldap implementations, but it took us a while to develop a central GID reservation database. Until that happened some groups were implemented with different GIDs than the same group in a different domain. It's a nightmare fo sho :)

Btw I've been googling this "getline" business. I don't understand why it's any different than redirection? I found this little sample kludge in an exercise on a tutorial site:

#!/usr/bin/awk -f

BEGIN {FS="[ :\t]"; "date" | getline d; print "The Current Date is: "d;
print "#################################################"}

How is it different than this:

#!/bin/bash

d=`date`
echo "The Current Date is: "$d
echo "#################################################"

I realise you can accomplish things in several different ways but what is the use of getline when all you need to do is redirect standard output?

In this specific case, the difference is that for the first nine days of each month, the awk form preserves the two spaces between <abbreviated month name> and <day of month>. That difference would be removed if you replace:

echo "The Current Date is: "$d

with:

echo "The Current Date is: $d"
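To see the effect without waiting for the start of a month, here is a small sketch that uses a fixed string in place of the date output (the sample value is made up):

```shell
# Fixed sample standing in for the output of date; note the two spaces before 5.
d='Mon Jan  5 10:00:00 UTC 2015'

# Unquoted: the shell splits $d into words and echo joins them with single spaces.
unquoted=$(echo "The Current Date is: "$d)

# Quoted: $d is passed through as one argument, spacing intact.
quoted=$(echo "The Current Date is: $d")

printf '%s\n%s\n' "$unquoted" "$quoted"
```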

But, presumably, you're writing an awk script to do a lot more than just print these two header lines. Why do you want to go through the overhead of starting up a shell to print the header lines with bash and do the rest of your processing with awk when awk can do it just as easily?

Well, I just wasn't sure what the purpose of the "getline" function was compared to the use of redirection. I'm still googling and re-reading the man page at the moment.
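If it helps, the way I understand it: redirection hands a stream to the whole awk process, while getline lets the awk program itself pull in a line from a command or another file, into a variable, while it keeps reading its normal input. A small sketch, assuming nothing beyond standard awk (stamp_lines and its argument are made-up names for the example):

```shell
# Hypothetical helper: prefix every input line with the first line
# of output from a command (defaults to date).
stamp_lines() {
    awk -v cmd="${1:-date}" '
    BEGIN {
        # cmd | getline var reads one line of the command output into var
        # and leaves $0, NF and the main input stream alone.
        if ((cmd | getline stamp) <= 0) stamp = "unknown"
        close(cmd)
    }
    { print stamp ": " $0 }'
}
```

For example, printf 'a\nb\n' | stamp_lines 'echo FIXED' prints "FIXED: a" and "FIXED: b". The group script earlier in the thread uses the same idea: the group list comes in through getline in BEGIN, and the user file comes in as normal input.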