assign colon delimited strings to variables

NewSolarisAdmin · January 27, 2009, 2:32pm

Man it has been too long since I have had to do this type of stuff...

OK I have a file with lines in it looking like this:
bob:johnson:email@email.com (most lines)

john:F.:doe:email2@email.com (but some are like this)

I need to loop through and assign vars to the values:

var Fname = bob
var Lname = johnson
var email = email@email.com

If there is a middle initial I want to toss it and move on and get the last name from the next (3rd) field.

I have tried this:

#!/bin/bash

while read line
do
        Fname=`cut -d: -f1`
        Field2=`cut -d: -f2`
                if [ "$Field2" = "##" ]; then
                        Lname=`cut -d: -f3`
                else
                        Lname=$Field2
                fi

        echo -e  "First Name is :$Fname\n"
        echo -e  "Last Name is :$Lname\n"
done < test

BTW this code is tabbed to be easier to read but this board is taking them out of it for some reason...

Needless to say it isn't working or I would not be posting this....

I get this as an output for example:

First Name is :Linda
Wendy
Esther
Pamela
Richard
William
Leona
Anne
Elaine

Last Name is :

Any help is appreciated.

joeyg · January 27, 2009, 2:43pm

What I am doing is searching for any two characters (the .. does this in sed) followed by a period (written \. in sed so it knows I want a literal period), and deleting those three characters wherever they exist.

> echo "john:F.:doe:email2@email.com"
john:F.:doe:email2@email.com
> echo "john:F.:doe:email2@email.com" | sed "s/..\.//g"
john:doe:email2@emacom

Now the initial is gone, but notice that the pattern also ate the "il." out of email.com.

Note... as I think about this, the following is better, as it searches for a : then any single character then a period:

> echo "john:F.:doe:email2@email.com" | sed "s/:.\.//g"
john:doe:email2@email.com
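
A more tightly anchored variant, as a sketch only: it assumes the initial is always a single letter followed by a period, and that it only ever shows up as the second field. The function name strip_initial is made up for this example.

```shell
# strip_initial: hypothetical helper, not from the thread.
# Reads colon-delimited lines on stdin and removes a second field that is
# exactly one letter plus a period (e.g. "F."); other lines pass through.
strip_initial() {
    sed 's/^\([^:]*\):[A-Za-z]\.:/\1:/'
}

printf '%s\n' 'bob:johnson:email@email.com' 'john:F.:doe:email2@email.com' |
    strip_initial
# prints:
# bob:johnson:email@email.com
# john:doe:email2@email.com
```

Because the pattern is tied to the start of the line and to the colons on both sides, it cannot touch anything inside the email address.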

NewSolarisAdmin · January 27, 2009, 2:55pm

First again, thank you very much.

So now I will have:

bob:johnson:email1
jan:geters:email2
ricky:sticky:email3

I need to loop through and assign vars to each value in the line so I can do some testing on them.

Man this is getting complicated. I hate asking for all this help to do things I used to be so good at a long time ago, but I do really appreciate the refresher, so thanks a ton!

Big Picture:

I am going to have one file in the above format. Then another similar file with different colon-delimited values. I need to pull, let's say, the last name from file 1 and find it in file 2. Then build a file 3 with the values from the first two files but in a different order.

You follow me?

Just thought it may help if you knew the grand scope of what is going on here...

joeyg · January 27, 2009, 2:58pm

Can you post samples of the two originating files?
And, what the data should look like in the final (third) file?

Comment and describe where appropriate.

NewSolarisAdmin · January 27, 2009, 3:01pm

Will do, and thanks again for being willing to help me here.

I think I am on to something so please give me just a minute.

gauravsachan · January 27, 2009, 3:01pm

A slight modification in the sender's own code :

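while read line
do
echo "$line"
# each cut needs the current line fed to it, otherwise it reads the rest of the file
Fname=`echo "$line" | cut -d: -f1`
Field2=`echo "$line" | cut -d: -f2`
# a second field that is two characters long (e.g. "F.") is treated as a middle initial
len=`echo "$Field2" | awk '{print length($0)}'`
if [ "$len" -eq 2 ]; then
Lname=`echo "$line" | cut -d: -f3`
else
Lname=$Field2
fi
echo -e "First Name is =$Fname\n"
echo -e "Last Name is =$Lname\n"
done < 9.txt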
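
Another way to split each line, sketched here rather than offered as the definitive fix: let read do the splitting by setting IFS to a colon for that one command, so no cut is needed. It assumes a middle initial, when present, is the second field and looks like "F." (one character plus a period). The function name print_names and the inline sample lines are illustrative only.

```shell
#!/bin/bash
# Sketch: split colon-delimited lines with read instead of cut.
# Assumes a middle initial, when present, is field 2 and looks like "F.".
print_names() {
    # IFS=: applies only to this read; "rest" soaks up any remaining fields.
    while IFS=: read -r Fname f2 f3 rest
    do
        case $f2 in
            ?.) Lname=$f3; email=$rest ;;   # skip the initial
            *)  Lname=$f2; email=$f3 ;;
        esac
        echo "Fname=$Fname Lname=$Lname email=$email"
    done
}

print_names <<'EOF'
bob:johnson:email@email.com
john:F.:doe:email2@email.com
EOF
# prints:
# Fname=bob Lname=johnson email=email@email.com
# Fname=john Lname=doe email=email2@email.com
```

To run it against a real file, redirect the file into the function instead of the here-document, for example print_names < test.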

Ikon · January 27, 2009, 3:06pm

/bin/awk -F: '!/^#/ {print $1, $2, $3}' somefile | \
while read Fname Lname email; do
 echo "Fname:$Fname - Lname:$Lname - Email:$email"
done

NewSolarisAdmin · January 27, 2009, 3:24pm

OK file 1 format:
Fname:Lname:ID_number:multiple emails separated by ,'s
EX:
bob:johnson:3343:bjohn@email_SB.org,bjohn@emailARG.org,3343@email.org

File 2 format:
Fname:Lname:just one email
EX:
bob:johnson:bjohn@email_SB.org

And I need file 3 to look like:
ID_Number:Fname:Lname:email from file 2:whatever email ends in @emailARG.org from file 1

SO with this EX it would be:

3343:bob:johnson:bjohn@email_SB.org:bjohn@emailARG.org

I think I have figured out how to run through and assign vars to the values using:

while read line
do
        #echo $line
        Fname=`echo $line | cut -d: -f1`
        echo "Fname = $Fname"
        Lname=`echo $line | cut -d: -f2`
        echo "Lname = $Lname"
        ID=`echo $line | cut -d: -f3`
        echo "ID = $ID"

done < test

test being the name of file 1 from above.

That is as far as I have gotten...

NewSolarisAdmin · January 27, 2009, 3:25pm

I am looking at the other responses I got while typing that up now, so thanks to all in the meantime...

NewSolarisAdmin · January 27, 2009, 3:26pm

Oh and BTW I got the DB people to toss the middle initials out so that section can be omitted.

NewSolarisAdmin · January 27, 2009, 3:29pm

Oh also the files are not ordered the same, so I have to pull the email from file 2, search for (ie grep) the correct line from file 1 off that, and compile file 3's line based on what I find.

joeyg · January 27, 2009, 3:35pm

How big are the two files?
If reasonable size, then a grep from one file to another might not be too bad.
The better approach would be with awk, and setting up a couple of arrays to help link the data.
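
A rough sketch of what the awk-with-arrays idea could look like, assuming the two formats described earlier in the thread (file 1 is Fname:Lname:ID:email,email,... and file 2 is Fname:Lname:email) and that the address wanted from file 1 contains @emailARG.org. The function name join_files and the temporary file names are made up for the example.

```shell
#!/bin/bash
# Sketch only. usage: join_files FILE1 FILE2
join_files() {
    awk -F: -v suffix='@emailARG.org' '
        NR == FNR {                        # first file: build the lookup arrays
            n = split($4, addr, ",")
            want = ""
            for (i = 1; i <= n; i++)       # find the address containing the suffix
                if (index(addr[i], suffix)) want = addr[i]
            for (i = 1; i <= n; i++) {
                id[addr[i]]  = $3          # any email on the line -> ID number
                arg[addr[i]] = want        # any email on the line -> ARG address
            }
            next
        }
        ($3 in id) {                       # second file: look up by its email
            print id[$3] ":" $1 ":" $2 ":" $3 ":" arg[$3]
        }
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'bob:johnson:3343:bjohn@email_SB.org,bjohn@emailARG.org,3343@email.org' > "$tmp/f1"
printf '%s\n' 'bob:johnson:bjohn@email_SB.org' > "$tmp/f2"
join_files "$tmp/f1" "$tmp/f2"
# prints: 3343:bob:johnson:bjohn@email_SB.org:bjohn@emailARG.org
rm -r "$tmp"
```

Each file is read once, so the order of the records in the two files does not matter and there is no grep per line.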

NewSolarisAdmin · January 27, 2009, 3:49pm

The files are about 8000 records or so each. File 1 is 1.6 M, file 2 is 343 k.

FYI every tool you mentioned there is one I am not too strong with. I can do loops, if-thens, and greps pretty well, but I never learned awk and have relatively little experience with arrays. I know, I know, it's shameful, but I always have to do stuff the hard way... BUT I will take any help I can get; I am a fast learner.

Just so you know I just heard we are in the middle of a snow storm here and they are closing our building early... Sucks to work in a room with no windows, I had no idea! ANYWAY, I will be off the board for a few hours while I make my way home but I will continue on with this once I get there.

Thanks SO MUCH again for all of the help.

Back soon!

joeyg · January 27, 2009, 4:38pm

Two raw files

> cat file151
bob:johnson:3343:bjohn@email_SB.org,bjohn@emailARG.org,3343@email.org
sol:admin:3344:sadmin@email_SB.org,sadmin@emailARG.org,3344@email.org
joe:sample:3345:jsample@email_SB.org,jsample@emailARG.org,3345@email.org
> cat file152
bob:johnson:bjohn@email_SB.org
sol:admin:3344@email.org
joe:sample:jsample@email_SB.org

Script to do what you want.
It uses variables for the filenames, to make it easier to modify.

> cat make153.sh
#! /usr/bin/bash

FILE1="file151"
FILE2="file152"
FILE3="file153"

MATCH_EM="@emailARG.org"
savifs=$IFS
IFS=":"

rm ${FILE3} 2>/dev/null   #delete the file, if it exists

while read FNAME LNAME EMAIL
   do
   EMAIL1=`grep "${EMAIL}" ${FILE1} | cut -d":" -f4`
   EMAIL2=`echo ${EMAIL1} | tr "," "\n" | grep "${MATCH_EM}" `
   ID=`grep "${EMAIL}" ${FILE1} | cut -d":" -f3`

   echo "${ID}:${FNAME}:${LNAME}:${EMAIL}:${EMAIL2}"
   echo "${ID}:${FNAME}:${LNAME}:${EMAIL}:${EMAIL2}" >>${FILE3}
done<${FILE2}

IFS=$savifs   #put the original field separator back

The output file created:

> cat file153
3343:bob:johnson:bjohn@email_SB.org:bjohn@emailARG.org
3344:sol:admin:3344@email.org:sadmin@emailARG.org
3345:joe:sample:jsample@email_SB.org:jsample@emailARG.org

Let me know if you have any questions on the coding or commands. Some of them could have been combined or simplified, but I thought keeping this layout would make it easier for you to follow the logic.

NewSolarisAdmin · January 29, 2009, 2:49pm

Wow man thanks!

Storm was way worse than anyone thought it was going to be, killed my home net connection so I could not get back to this until now.

Thank you so much for the script, and it would work fine if their files were consistent, but upon further examination I have now realized they are not.

The DBA's here need to be kicked in the head, or maybe they already have been, that would explain a lot...

ANYWAY

File2's format is the most consistent so I am going to work off of it, just like in the script you have written. Since I can't assume the 3rd field will be an email, since there are some middle names, some have two last names, some have no spaces, some are Jr's, the list goes on and on... I have no intention of trying to write a condition for all of those situations.

So here is my idea:

I am going to try to script it to search for an email in each line of File2 (ie ":*@*:" or something like that).
Once I've got an email I will search in the other file for a line that has that email somewhere in it.
Once I have that line I will get the 7 digit number (which is the ID #) out of it.
Then I will also search this same line for the email that ends in the specified string.
Then take those two items and add the ID number to the front of the line from file2 and the email ending in the specified string to the end of it and write it back out.

I think this is the best way (and maybe only way) to do it....

I apologize for not realizing that it was inconsistent before; the DBA said it was consistent and I took their word for it (my mistake there).

Thanks again for trying, I think I have learned what I needed to know from looking at your script, to get this done.

I'll post what I come up with in case it may help someone else in the future.

NewSolarisAdmin · January 29, 2009, 3:15pm

OK I was wrong I don't know all I need to know to do this...

I can use this:

while read F2Line
   do
         .....
            search var F2line for the string that is the email
         .....

done<${FILE2}

to look at file two line by line. At that point I need to search that line (stored in the variable F2Line), which will be an undetermined number of strings separated by :'s on one line, for the string containing the email address.

How do I do that?
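
One possible way to do that, sketched under the assumption that exactly one field on the line contains an @ sign: turn every colon into a newline so each field sits on its own line, then keep the line that has the @. The name get_email and the sample line are made up for illustration.

```shell
# get_email: hypothetical helper. Prints the colon-separated field(s)
# of its argument that contain an "@".
get_email() {
    printf '%s\n' "$1" | tr ':' '\n' | grep '@'
}

F2Line='bob:johnson:jr:bjohn@email_SB.org'
EMAIL=$(get_email "$F2Line")
echo "$EMAIL"
# prints: bjohn@email_SB.org
```

Inside the while loop this would be called once per line, as EMAIL=$(get_email "$F2Line"). If the email is always the last field, the parameter expansion ${F2Line##*:} gives the same result without starting any extra processes.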

joeyg · January 29, 2009, 3:23pm

Just like I did, show some of the data in an irregular format for the two files.
Something like the following will make it easier to understand:

> cat file151
bob:johnson:3343:bjohn@email_SB.org,bjohn@emailARG.org,3343@email.org
sol:admin:3344:sadmin@email_SB.org,sadmin@emailARG.org,3344@email.org
joe:sample:3345:jsample@email_SB.org,jsample@emailARG.org,3345@email.org
> cat file152
bob:johnson:bjohn@email_SB.org
sol:admin:3344@email.org
joe:sample:jsample@email_SB.org

NewSolarisAdmin · January 29, 2009, 3:50pm

OK I will try to give some good examples here. With the occasional extra crap bolded.

file151:
bob:johnson:3343:bjohn@email_SB.org,bjohn@emailARG.org,3343@email.org
sol:admin:3344:sadmin@email_SB.org,sadmin@emailARG.org,3344@email.org
joe:sample:JR:3345:jsample@email_SB.org,jsample@emailARG.org,3345@email.org
john:De:Salva:3346:jds@email.com,jDesalva@emailRAG.edu,jSalva@email.com

Now for file152 what you have there is correct but sometimes you get an extra name field so I will add a few for example:
bob:johnson:bjohn@email_SB.org
sol:admin:3344@email.org
joe:sample:jsample@email_SB.org
bill:jones:jr:bjones@email.edu
bill:jones:john:jr:sampson:bjs@emailARG.edu

Things that are safe to assume:
file152:
This will have an email (and only one email) in each line.
This is the email I will need to trim out and use to get the corresponding line out of file151 to work with...

file151:
This will have at least one line with an email in it that will match the email from the file152.
Each line in this file will have a unique 4 digit number that is the Id number somewhere in it. (I need to get that out and into a variable)
Also somewhere in each line of this file there will be an email ending in "something@ARG.edu" (I need to get that out too)

SO once I get the ID and the something@ARG.edu email out of each line in file 151 I will write the line back out to a new file like this:

IDfrom151:entire line from file 152:something@ARG.edu email from 151

follow me? Or do I need to give more examples lines?

Where the other script fails is in the cut commands, they assume field 3 is email when in fact field 4 could be email, if there is a Jr. in the mix. Or field 5 if there is a Jr. and 2 middle names...
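
The "safe to assume" list above can be turned into a script that never counts fields. This is a sketch, not a tested production script, and it leans on these assumptions: each file152 line has exactly one field containing an @; the ID is the first field in the matching file151 line made up only of digits; and the wanted address is the one containing a given suffix (defaulting here to @emailARG.org, as in the sample data). build_file3 and the variable names are made up.

```shell
#!/bin/bash
# Sketch only. usage: build_file3 FILE151 FILE152 [SUFFIX]
build_file3() {
    f1=$1 f2=$2 suffix=${3:-@emailARG.org}
    while read -r F2Line
    do
        # the one field of the file152 line that contains an "@"
        email=$(printf '%s\n' "$F2Line" | tr ':' '\n' | grep '@')
        [ -n "$email" ] || continue
        # first file151 line containing that exact string (-F: no regex)
        F1Line=$(grep -F -- "$email" "$f1" | head -n 1)
        [ -n "$F1Line" ] || continue
        # ID: the first field made up only of digits
        id=$(printf '%s\n' "$F1Line" | tr ':' '\n' | grep -E '^[0-9]+$' | head -n 1)
        # the address (split on both : and ,) that contains the suffix
        arg=$(printf '%s\n' "$F1Line" | tr ':,' '\n\n' | grep -F -- "$suffix" | head -n 1)
        echo "$id:$F2Line:$arg"
    done < "$f2"
}

tmp=$(mktemp -d)
cat > "$tmp/file151" <<'EOF'
bob:johnson:3343:bjohn@email_SB.org,bjohn@emailARG.org,3343@email.org
joe:sample:JR:3345:jsample@email_SB.org,jsample@emailARG.org,3345@email.org
EOF
cat > "$tmp/file152" <<'EOF'
bob:johnson:bjohn@email_SB.org
joe:sample:jsample@email_SB.org
EOF
build_file3 "$tmp/file151" "$tmp/file152"
# prints:
# 3343:bob:johnson:bjohn@email_SB.org:bjohn@emailARG.org
# 3345:joe:sample:jsample@email_SB.org:jsample@emailARG.org
rm -r "$tmp"
```

One caveat: grep -F matches substrings, so an email that is a tail of a longer one (3344@email.org inside 13344@email.org, say) would match the wrong line. With about 8000 records this runs a few greps per line, which is slow but workable.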

joeyg · January 29, 2009, 4:42pm

Something I was thinking about was to insert another character to help delimit the first file. It is not all done, and doesn't yet accomplish everything. But, while I had a few spare moments, thought I'd pass this along as a possible approach.
Experiment with that sed command and see what it does to your input file, and then it will be clear how you can cut with the additional delimiter of ~.
Unsure yet if the same type of approach can be done with the 2nd file.

If my first file now looks like:

> cat file151a
bob:johnson:3343:bjohn@email_SB.org,bjohn@emailARG.org,3343@email.org
sol:admin:3344:sadmin@email_SB.org,sadmin@emailARG.org,3344@email.org
joe:sample:JR:3345:jsample@email_SB.org,jsample@emailARG.org,3345@email.org
john:De:Salva:3346:jds@email.com,jDesalva@emailRAG.edu,jSalva@email.com

Then I can use sed to place a ~ in front of the first colon that is followed by a digit (that is, just before the ID field), and that can help me grab data.
See code below:

> cat make153a.sh
#! /usr/bin/bash

FILE1="file151a"      #1 input file
FILE1x="file151ax"    #1 input file fixed
FILE2="file152a"      #2 input file
FILE2x="file152ax"    #2 input file fixed
FILE3="file153a"      #output file

MATCH_EM="@emailARG.org"

sed "s/:[0-9]/~&/" ${FILE1} >${FILE1x}

savifs=$IFS
IFS=":"

rm ${FILE3} 2>/dev/null   #delete the file, if it exists

while read FNAME LNAME EMAIL
   do
   EMAIL1=`grep "${EMAIL}" ${FILE1x} | cut -d"~" -f2 | cut -d":" -f3`
   EMAIL2=`echo ${EMAIL1} | tr "," "\n" | grep "${MATCH_EM}" `
   ID=`grep "${EMAIL}" ${FILE1x} | cut -d"~" -f2 | cut -d":" -f2`

   echo "${ID}:${FNAME}:${LNAME}:${EMAIL}:${EMAIL2}"
#   echo "${ID}:${FNAME}:${LNAME}:${EMAIL}:${EMAIL2}" >>${FILE3}
done<${FILE2}
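
To see what the sed line in that script does, it can be run by itself on a couple of the sample lines. This is only an illustration; mark_id is a made-up wrapper name.

```shell
# mark_id: hypothetical wrapper around the sed command from the script above.
# Puts a ~ in front of the first colon that is followed by a digit.
mark_id() {
    sed 's/:[0-9]/~&/'
}

printf '%s\n' \
    'bob:johnson:3343:bjohn@email_SB.org,bjohn@emailARG.org,3343@email.org' \
    'joe:sample:JR:3345:jsample@email_SB.org,jsample@emailARG.org,3345@email.org' |
    mark_id
# prints:
# bob:johnson~:3343:bjohn@email_SB.org,bjohn@emailARG.org,3343@email.org
# joe:sample:JR~:3345:jsample@email_SB.org,jsample@emailARG.org,3345@email.org

# After the marker, the ID is always field 2 and the email list field 3,
# no matter how many name fields came before it:
echo 'joe:sample:JR~:3345:jsample@email_SB.org' | cut -d'~' -f2 | cut -d: -f2
# prints: 3345
```

This relies on no name field starting with a digit; if one did, the ~ would land in front of it instead of the ID.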

NewSolarisAdmin · January 29, 2009, 4:45pm

I follow you. Looking at this I now see how you are using the "tr" command, that will help out a lot too..

I have some ideas I am trying too, let me get something together and I will post it and see what you think.