Find Data in test file and write each out to a line

I have a .csv file that has been create from a google form and I need to extract the data from it that has been entered by users.
The CSV will have anywhere between 100 and 1000 lines which comprise entr data for a sports carnival

A few typical line is shown here to show the problem I have

 26/10/2015 11:58:26,KL,,22,name1,name2,F,9,,,,,,,,,70,100,200,400,,,,, 
26/10/2015 12:01:28,KL,,22,name3,name4,F,8,70,100,200,700s,,,,,,,,,,,,,
26/10/2015 12:01:28,KL,,22,name5,name6,F,8,,,,,,,,,,,,,,HJ,LJ,DT,SP,

From this data, I need to extract the data in the last part of the lines,
This data can be Alpha or numeric or combination of and between 2-4 chanractes in length. It could be in any position within the last blank fields.
there could be anywhere between 1 and 6 entries on each line that ineed to capture.

For each entry, I need to write out a single formatted line which I already have most of in the form

D;Name2;name1;;F;;KIN;KL;9;;70;;M;1;22;;;;
D;Name2;name1;;F;;KIN;KL;9;;100;;M;1;22;;;;
D;Name2;name1;;F;;KIN;KL;9;;200;;M;1;22;;;;
D;Name2;name1;;F;;KIN;KL;9;;400;;M;1;22;;;;

How do I capture to a variable, The Last data on each line from random positions and then proceed to the next line when done?

I am stumped!

Thanks in advance
Ken

---------- Post updated at 04:20 PM ---------- Previous update was at 03:51 PM ----------

I had thought of using cut to query the first Column to test ( cut -d , -f9 ), if it has a non zero value, then must have data so write line, and then increment column and test again.
If value = LF? then proceed to next line

is there a better way that interativley testing each column?

Ken

You seem to want to use awk to produce one line of output as you loop through $10 through $NF on each line of your input file. But I have no idea where some of the data you are showing in your sample output comes from. Your output contains "D", "Name2", "KIN", "M", and "1" that don't appear anywhere in the 1st line of your input.

Or, you could use the shell to read lines setting positional parameters or an array variable to the values found on each input line and loop through positional parameters 10 through $# or array elements 9 through ${#array}-1.

Without knowing what operating system and shell you're using, it is hard to make any specific suggestions that might work in your environment.

Hi Don
The extra data you mention is hardcoded in the output format, or comes from the input line which I have already, written to variables as i need.
No issue with that it is just how i get the data from the line quoted and write to a file for each one

My OS linux and Bash is my shell

Just to clarify, i have highlighted in red the part of the line i need to extract from and to write each of these in turn to a single variable ( ie event)

for the line below, i need to write 4 individual lines to the output file, one for each event.

26/10/2015 11:58:26,KL,,22,name1,name2,F,9,,,,,,,,,70,100,200,400,,,,, 

And I need to write That data output into several lines as shown below

D;Name2;name1;;F;;KIN;KL;9;;70;;M;1;22;;;;
D;Name2;name1;;F;;KIN;KL;9;;100;;M;1;22;;;;
D;Name2;name1;;F;;KIN;KL;9;;200;;M;1;22;;;;
D;Name2;name1;;F;;KIN;KL;9;;400;;M;1;22;;;;

The rest of the output line is already held in variables and hard coded formatting and is fine.

Note that the data data can be anywhere within those columns.
Also note that there could be anywhere between 1 and 6 events in the input line.

Ken

Saying you have some variables without telling us what language you're using doesn't really help us much. Telling us you have some variables without telling us the variable names doesn't help us much. Please help us help you.

Tell us what operating system you're using.

Tell us what shell you're using.

Show us the code you're currently using to print everything except the field you have shown in red.

Then we can help you fill in the data you're shown in red.

As mentioned in the previous reply
OS = Linux, Bash Shell

Part of my existing code below which pulls my required variables out already

#**********************************************************************
#Main Proram code
#**********************************************************************

while read line 
    do
        echo $line
        echo
        cname=$(echo $line | cut -d, -f2)        # Extract Centre Short name
            IFS="," read X csname centnum < <(grep "^${cname}," centers.txt) # "read the Center Name and ShortName from  External file  eg $centnum,$cname,$csname
        athnum=$(echo $line | cut -d, -f4)          # Extract Athlete Number
        firstname=$(echo $line | cut -d, -f5)         # Extract Athlete First Name
        lastname=$(echo $line | cut -d, -f6)         # Extract Athlete Last Name
        gender=$(echo $line | cut -d, -f7)         # Extract Athlete Gender
        age=$(echo $line | cut -d, -f8)         # Extract Athlete Age    
            
#extract event numbers from line


        echo D';'$lastname';'$firstname';;'$gender';;'$csname';'$cname';'$age';;'$event';;M;1;'$athnum';;;;' >> $outfile    
    
    done <"$file"                                # specifly the import file to work on

The only bit i am having an issue with is writing the indivdual event codes in turn to a variable so I can write my output line

Thought to do something like recusively check each colun in the input file and test to see if that is valid data, and if so write out the the line. Every check would have to check for the CR to find endof line and then go to nex one

column = 9
until [ some Test for end of line ] #Test for end of line
    echo "Column to test is" $column

    do 
        event=$(echo $line | cut -d, -f$column)  # Event that athlete has entered
        echo $event
        ((column++))
    done
done

yes I know this does not work as is, but is this an efficient ( and even possible way to do it or is there some better methods?

Ken

I have to second Don Cragun in that your requirement is far from clear, but with some assumptions I inferred this from your posts:

while read A cname B athnum firstname lastname gender age REST
  do    for event in $REST
          do    [ $event ] && echo D';'$lastname';'$firstname';;'$gender';;'$csname';'$cname';'$age';;'$event';;M;1;'$athnum';;;;'      
          done
  done < file
D;name2;name1;;F;;;KL;9;;70;;M;1;22;;;;
D;name2;name1;;F;;;KL;9;;100;;M;1;22;;;;
D;name2;name1;;F;;;KL;9;;200;;M;1;22;;;;
D;name2;name1;;F;;;KL;9;;400;;M;1;22;;;;
D;name2;name1;;F;;;KL;9;; ;;M;1;22;;;;
D;name4;name3;;F;;;KL;8;;70;;M;1;22;;;;
D;name4;name3;;F;;;KL;8;;100;;M;1;22;;;;
D;name4;name3;;F;;;KL;8;;200;;M;1;22;;;;
D;name4;name3;;F;;;KL;8;;700s;;M;1;22;;;;
D;name6;name5;;F;;;KL;8;;HJ;;M;1;22;;;;
D;name6;name5;;F;;;KL;8;;LJ;;M;1;22;;;;
D;name6;name5;;F;;;KL;8;;DT;;M;1;22;;;;
D;name6;name5;;F;;;KL;8;;SP;;M;1;22;;;;

Does that come somewhere close to what you need?

Thanks Rudi
I am unsure of how to make the requirement clearer so will include the entire file, script and output format sample.

Getting to your code snippet, how does the $event get populated from the original data? and I assume it must be added as a nested while loop?

the latest file data that I have and full script which was what i was trying to avoid

My input file is made up of up 1000 lines similar to this. (note the data has been anonymised, but is valid in every way

26/10/2015 11:58:26,Clubname,,22,FirstName1,SecondName1,F,9,,,,,,,,,,70,100,200,400
26/10/2015 11:58:26,Clubname,,23,FirstName2,SecondName2,F,8,70,100,200,400,,,,,,,,,
26/10/2015 11:58:26,Clubname,,24,FirstName3,SecondName3,F,11,,,,,,,,,,70,100,700s,DT,SP.JT,,,,,,

My full script so far ( with RudiC's )addition is here

#set initial variables
# ask for the input filename from Zenity
file=`zenity --file-selection --title="Select a Wardells Input File"`
case $? in
         0)
                echo "\"$file\" selected.";;
         1)
                echo "No file selected.";;
        -1)
                echo "An unexpected error has occurred.";;
esac
name=${file%.*}            # finds the Name part of the input filename
fileevt="$name.evt"     # Output filename for event listing
outfile="$name.txt"        # Output filenme

echo "the input file name to use will be $file"
echo $name
echo $outfile

centnum=53                    # Center Number
cname="Center Name"            # Center Name
csname="Center short Name"            # Center Short Name
lastname=Poole                # Athlete Last Name
firstname=Natalie            # Athlete RFirst Name
gender=F                        # Athlete Sex
age=15                        # Athlete Age Group
athnum=1                    # Athlete Number

event=100                    # Event Code
mmcodes=1                    # have MM codes been done?

rm -f $outfile                # Delete Output file


#**********************************************************************
#Main Program code
#**********************************************************************

while read line 
    do
        echo $line
        echo
        cname=$(echo $line | cut -d, -f2)        # Extract Centre Short name
            IFS="," read X csname centnum < <(grep "^${cname}," centers.txt) # "read the Center Name and ShortName from  External file  eg $centnum,$cname,$csname
        athnum=$(echo $line | cut -d, -f4)          # Extract Athlete Number
        firstname=$(echo $line | cut -d, -f5)         # Extract Athlete First Name
        lastname=$(echo $line | cut -d, -f6)         # Extract Athlete Last Name
        gender=$(echo $line | cut -d, -f7)         # Extract Athlete Gender
        age=$(echo $line | cut -d, -f8)         # Extract Athlete Age    

echo Name = $firstname
echo Surname = $lastname

#extract event numbers from line
while read A cname B athnum firstname lastname gender age REST
  do    for event in $REST
         do    [ $event ] && echo D';'$lastname';'$firstname';;'$gender';;'$csname';'$cname';'$age';;'$event';;M;1;'$athnum';;;;'      
         done
        done
    done < "$file"                                # specify the import file to work on

    #    echo D';'$lastname';'$firstname';;'$gender';;'$csname';'$cname';'$age';;'$event';;M;1;'$athnum';;;;' >> $outfile    

My output file generated by the script needs to be ( for this input file)

D;SecondName1;FirstName1;;F;Clubname;;KL;9;;70;;M;1;22;;;;
D;SecondName1;FirstName1;;F;Clubname;;KL;9;;100;;M;1;22;;;;
D;SecondName1;FirstName1;;F;Clubname;;KL;9;;200;;M;1;22;;;;
D;SecondName1;FirstName1;;F;Clubname;;KL;9;;400;;M;1;22;;;;
D;SecondName2;FirstName2;;F;Clubname;;KL;9;;70;;M;1;23;;;;
D;SecondName2;FirstName2;;F;Clubname;;KL;9;;100;;M;1;23;;;;
D;SecondName2;FirstName2;;F;Clubname;;KL;9;;200;;M;1;23;;;;
D;SecondName2;FirstName2;;F;Clubname;;KL;9;;400;;M;1;23;;;;
D;SecondName3;FirstName3;;F;Clubname;;KL;11;;70;;M;1;24;;;;
D;SecondName3;FirstName3;;F;Clubname;;KL;11;;100;;M;1;24;;;;
D;SecondName3;FirstName3;;F;Clubname;;KL;11;;700s;;M;1;24;;;;
D;SecondName3;FirstName3;;F;Clubname;;KL;11;;DT;;M;1;24;;;;
D;SecondName3;FirstName3;;F;Clubname;;KL;11;;SP;;M;1;24;;;;
D;SecondName3;FirstName3;;F;Clubname;;KL;11;;JT;;M;1;24;;;;

So - what's your question?

---------- Post updated at 15:03 ---------- Previous update was at 14:58 ----------

This what I get from my proposal and your latest data:

D;SecondName1;FirstName1;;F;;;Clubname;9;;70;;M;1;22;;;;
D;SecondName1;FirstName1;;F;;;Clubname;9;;100;;M;1;22;;;;
D;SecondName1;FirstName1;;F;;;Clubname;9;;200;;M;1;22;;;;
D;SecondName1;FirstName1;;F;;;Clubname;9;;400;;M;1;22;;;;
D;SecondName2;FirstName2;;F;;;Clubname;8;;70;;M;1;23;;;;
D;SecondName2;FirstName2;;F;;;Clubname;8;;100;;M;1;23;;;;
D;SecondName2;FirstName2;;F;;;Clubname;8;;200;;M;1;23;;;;
D;SecondName2;FirstName2;;F;;;Clubname;8;;400;;M;1;23;;;;
D;SecondName3;FirstName3;;F;;;Clubname;11;;70;;M;1;24;;;;
D;SecondName3;FirstName3;;F;;;Clubname;11;;100;;M;1;24;;;;
D;SecondName3;FirstName3;;F;;;Clubname;11;;700s;;M;1;24;;;;
D;SecondName3;FirstName3;;F;;;Clubname;11;;DT;;M;1;24;;;;
D;SecondName3;FirstName3;;F;;;Clubname;11;;SP.JT;;M;1;24;;;;

Note the "KL" field is NOT found in your data.

You can't get an exit code of -1 from a process run by the shell.

Your comments about the format of center.txt described in your comments and the format of center.txt implied by the comments in your code do not match.

The output format you said you wanted and the echo statements in your code do not match. I don't think the echo statements in the code below are correct either, but I hope they are close.

And having two nested loops reading from the same file can't work.

Perhaps the following will come closer to what you want???

#!/bin/bash
# set initial variables
# ask for the input filename from Zenity
file="$(zenity --file-selection --title="Select a Wardells Input File")"
case $? in
(0)	echo "\"$file\" selected.";;
(1)	echo "No file selected."
	exit 1;;
(*)	echo "An unexpected error has occurred."
	exit 2;;
sac

IFS=","
name=${file%.*}	# finds the Name part of the input filename
fileevt="$name.evt"	# Output filename for event listing
outfile="$name.txt"	# Output filenme

echo "the input file name to use will be $file"
echo $name
echo $outfile

mmcodes=1		# have MM codes been done?  Not used in this script...

rm -f "$outfile"	# Delete Output file

#**********************************************************************
#Main Program code
#**********************************************************************
#extract event numbers from line
while read A cname B athnum firstname lastname gender age REST
do	echo "Name = $firstname"
	echo "Surname = $lastname"
	# read the Center Name and ShortName from  External file.  format is:
	#	CenterName,CenterShortName,CenterNumber
	read X csname centnum < <(grep "^${cname}," centers.txt)

	for event in $REST
	do	[ ! "$event" ] && continue
		echo "D;$lastname;$firstname;;$gender;$cname;;$csname;$centnum;$age;;$event;;M;1;$athnum;;;;"      
		echo "D;$lastname;$firstname;;$gender;$cname;;$csname;$centnum;$age;;$event;;M;1;$athnum;;;;" >> "$outfile"
	done
done < "$file"		# specify the import file to work on
1 Like

[quote=don cragun;302958814]
You can't get an exit code of -1 from a process run by the shell.
Ok thanks

And having two nested loops reading from the same file can't work.
Ok thanks That was one of my origfinal concerns

Perhaps the following will come closer to what you want???
Very close Thank you

2 Questions.
When I run the code as you posted, the last line of each section is split into 2 if the input line has a space after the last event, it adds an extra line with no $event code in it

how does the script select only the event?

Without seeing the split output lines and the input line that led to that output, I don't see any obvious reason why the last line would be split.

If you change the line in the in the script:

	do	[ ! "$event" ] && continue

to:

	do	[ ! "${event%[[:space:]]}" ] && continue

it will ignore any event that is just a <space> or <tab> character.

is that correct format? adding that gives an error on Line 39 @ 40 ( line 39 is the file input line)
./test: line 39: unexpected EOF while looking for matching `"'

Sorry, I missed a closing brace:

	do	[ ! "${event%[[:space:]]}" ] && continue

I'll correct it in my earlier post.

1 Like

Awesome thanks That fixed that particular problem

The only issue now is the lines that have no trailing comma. in that case the line is split after the $event code.

D;SecondName1;FirstName1;;F;Clubname;;CLB;900;9;;200;;M;1;22;;;;
D;SecondName1;FirstName1;;F;Clubname;;CLB;900;9;;400
;;M;1;22;;;;

I added an extra blank column in the original spreadsheet, but then would have to remember to do that every time i refreshed the data

Ken

Hi Ken,
Please show us the output from:

od -c file

for the part of the file has the line ending with:

200,400...

that is leading to the three lines of output you showed us in post #14. We need to see what is between the "400" and the <newline> character at the end of that line.

This is for a 2 line input file that shows the error for the first line only
I assume because there are no blanks in the line directly before the return?

ken@Leia:~/Documents/Athletics/convertscript-Googleforms-MMv4$ od -c 3aZoneentry.csv
0000000   2   6   /   1   0   /   2   0   1   5       1   1   :   5   8
0000020   :   2   6   ,   C   l   u   b   n   a   m   e   ,   ,   2   2
0000040   ,   F   i   r   s   t   N   a   m   e   1   ,   S   e   c   o
0000060   n   d   N   a   m   e   1   ,   F   ,   9   ,   ,   ,   ,   ,
0000100   ,   ,   ,   ,   ,   7   0   ,   1   0   0   ,   2   0   0   ,
0000120   4   0   0  \r  \n   2   6   /   1   0   /   2   0   1   5    
0000140   1   1   :   5   8   :   2   6   ,   C   l   u   b   n   a   m
0000160   e   ,   ,   2   4   ,   F   i   r   s   t   N   a   m   e   3
0000200   ,   S   e   c   o   n   d   N   a   m   e   3   ,   F   ,   1
0000220   1   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   7   0   ,   1   0
0000240   0   ,   7   0   0   s   ,   D   T   ,   S   P   ,   J   T   ,
0000260   ,   ,   ,   ,   ,  \r  \n
0000267

OK. We need to get rid of the <carriage-return> at the ends of the lines in your input file... So, change:

	for event in $REST
	do	[ ! "$event" ] && continue

to:

	REST="${REST%[[:space:]]}"
	for event in $REST
	do	[ ! "${event%[[:space:]]}" ] && continue

I'm guessing that just:

	REST="${REST%[[:space:]]}"
	for event in $REST
	do	[ ! "$event" ] && continue

will be enough, but if there is a trailing whitespace character at the end of any event (in addition to the <carriage-return> at the end of the line, removing the trailing whitespace character from each event will take care of that. (And, if there aren't any trailing whitespace characters at the end of individual events, trying to remove a trailing whitespace character won't hurt.)

Oops. Actually, the middle segment above should be:

	REST="${REST%[[:space:]]}"
	for event in $REST
	do	event="${event%[[:space:]]}"
		[ ! "$event" ] && continue
1 Like

That is excellent Don
Thank you for your help
Much simpler than what I was envisaging.

All I have to do now is to understand it all so I can embed it in my brain for nest time

Ken

1 Like

I'm glad to hear it is finally working for you.

Let us know if you can't figure out what any of those shell statements are doing.

  • Don