How to select numbers from a line of text and remove leading spaces?

I have a text file in which each line contains numbers and text formatted into groups. I need to extract a number that can be 1, 2 or 3 digits long and write it to a variable, but I need to remove any leading spaces from the number first.

I can get the numbers out, but how do I remove the leading spaces?

My test file contents

"C 56","Home athletics center","LAC"
"D001","50M Run","50M"
"D003","100M Run","100M"

My script so far pulls out the 3 characters after the first C and writes them to the variable centnum.
This is only part of the entire script; the rest will locate only the first line containing the necessary value.

#!/bin/bash
#set initial variables
file="FILE.EXT"     # input filename

while read line
 do centnum=${line:2:3}                    # find 3 Chars in line
      centnam=$(echo $line | cut -d '"' -f4)    #Extract the Center Name from the file 

done <"$file"
echo D';'$centnum';'$centnam';;M;;' > MM_file.txt

The number in the output file needs to be just the significant digits; the file is a CSV, so the field can be variable length.

I have tried using sed, but this just parses the entire file at once and I cannot process each line individually.

TIA
Ken

#!/bin/bash

file="FILE.EXT"

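while read -r line
do
    centnum=${line:2:3}                          # the 3 characters after the leading "C
    centnam=$(echo "$line" | cut -d '"' -f4)     # the center name (4th quote-delimited field)
    echo "D;$(($centnum));$centnam;;M;;"         # $(( )) evaluates centnum as a number
done < "$file"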

gives

D;56;Home athletics center;;M;;
D;1;50M Run;;M;;
D;3;100M Run;;M;;

awk -F "[^0-9]*" '{print $2,$3,$4}'
56
001 50 50
003 100 100

Thanks, that almost works. I get the 1, 2 or 3 digit numbers as desired, but when I write them out to the resulting file using this line, I now get a space after the number.

echo D';'$centnum';'$centnam';;M;;' > MM_file.txt
D;156 ;Home athletics center;;M;;

You could also try something like:

#!/bin/bash
#set initial variables
file="FILE.EXT"     # input filename

while IFS='"' read -r x cn x centnam x
do      cn=${cn#"${cn%%[1-9]*}"} # strip off leading non-digits and zeros (embedded zeros are kept).
        centnum=${cn:-0} # Add a zero if the input had no non-zero digits.
        printf "D;%s;%s;;M;;\n" "$centnum" "$centnam" 
done < "$file" > MM_file.txt

This does all of the processing using shell built-ins (no need to invoke cut or awk) and will work even if $centnam expands to a string containing multiple adjacent whitespace characters.
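
To see how that read splits a record, here is a minimal sketch that feeds it the first sample line directly instead of a file. The variable names follow the script above; the here-string and the printf labels are only for illustration.

```bash
#!/bin/bash
# With IFS set to a double quote, read splits the line at every '"'.
# Field 1 is the empty string before the first quote, field 2 is the
# center code, field 3 is the comma between the quoted items, field 4
# is the center name, and the last variable receives the rest of the line.
line='"C 56","Home athletics center","LAC"'

IFS='"' read -r x cn x centnam x <<< "$line"

printf 'cn=[%s]\n' "$cn"            # cn=[C 56]
printf 'centnam=[%s]\n' "$centnam"  # centnam=[Home athletics center]
```

Because IFS is set only on the read command itself, the shell's normal word splitting is unchanged for the rest of the script.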

Hi Don. Interesting idea, but the 2 variables will be written from data within the text file, specifically the first line.
I suppose some more context might help, but I was trying to keep it simple :-).

I will have many files to process, all in the same format. I need to extract this line, read the center number and name, then write them out to another file.

Another part of the script will read the input file; this part only needs to extract the number and name from this line. The number will be any number between 1 and 999, and the leading zero is the problem.

Ken

Your script writes:

D;003;100M Run;;M;;

into the file MM_file.txt from the last (not first) line of your input file FILE.EXT which you said contains:

"C 56","Home athletics center","LAC"
"D001","50M Run","50M"
"D003","100M Run","100M"

At various times you have said that the leading zeros are a problem and that the leading space is a problem. You have not shown us what output you want to be produced. My script processed each line in your input file, stripping off leading spaces and leading zeros and putting:

D;56;Home athletics center;;M;;
D;1;50M Run;;M;;
D;3;100M Run;;M;;

in MM_file.txt which is one line of output for each line of input. If this is not what you want, please explain clearly what output you do want in English and show us the output you want to have produced with the sample input file you provided. (Note that I see no way to get the output you showed in message #4 in this thread:

D;156 ;Home athletics center;;M;;

since 156 does not appear as a number on any of your sample input lines???)

The original file had " C 56 " as the line, but to test I also tried the other possibilities:
" C 56 " , " C 6 ", " C156 "
Each iteration gave a trailing space in the output file, and I just quoted the output from one of the other tests.

It is only the first line I need to parse. The second and subsequent lines will be dealt with in another part of the script. I probably should not have included them as it unnecessarily confused the requirements.

The part of the input file I need to parse is

"C 56","Home athletics center","LAC"

The output I require is to be

D;56;Home athletics center;;M;;

As I mentioned, the code given in the first reply works, but adds a space after the number.

Ken

So, with your new statement of what you want, try:

#!/bin/bash
#set initial variables
file="FILE.EXT"     # input filename

IFS='"' read -r x cn x centnam x < "$file"
cn=${cn#"${cn%%[1-9]*}"} # strip off leading non-digits and zeros (embedded zeros are kept).
centnum=${cn:-0} # Add a zero if the input had no non-zero digits.
printf "D;%s;%s;;M;;\n" "$centnum" "$centnam" > MM_file.txt

PS Note that although your current input has 1, 2, or 3 digit numbers; the code above will work with numbers that are 1 or more digits long (as long as your total input line length is less than LINE_MAX bytes long for whatever value LINE_MAX has on your system).
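
As a quick check of the stripping step on its own, here is a minimal sketch using a hypothetical helper, strip_code (not part of the script above), run against a few made-up center codes. It removes the prefix found by ${cn%%[1-9]*}, that is, everything before the first non-zero digit, so embedded and trailing zeros such as those in 105 or 100 are kept.

```bash
#!/bin/bash
# strip_code is a hypothetical helper, for illustration only.
strip_code() {
    local cn=$1
    local prefix=${cn%%[1-9]*}   # everything before the first digit 1-9
    cn=${cn#"$prefix"}           # drop that prefix, keep the rest intact
    printf '%s\n' "${cn:-0}"     # fall back to 0 if there was no non-zero digit
}

# Made-up sample codes covering spaces, leading zeros and embedded zeros.
for code in 'C 56' 'C  6' 'C156' 'D001' 'C105' 'C100' 'C000'; do
    printf '[%s] -> %s\n' "$code" "$(strip_code "$code")"
done
```

A pattern such as ${cn##*[!1-9]} would instead cut through the last character that is not 1-9, which turns C105 into 5 and C100 into an empty string.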

Excellent, thanks, that works.

Now all I need to do is review it in detail to understand how it works and hopefully remember it, but that is a task for me.

Rereading the thread, I realised the first reply also produces working output, because its output line encloses the variable in another set of brackets:

echo "D;$(($centnum));$centnam;;M;;" > MM_file.txt

I assume you know why, but can you explain why that format removes the leading space?

Thanks again
Ken

Arithmetic expansion evaluates the variable's contents as a number, so the surrounding whitespace is dropped from the result. BTW, you don't need the "$" inside $(( )):

echo ">"$((centnum))"<"
>56<
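
One caveat with the arithmetic approach: bash treats a number with a leading zero as octal, so a value such as 008 is rejected as an invalid octal constant. A minimal sketch of one way around that, assuming the value holds only spaces and at least one digit, is to drop the spaces first and force base 10 with the 10# prefix. The variable names here are illustrative.

```bash
#!/bin/bash
# Assumption: each raw value contains only spaces and at least one digit.
for raw in ' 56' '001' '008' '  6'; do
    digits=${raw//[!0-9]/}       # keep the digits only (drops the spaces)
    num=$((10#$digits))          # 10# forces base 10, so 008 becomes 8
    printf '[%s] -> %s\n' "$raw" "$num"
done
```

With the sample data in this thread (56, 001, 003) the plain $((centnum)) form gives the same results, since those values are also valid octal.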

Cool, thank you all for your help.
I am quite amazed to get so many replies on a Sunday.

Ken

The arithmetic expansion of $(($x)) when $x expands to the string " 56" is 56 .
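Similarly, the parameter expansion of ${x##*[!1-9]} (which removes everything up to and including the last character that is not a digit 1-9) when $x expands to the string "c 56" is also 56 . Be aware that it also cuts through an embedded or trailing zero, so "c105" yields 5 ; stripping with ${x#"${x%%[1-9]*}"} removes only what precedes the first non-zero digit and yields 105 .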
