How to cut by delimiter, and delimiter can be anything except numbers?

Hi all,

I have a number of strings like below:

//mnt/autocor/43°13'(33")W/[N]

and I'm trying to extract the numbers from this string, for example:
43[tab]13[tab]33[tab]

please help

thanks ahead

You probably don't want to use cut; it's fast but inflexible (to the best of my knowledge you can't have a variable delimiter). Try the following approach:

perl -e 'print join("\t",(split(/[^0-9]+/,$ARGV[0]))),"\n" ;' '//mnt/autocor/43°13(33")W/[N]'

Oops, I re-read your question; the following also puts a tab at the end of the line:

perl -e 'print join("\t",(split(/[^0-9]+/,$ARGV[0])),"\n") ;' '//mnt/autocor/43°13(33")W/[N]'

perl -ne 'while(/(\d+)/g){print "$1\t"}' inputfile

awk -F'[^0-9]' '{s="";for(i=1;i<=NF;i++){if($i!="")s=s$i"\t"}print s}' file

tr -cs '[:digit:]' '[\t*]'

Could someone tell me what's wrong with this code, please?

awk -F'[^0-9]' '{s="";i=0;while(++i<=NF&&$i!="")s=s$i"\t"}$0=s' file

The while loop stops at the first empty field, because $i != "" evaluates to false there. With the sample data provided and the field separator you're using (any single non-digit), the comparison is already false for $1, so the body of the loop never executes.
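
One possible fix, as a minimal sketch: keep the loop condition as a plain bounds check and move the emptiness test into the loop body, so empty fields are skipped rather than ending the loop. The function name extract_numbers is only a hypothetical wrapper for illustration, and I print s explicitly instead of relying on the $0=s pattern.

```shell
# Hypothetical wrapper; the awk program is the posted one-liner with the
# emptiness test moved out of the loop condition and into the loop body.
extract_numbers() {
  awk -F'[^0-9]' '{
    s = ""; i = 0
    while (++i <= NF)        # visit every field, empty or not
      if ($i != "")          # skip empty fields instead of stopping
        s = s $i "\t"
    print s
  }' "$@"
}

# Sample string from the first post, fed on standard input.
extract_numbers <<'EOF'
//mnt/autocor/43°13'(33")W/[N]
EOF
```

With the sample line this prints 43, 13 and 33, each followed by a tab. A line without any digits produces an empty output line here, whereas the $0=s form would print nothing for it.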

Regards,
Alister

Now it's so obvious that I feel stupid :stuck_out_tongue:
Thank you Alister :slight_smile:

It happens to everyone from time to time. Sometimes, when debugging, we see what we expect instead of what's actually there.

Regards,
Alister

@alister

I'm fairly sure you can help me understand the '[\t*]' notation in your tr command better:
Is the wildcard mandatory? What is it used for?
(I mean, isn't the -s option of tr sufficient and intended for that squeezing purpose?) Or are there other reasons to prefer that notation over a simple '\t'?

Thanks in advance

yas:

grep -Eo '[0-9]+' infile | paste - - -

Analogous to the nice tr statement:

sed 's/[^0-9][^0-9]*/\t/g' infile
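
In case the sed at hand does not understand \t in the replacement (as far as I know that escape is an extension rather than something POSIX promises), here is a hedged variant that passes a real tab character instead. The function name numbers_with_sed and the variable tab are made up for this sketch.

```shell
# Hypothetical helper: same substitution as above, but the tab is produced
# by printf and spliced into the sed script as a literal character.
numbers_with_sed() {
  tab=$(printf '\t')
  sed "s/[^0-9][^0-9]*/${tab}/g" "$@"
}

# Sample string from the first post, fed on standard input.
numbers_with_sed <<'EOF'
//mnt/autocor/43°13'(33")W/[N]
EOF
```

Like the tr version, this leaves a leading tab where the line starts with non-digits.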

@Scruti,

I am just wondering about the wildcard in the '[\t*]' notation: why not just use '\t' instead? Is that wildcard just a literal one (part of the [ ] list), or is it interpreted?

That notation, as I used it, is not so much a wildcard as a repetition operator. It says to pad the second set with the preceding character, \t, until the second set's length equals the first's. (A number can follow the asterisk to indicate an exact count: [\t*3] would include three tabs in the set.)

Why bother with that? Historically, BSD and SysV tr implementations behaved differently when the second set is shorter than the first. This notation guarantees that this does not occur.

In practice, at least with open source systems, you'll probably never run into this issue.

For more info, see the POSIX man page for tr. Specifically, the EXTENDED DESCRIPTION section for the details of the syntax and APPLICATION USAGE for the history.
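
To make that concrete, here is a small sketch of the command run on the sample string. The function name digits_to_tabs is just a hypothetical wrapper. Note that the newline is a non-digit too, so it is translated along with everything else and the output ends with a tab rather than a newline.

```shell
# Hypothetical wrapper around the tr command from earlier in the thread.
digits_to_tabs() {
  # -c: use the complement of the digit class as the first set
  # -s: squeeze each run of translated tabs into a single tab
  # [\t*]: repeat the tab until the second set is as long as the first
  tr -cs '[:digit:]' '[\t*]'
}

# Sample string from the first post, fed on standard input.
digits_to_tabs <<'EOF'
//mnt/autocor/43°13'(33")W/[N]
EOF
echo    # tr swallowed the newline, so add one back for the terminal
```

The output starts with a tab (from the leading non-digits), followed by 43, 13 and 33, each with a tab after it.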

Regards,
Alister

The notation with the square brackets and the asterisk makes sure that the number of characters in the second string is equal to the length of the first string. This is the most portable way. If you use a single character for the second string, then it is not guaranteed to work across all implementations...

@Scruti & alister

Dudes, well ... I confess I haven't read all the POSIX documentation so far...

Anyway, I will go to sleep a little less ignorant! :wink: Thx!