awk to find number in a field then print the line and the number

Mudshark · July 31, 2015, 10:33am

Hi

I want to use awk to match lines where field 3 contains a number within the string, then print the line with just the number appended as a new field.
The source file is pipe-delimited and looks something like this:

1|net|ABC Letr1|1530|||
1|net|EXP_1040 ABC|1121|||
1|net|EXP_TG1224|1122|||
1|net|R_North|1123|||
1|net|RExp 123 456 X|1234|||

What I want as an output is:

1|net|ABC Letr1|1530|||1|
1|net|EXP_1040 ABC|1121|||1040|
1|net|EXP_TG1224|1122|||1224|
1|net|RExp 123 456 X|1234|||123 456|

i.e. field $7 holds just the number from field $3.
I got as far as matching lines where field 3 contains a number (the input file is called 'x'):

gawk -F"|" '$3 ~ /[0-9]/' x

But my various attempts with substr have failed so far.
Any help appreciated.

Thanks

RavinderSingh13 · July 31, 2015, 10:44am

Hello Mudshark,

The following may help.

awk -F"|" '{Q=$3;gsub(/[[:alpha:]]|[[:punct:]]/,X,Q);sub(/^[[:space:]]/,X,Q);sub(/[[:space:]]+$/,X,Q);$(NF)=Q;if(Q){print $0 OFS}}' OFS="|" Input_file

Output will be as follows.

1|net|ABC Letr1|1530|||1|
1|net|EXP_1040 ABC|1121|||1040|
1|net|EXP_TG1224|1122|||1224|
1|net|RExp 123 456 X|1234|||123 456|

EDIT: Adding a multi-line version of the same solution.

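awk -F"|" '{
    Q=$3;                                  # work on a copy of field 3
    gsub(/[[:alpha:]]|[[:punct:]]/,X,Q);   # X is unset, so letters and punctuation are deleted
    sub(/^[[:space:]]/,X,Q);               # drop one leading whitespace character
    sub(/[[:space:]]+$/,X,Q);              # drop trailing whitespace
    $(NF)=Q;                               # store the result in the last field
    if(Q){
        print $0 OFS                       # print the line plus a trailing delimiter
    }
}' OFS="|" Input_file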

Thanks,
R. Singh

RudiC · July 31, 2015, 12:34pm

Try also:

awk -F\| '{$7=$3; $8=""; gsub(/^[^0-9]*|[^0-9]*$/, "",$7)} $7' OFS=\| file

Output:

1|net|ABC Letr1|1530|||1|
1|net|EXP_1040 ABC|1121|||1040|
1|net|EXP_TG1224|1122|||1224|
1|net|RExp 123 456 X|1234|||123 456|
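
Since the original question mentions substr, here is a minimal sketch of the same idea using match() and substr(). It assumes, like the one-liner above, that everything from the first digit to the last digit of field 3 should be kept. The shell variable names sample and result are only for illustration.

```shell
# Sample data from the question, held in a shell variable for the demo.
sample='1|net|ABC Letr1|1530|||
1|net|EXP_1040 ABC|1121|||
1|net|EXP_TG1224|1122|||
1|net|R_North|1123|||
1|net|RExp 123 456 X|1234|||'

result=$(printf '%s\n' "$sample" | awk -F'|' -v OFS='|' '
    # match() sets RSTART and RLENGTH to the leftmost-longest match,
    # which here runs from the first digit in field 3 to the last one.
    match($3, /[0-9](.*[0-9])?/) {
        $7 = substr($3, RSTART, RLENGTH)   # copy that span into field 7
        print $0 OFS                       # trailing "|" as in the requested output
    }')
printf '%s\n' "$result"
```

Lines without a digit in field 3 never match, so they are skipped without a separate test. As with trimming non-digits from both ends, any non-digit characters sitting between the first and last digit are kept.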

MadeInGermany · July 31, 2015, 1:22pm

Ravinder, with the [[:alpha:]] and [[:punct:]] classes you worked around the more direct:

gsub(/[^[:digit:]]/,X,Q)

Don_Cragun · July 31, 2015, 6:18pm

This works with the English description of the desired output, but doesn't preserve the space between strings of digits as shown in the last line of the desired output. To get the desired output, one could use:

gsub(/[^[:digit:] ]/,X,Q)

if only <space> characters are to be allowed between strings of digits, or:

gsub(/[^[:digit:][:space:]]/,X,Q)

if any combination of <space> and <tab> characters is to be allowed (as assumed by Ravinder's following sub() statements, which get rid of leading and trailing space class characters).
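
To make the difference between these bracket expressions concrete, here is a small sketch that applies each one to the last sample value from the question. The variable names are made up for the demo, and the square brackets in the output are only there to make leading and trailing spaces visible.

```shell
s='RExp 123 456 X'

# Digits only: the space between the two numbers is lost.
digits_only=$(printf '%s\n' "$s" | awk '{ gsub(/[^[:digit:]]/, ""); print "[" $0 "]" }')

# Digits and the space character: the inner space survives, and so do
# the spaces that sat next to the removed letters.
keep_space=$(printf '%s\n' "$s" | awk '{ gsub(/[^[:digit:] ]/, ""); print "[" $0 "]" }')

# Digits and any whitespace: same result here, because the sample has no tabs.
keep_whitespace=$(printf '%s\n' "$s" | awk '{ gsub(/[^[:digit:][:space:]]/, ""); print "[" $0 "]" }')

printf '%s\n' "$digits_only" "$keep_space" "$keep_whitespace"
# [123456]
# [ 123 456 ]
# [ 123 456 ]
```

The second and third forms only differ when field 3 contains whitespace other than a plain space. Note that [:space:] also covers characters such as form feed and carriage return; [:blank:] is the class limited to space and tab.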

Note that Ravinder's suggestion to use:

                sub(/^[[:space:]]/,X,Q);
                sub(/[[:space:]]+$/,X,Q);

after getting rid of alphabetic and punctuation class characters only removes one leading whitespace character (when it seems that there could be more than one). Both of these could also be replaced by a single gsub() call:

                gsub(/^[[:space:]]+|[[:space:]]+$/,X,Q);

or, if only space characters are to be allowed between digit strings:

                gsub(/^ +| +$/,X,Q);
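
Putting these pieces together, a complete command might look like the sketch below. It is one possible assembly of the suggestions in this thread, not a tested replacement for the earlier answers; the lowercase variable q and the shell variables sample and result are chosen here for illustration.

```shell
# Sample data from the question, held in a shell variable for the demo.
sample='1|net|ABC Letr1|1530|||
1|net|EXP_1040 ABC|1121|||
1|net|EXP_TG1224|1122|||
1|net|R_North|1123|||
1|net|RExp 123 456 X|1234|||'

result=$(printf '%s\n' "$sample" | awk -F'|' -v OFS='|' '{
    q = $3
    gsub(/[^[:digit:][:space:]]/, "", q)         # keep only digits and whitespace
    gsub(/^[[:space:]]+|[[:space:]]+$/, "", q)   # trim both ends in a single call
    if (q != "") {                               # skip lines with no digits in field 3
        $NF = q                                  # the last field (7) is empty in the input
        print $0 OFS                             # trailing "|" as in the requested output
    }
}')
printf '%s\n' "$result"
```

The test is written as q != "" rather than a bare if (q), because a field 3 that is just 0 could otherwise be treated as false. One caveat of this approach: removing letters that sit between two spaces leaves both spaces behind, so a value like "1040 ABC 7" would come out with two spaces between the numbers.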

Mudshark · July 31, 2015, 6:35pm

Many thanks to all for your help on this.
Both suggestions worked perfectly on my example data.

I appreciate the help.

Thanks Don for the explanations.