awk: check the length of digit fields and print the rightmost number

sdf · November 23, 2011, 6:21am

I have columns with digits and strings like this:

input.txt

3840 3841 3842 Dav Thun Tax
Cahn 146; Dav.
3855 3853 3861 3862 Dav Thun Tax
2780 Karl VI.,
3873 3872 3872  Dav Thun Tax
3894 3893 3897 3899 Dav Thun Tax
403; Thun 282.
3958 3959 3960  Dav Thun Tax
3972 3972 3972 3975 Dav Thun Tax
Rom. Dav. 145;
4006 4005 4007 Dav Thun Tax

output.txt

3842 Dav Thun Tax
Cahn 146; Dav.
3862 Dav Thun Tax
2780 Karl VI.,
3872  Dav Thun Tax
3899 Dav Thun Tax
403; Thun 282.
3960  Dav Thun Tax
3975 Dav Thun Tax
Rom. Dav. 145;
4007 Dav Thun Tax

Can anybody help me with the code?

ygemici · November 23, 2011, 7:05am

If you have GNU sed, maybe you can try this:

$ sed '1~2s/.* \([^ ]* [^ ]* [^ ]* [^ ]*\)/\1/'

sdf · November 23, 2011, 7:08am

Thanks, but I use gawk on Windows, so I can't use sed.

forroughuse · November 23, 2011, 7:20am

Hi ygemici,

Nice

$ sed '1~2s/.* \([^ ]* [^ ]* [^ ]* [^ ]*\)/\1/'

Can you please explain how it works?

sdf · November 23, 2011, 8:51am

OK, I got GNU sed to run, but it erases a lot of strings. So I want to handle part of the job with code and do the rest by hand. I came up with this code:

awk '{if(length($1)==4 && $1=="[0-9]" && length($2)==4 && $2=="[0-9]"  && length($3)==4 && $3=="[0-9]" ) print $1,$2,$3}' input.txt to_correct_ouput.txt

The code doesn't work. Can anybody help me correct it?

ygemici · November 23, 2011, 8:54am

You can use sed on Windows:
sed for Windows

CarloM · November 23, 2011, 9:12am

1~2s : on line 1 and every 2nd line after that, substitute
/.*  : any number of any characters, followed by a space, and then
\([^ ]* [^ ]* [^ ]* [^ ]*\) : (stored sub-pattern) any number of non-space characters followed by a space, three times, followed by any number of non-space characters (i.e. the last 4 space-separated fields in the line)
/\1/ : replace with the text that matched stored sub-pattern 1
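
To make this breakdown concrete, here is a minimal sketch that applies only the substitution (without the 1~2 address) to single lines taken from input.txt. The helper name last_four is hypothetical and a POSIX shell is assumed. The second call hints at one possible reason some strings appear to get erased: where two spaces precede Dav, the first [^ ]* matches the empty string, so the number is dropped.

```shell
# last_four is a hypothetical helper name, not part of the thread.
# It applies only the substitution to one line of text.
last_four() {
    printf '%s\n' "$1" | sed 's/.* \([^ ]* [^ ]* [^ ]* [^ ]*\)/\1/'
}

# Single spaces: the last four space-separated fields are kept.
last_four '3840 3841 3842 Dav Thun Tax'
# prints: 3842 Dav Thun Tax

# Double space before "Dav": the first [^ ]* matches nothing, so the
# number is lost and the output begins with a space.
last_four '3873 3872 3872  Dav Thun Tax'
# prints: " Dav Thun Tax" (without the quotes)
```

Lines with fewer than four spaces, such as "403; Thun 282.", do not match the pattern and pass through unchanged.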

ahamed101 · November 23, 2011, 9:26am

Try this...

awk '/^[0-9]/{match($0,"([0-9]*[; ]*[a-zA-Z].*)",a); $0=a[1]}1' input_file

--ahamed
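
For readers whose awk lacks gawk's three-argument match(), a similar result can be sketched in POSIX awk. This is an alternative technique, not the method above: it repeatedly deletes a leading number for as long as another number follows it. The helper name keep_last_number is hypothetical and a POSIX shell is assumed. Note that in awk, == compares against the literal string "[0-9]", while ~ performs a regex match, so a test such as $1=="[0-9]" combined with length($1)==4 can never succeed.

```shell
# keep_last_number is a hypothetical helper name.
# Reads the named files, or standard input when no file is given.
keep_last_number() {
    awk '{
        # While the line starts with "number, spaces, digit",
        # delete that leading number and its trailing spaces.
        while ($0 ~ /^[0-9]+ +[0-9]/)
            sub(/^[0-9]+ +/, "")
        print
    }' "$@"
}
```

It could be run as keep_last_number input.txt > output.txt. Under the Windows command prompt the single-quoted program would probably need to be saved to a file and run with gawk -f instead.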

ygemici · November 23, 2011, 9:42am

The 1~2 address tells sed to work on every 2nd line, starting from line 1,
so it skips lines 2, 4, 6, ... and works on the other lines.
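
As a small illustration of the first~step address, which is a GNU sed extension, the sketch below prints the odd-numbered lines of a six-line stream and shows an awk equivalent based on NR. Both helper names are hypothetical, and the sed variant assumes GNU sed. In the sample input.txt the numeric rows do not all fall on odd lines (lines 6 and 8 start with numbers), so selecting by content, for example /^[0-9]/, may be more robust than selecting by line number.

```shell
# Hypothetical helpers; odd_lines_sed assumes GNU sed.
odd_lines_sed() { sed -n '1~2p' "$@"; }
odd_lines_awk() { awk 'NR % 2 == 1' "$@"; }

printf '%s\n' 1 2 3 4 5 6 | odd_lines_sed   # prints 1, 3 and 5, one per line
printf '%s\n' 1 2 3 4 5 6 | odd_lines_awk   # same lines, using only awk
```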