awk not behaving as expected

maks475 · October 13, 2013, 2:26pm

Hi,

Immediate help with the problem below would be appreciated.

I have to read a file (at most 10 MB) that contains no newline characters, i.e. all the data is on a single line, and insert '\n' after every 100 characters. If a record starts with 'BUCA', I need to pick the 10-character value at position 71 and write it to a file; otherwise the record should be dropped.

I tried the command below. It works fine up to inserting '\n' after every 100 characters, but it sends all the records to the target file.

awk 'BEGIN { len = 100 } length <=100  { print; next }; { print substr($0,1,len); $0 = substr( $0, len+1); while ( length >= 100 ) {; printf "%s\n", substr($0,0,len); $0 = substr( $0, len+1) }; if ( length ) {; if( substr($0, 1, 4) ~ /"BUCA"/ ) {; print substr($0, 71, 10) } } }' "src_file.txt" > tgt_file.txt

(There are no newlines in the above script; it is a single line.)
It also takes too long to process a 10 MB file (more than 5 minutes).

I don't know where I made the mistake. Please correct me.

Thanks in advance.

bartus11 · October 13, 2013, 2:29pm

Please provide sample input (consisting of a few hundred characters) and desired output.

maks475 · October 13, 2013, 3:22pm

IN file:

BUCH20131013                                                          4748420001                    BUCA20131013                                                          4748520001                    BUCH20131013                                                          4748420002                    BUCA20131013                                                          4748520002                    BUCH20131013                                                          4748420001                    BUCA20131013                                                          4748520003                    BUCH20131013                                                          4748420001                    BUCA20131013                                                          4748520004                    BUCH20131013                                                          4748420001                    BUCA20131013                                                          4748520005

(no new lines in the above data)
Out file:

(output with a newline after every 100 characters, picking the 10 characters starting at position 71 from every record that starts with "BUCA")

RudiC · October 13, 2013, 4:05pm

fold -w100 file
BUCH20131013 4748420001 BUCA20131013 4748520001 BUCH20131013 4748420002 BUCA20131013 4748520002 BUCH
20131013 4748420001 BUCA20131013 4748520003 BUCH20131013 4748420001 BUCA20131013 4748520004 BUCH2013
1013 4748420001 BUCA20131013 4748520005

I'm sure I don't understand how you want to produce your desired output.

Edit: Now that code tags have been applied to your input data, it's a different story - pipe the fold result through a grep or awk cmd to filter out the BUCA lines and get your (obviously) second field value.
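
A minimal sketch of that pipeline, assuming 100-character records with the value in columns 71-80 as in the sample. The file name `sample.txt` and its two records are hypothetical and built here only for illustration:

```shell
# Build a small single-line sample: two 100-character records (hypothetical data).
printf '%-70s%-30s' 'BUCH20131013' '4748420001' >  sample.txt
printf '%-70s%-30s' 'BUCA20131013' '4748520001' >> sample.txt

# fold breaks the line every 100 characters; awk keeps only the BUCA records
# and prints the 10 characters starting at position 71.
out=$(fold -w100 sample.txt | awk '/^BUCA/ { print substr($0, 71, 10) }')
echo "$out"
```

Using a fixed column with substr rather than a field number keeps the result independent of how many spaces separate the fields.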

bartus11 · October 13, 2013, 4:17pm

Please use code tags for input as well as output data.

Scrutinizer · October 13, 2013, 6:35pm

Try:

gawk 'NR>1{print $2}' RS=BUCA file

paresh_n_doshi · October 14, 2013, 2:27am

Try this awk script (stored in a file):

# Save the single input line, then print it in 100-character chunks.
{ rec = $0; len = length(rec) }
END {
    for (this = 1; this <= len; this += 100)
        printf("%s\n", substr(rec, this, 100))
}

CarloM · October 14, 2013, 12:10pm

Does it have to be in awk?

$ fold -w100 src_file.txt | grep ^BUCA | cut -c71-80
4748520001
4748520002
4748520003
4748520004
4748520005
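
If it does have to be awk, a single-pass sketch is possible. Two things in the original command stand out: the pattern /"BUCA"/ looks for the quote characters literally, so a 4-character substring can never match it, and every full 100-character chunk is printed unconditionally before that test is reached. Repeatedly reassigning $0 to a shrinking substring of a 10 MB line is also likely what makes it slow. The version below walks the line by offset instead; `sample.txt` and its records are hypothetical, and the 100/71/10 layout is taken from the question:

```shell
# Build a single-line sample of three 100-character records (hypothetical data).
printf '%-70s%-30s' 'BUCH20131013' '4748420001' >  sample.txt
printf '%-70s%-30s' 'BUCA20131013' '4748520001' >> sample.txt
printf '%-70s%-30s' 'BUCA20131013' '4748520002' >> sample.txt

# Step through the line 100 characters at a time without modifying $0;
# print columns 71-80 of each chunk that starts with BUCA.
out=$(awk -v len=100 '{
    n = length($0)
    for (pos = 1; pos <= n; pos += len) {
        rec = substr($0, pos, len)
        if (substr(rec, 1, 4) == "BUCA")
            print substr(rec, 71, 10)
    }
}' sample.txt)
echo "$out"
```

For the real data, the same awk program would be run as awk -v len=100 '...' src_file.txt > tgt_file.txt.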