[awk] Use part of the file name as a field in each entry

Hi all,

I need to combine these files into one CSV file:

Bounce_Mail_Event_Daily_Report_01_Jul_2012.csv
Bounce_Mail_Event_Daily_Report_02_Jul_2012.csv
Bounce_Mail_Event_Daily_Report_03_Jul_2012.csv 
Bounce_Mail_Event_Daily_Report_04_Jul_2012.csv
Bounce_Mail_Event_Daily_Report_05_Jul_2012.csv
...
Bounce_Mail_Event_Daily_Report_31_Jul_2012.csv

Sample file content (Bounce_Mail_Event_Daily_Report_02_Jul_2012.csv):

Bounce Mail Daily Event Report
08/24/2012,2:44 PM
Device Hostname,Device Address,Count
server1.example.com,10.65.1.19,5053

The problem is that I need to take the date from the file name as the timestamp and the 4th line of the file as the entry.

So the result should be:

01_Jul_2012,server1.example.com,10.65.1.19,3083
02_Jul_2012,server1.example.com,10.65.1.19,5053
...
31_Jul_2012,server1.example.com,10.65.1.19,4838

I cannot use the date inside the file (line 2), because that is the date the CSV file was created, not the date given in the file name.

Does anyone have any suggestions how to do it with awk?

thank you.

Use the code below to put the date from the file name into the file entry.

Since you have not said exactly how the CSV output should look, you can add whatever else you need (for example, a grep after the sed).

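for i in *.csv
do
# fields 6-8 of the name, split on "_" and ".", are day, month and year
Date_var=$(echo "$i" | awk -F "[_.]" '{ print $6,$7,$8 }' OFS="_" )
# sed needs the file as input; prefix the server line with the date
sed 's/server1\.example\.com/'"$Date_var"',server1.example.com/1' "$i"
done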

If that gives the output you want, or you need to filter it further, just add the extra commands after the sed. For example, replace the sed line in the loop with:

sed 's/server1\.example\.com/'"$Date_var"',server1.example.com/1' "$i" | grep "server1.example.com" >> final_output.csv

for i in *.csv ; do
       echo "$(basename "$i" .csv | cut -d_ -f6-),$(tail -n 1 "$i")"
done  > output.csv
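
A variation on the loop above, only as a sketch: it uses shell parameter expansion instead of basename/cut, and picks line 4 explicitly with sed rather than relying on it being the last line of the file. The function name combine_reports is made up for illustration, and it assumes every file name starts with Bounce_Mail_Event_Daily_Report_ and that the date part contains no "/" or "&".

```shell
# combine_reports is a hypothetical helper; pass the report files as arguments.
combine_reports() {
    for f in "$@"; do
        name=${f##*/}                                   # drop any leading directory
        name=${name%.csv}                               # drop the .csv suffix
        stamp=${name#Bounce_Mail_Event_Daily_Report_}   # what remains, e.g. 02_Jul_2012
        # print only line 4, prefixed with the date taken from the file name
        sed -n "4s/^/$stamp,/p" "$f"
    done
}
```

It could be called as: combine_reports Bounce_Mail_Event_Daily_Report_*_Jul_2012.csv > output.csv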

Try (not tested):

awk 'FILENAME!=prevfile{
match(FILENAME,/[0-9]{2}_[A-Z][a-z]{2}_[0-9]{4}/)
dt=substr(FILENAME,RSTART,RLENGTH)
prevfile=FILENAME}
FNR>=4{print dt "," $0}' Bounce_Mail_Event_Daily_Report_??_Jul_2012.csv > Bounce_Mail_Event_Monthly_Report_Jul_2012.csv
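
One thing to watch with the script above: as far as I know, the {2} and {4} interval expressions are not accepted by every awk (older gawk needs --re-interval, and some mawk versions lack them). A sketch of the same idea with the repetition spelled out follows. The wrapper name monthly_report and the "unknown" fallback are my own assumptions, not part of the original.

```shell
# monthly_report is a hypothetical wrapper; pass the daily report files as arguments.
monthly_report() {
    awk '
        FNR == 1 {
            # first line of each file: pull DD_Mon_YYYY out of the file name,
            # with the repetition written out instead of interval expressions
            if (match(FILENAME, /[0-9][0-9]_[A-Z][a-z][a-z]_[0-9][0-9][0-9][0-9]/))
                dt = substr(FILENAME, RSTART, RLENGTH)
            else
                dt = "unknown"    # assumed placeholder for names without a date
        }
        # line 4 onwards holds the data rows
        FNR >= 4 { print dt "," $0 }
    ' "$@"
}
```

Usage would be along the lines of: monthly_report Bounce_Mail_Event_Daily_Report_??_Jul_2012.csv > Bounce_Mail_Event_Monthly_Report_Jul_2012.csv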
 
nawk -F"[_.]" 'BEGIN{fdate=substr(FILENAME,length(FILENAME)-14,11)}NR==4{$0=fdate","$0;print}' Bounce_Mail_Event_Daily_Report*.csv

Try awk if you don't have nawk.

The BEGIN block is executed before any of the file arguments are read, so FILENAME will be empty there (on many awk implementations, including gawk).
Setting FS is not required either.
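
A possible portable rewrite of that one-liner, as a sketch: set the date in an FNR==1 rule, where FILENAME is set for the current file, and test FNR rather than NR so the rule fires once per file instead of once overall. The wrapper name add_file_date is hypothetical, and the substr() offsets assume every name ends in DD_Mon_YYYY.csv.

```shell
# add_file_date is a hypothetical wrapper; pass the report files as arguments.
add_file_date() {
    awk '
        # FNR is 1 on the first line of every input file, and FILENAME is set by then
        FNR == 1 { fdate = substr(FILENAME, length(FILENAME) - 14, 11) }
        # FNR restarts for each file, unlike NR, so line 4 of every file is printed
        FNR == 4 { print fdate "," $0 }
    ' "$@"
}
```

For example: add_file_date Bounce_Mail_Event_Daily_Report*.csv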

$ nawk 'BEGIN{print FILENAME}' a.txt
a.txt

FILENAME - The name of the current input file. If no files are specified on the command line, the value of FILENAME is "-". However, FILENAME is undefined inside the BEGIN block (unless set by getline).

So, to iterate over all the CSV files, you can use a for loop:

 
for i in *.csv
do
nawk -F"[_.]" 'BEGIN{fdate=substr(FILENAME,length(FILENAME)-14,11)}NR==4{$0=fdate","$0;print}' "$i"
done

Cygwin:

$ gawk 'BEGIN{print FILENAME}' a.txt|od -bc

$ gawk --traditional 'BEGIN{print FILENAME}' a.txt

$ gawk --posix 'BEGIN{print FILENAME}' a.txt
<blanks>

On AIX 6.1:

awk 'BEGIN{print FILENAME}' a.txt
a.txt

So, for portable scripts, do not reference FILENAME inside BEGIN.

I don't have gawk; I work on Solaris.

 
$ nawk 'BEGIN{print FILENAME}' a.txt | od -c
0000000   a   .   t   x   t  \n
0000006
$ awk 'BEGIN{print FILENAME}' a.txt | od -c
0000000   a   .   t   x   t  \n
0000006

awk 'FNR==4{split(FILENAME,a,"_");dt=a[6]"_"a[7]"_"substr(a[8],1,4);print dt","$0}' Bounce*
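
One caveat with a[6], a[7], a[8]: the pieces are counted from the start of FILENAME, so a path with underscores in a directory name (say jul_reports/Bounce...) would shift them. A sketch that counts from the end of the split instead follows. The wrapper name prefix_line4 is made up, and it still assumes the name ends in DD_Mon_YYYY.csv.

```shell
# prefix_line4 is a hypothetical wrapper; pass the report files as arguments.
prefix_line4() {
    awk '
        FNR == 4 {
            # split the name on "_"; the last three pieces are DD, Mon and YYYY.csv
            n = split(FILENAME, a, "_")
            dt = a[n-2] "_" a[n-1] "_" substr(a[n], 1, 4)   # keep only YYYY of the last piece
            print dt "," $0
        }
    ' "$@"
}
```

For example: prefix_line4 jul_reports/Bounce* > output.csv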

All,

thank you for all your assistance.
My case is solved now.