[awk] grep a part of filename as entry file

makan · September 2, 2012, 11:47pm

Hi all,

I need to combine these files into one CSV file:

Bounce_Mail_Event_Daily_Report_01_Jul_2012.csv
Bounce_Mail_Event_Daily_Report_02_Jul_2012.csv
Bounce_Mail_Event_Daily_Report_03_Jul_2012.csv 
Bounce_Mail_Event_Daily_Report_04_Jul_2012.csv
Bounce_Mail_Event_Daily_Report_05_Jul_2012.csv
...
Bounce_Mail_Event_Daily_Report_31_Jul_2012.csv

Sample file content (Bounce_Mail_Event_Daily_Report_02_Jul_2012.csv):

Bounce Mail Daily Event Report
08/24/2012,2:44 PM
Device Hostname,Device Address,Count
server1.example.com,10.65.1.19,5053

The problem is that I need to take the date from the file name as the timestamp and use the 4th line of the file as its entry.

So the result should be:

01_Jul_2012,server1.example.com,10.65.1.19,3083
02_Jul_2012,server1.example.com,10.65.1.19,5053
...
31_Jul_2012,server1.example.com,10.65.1.19,4838

I cannot use the date inside the file (line 2), because it is the date the CSV file was created, not the date given in the file name.

Does anyone have any suggestions on how to do this with awk?

Thank you.

pamu · September 3, 2012, 2:09am

Use the code below to put the date from the file name into the file entry.

As you have not said exactly how you want the CSV output to look, you can add whatever you need (just add a grep after the sed).

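for i in *.csv
do
# Build DD_Mon_YYYY from fields 6-8 of the file name (split on "_" and ".")
Date_var=$(echo "$i" | awk -F "[_.]" '{ print $6,$7,$8 }' OFS="_" )
# Prefix the hostname with the date; sed needs the file as its input
sed 's/server1\.example\.com/'"$Date_var"',server1.example.com/1' "$i"
done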

If that gives you the output you want, or you need to keep only specific lines, add a filter after the sed (inside the loop):

sed 's/server1\.example\.com/'"$Date_var"',server1.example.com/1' "$i" | grep "server1.example.com" >> final_output.csv

mirni · September 3, 2012, 2:16am

for i in Bounce_Mail_Event_Daily_Report_*.csv ; do
       echo "$(basename "$i" .csv | cut -d_ -f6-),$(tail -n 1 "$i")"
done  > output.csv

elixir_sinari · September 3, 2012, 2:16am

Try (not tested):

awk 'FILENAME!=prevfile{
match(FILENAME,/[0-9]{2}_[A-Z][a-z]{2}_[0-9]{4}/)
dt=substr(FILENAME,RSTART,RLENGTH)
prevfile=FILENAME}
FNR>=4{print dt "," $0}' Bounce_Mail_Event_Daily_Report_??_Jul_2012.csv > Bounce_Mail_Event_Monthly_Report_Jul_2012.csv
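
The match() call above relies on interval expressions such as {2}, which some awk implementations historically did not support (or only enabled with an extra option). As a hedged sketch, the same date extraction can be written with the character classes spelled out. The function name extract_date is made up for illustration and is not part of the post above.

```shell
# extract_date is a hypothetical helper: print the DD_Mon_YYYY part of a file name.
extract_date() {
    printf '%s\n' "$1" | awk '{
        # Same pattern as [0-9]{2}_[A-Z][a-z]{2}_[0-9]{4}, without interval expressions
        if (match($0, /[0-9][0-9]_[A-Z][a-z][a-z]_[0-9][0-9][0-9][0-9]/))
            print substr($0, RSTART, RLENGTH)
    }'
}

extract_date Bounce_Mail_Event_Daily_Report_02_Jul_2012.csv   # prints 02_Jul_2012
```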

itkamaraj · September 3, 2012, 3:06am

 
nawk -F"[_.]" 'BEGIN{fdate=substr(FILENAME,length(FILENAME)-14,11)}NR==4{$0=fdate","$0;print}' Bounce_Mail_Event_Daily_Report*.csv

Try with awk if you don't have nawk.

elixir_sinari · September 3, 2012, 3:13am

The BEGIN block will be executed before reading the arguments. So, FILENAME will have null value (on many awk implementations including gawk).
The FS is not required.
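
A small sketch of the difference, assuming a scratch file created only for this demonstration (the names tmpdir and first_record_name are illustrative). What the BEGIN line prints depends on the awk implementation, so no output is claimed for it; the FNR==1 rule runs after the first record of the file has been read, so FILENAME is set there.

```shell
# Scratch input for the demonstration (hypothetical path and content).
tmpdir=$(mktemp -d)
printf 'line1\nline2\n' > "$tmpdir/a.txt"

# Implementation-dependent: may print an empty name or the first file operand.
awk 'BEGIN { print "BEGIN sees: [" FILENAME "]" }' "$tmpdir/a.txt"

# Set once the first record of each file has been read.
first_record_name() {
    awk 'FNR == 1 { print FILENAME }' "$@"
}

first_record_name "$tmpdir/a.txt"
```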

itkamaraj · September 3, 2012, 4:03am

$ nawk 'BEGIN{print FILENAME}' a.txt
a.txt

FILENAME - The name of the current input file. If no files are specified on the command line,the value of FILENAME is "-". However, FILENAME is undefined inside the BEGIN block (unless set by getline).

So, to iterate over all the CSV files, you can use a for loop:

 
for i in *.csv
do
nawk -F"[_.]" 'BEGIN{fdate=substr(FILENAME,length(FILENAME)-14,11)}NR==4{$0=fdate","$0;print}' $i
done

elixir_sinari · September 3, 2012, 4:08am

Cygwin:

$ gawk 'BEGIN{print FILENAME}' a.txt|od -bc

$ gawk --traditional 'BEGIN{print FILENAME}' a.txt

$ gawk --posix 'BEGIN{print FILENAME}' a.txt
<blanks>

On AIX 6.1:

awk 'BEGIN{print FILENAME}' a.txt
a.txt

So, for portable scripts, do not reference FILENAME inside BEGIN.
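
One way to follow that advice for the task in this thread, as a sketch rather than a tested solution: read FILENAME at the first record of each file instead of in BEGIN. It assumes a POSIX awk (for sub) and file names that end in _DD_Mon_YYYY.csv. The function name combine_reports is hypothetical.

```shell
# combine_reports is a hypothetical helper: one output line per daily report.
combine_reports() {
    awk '
        # First record of each file: FILENAME is set here on any awk.
        FNR == 1 {
            n = split(FILENAME, part, "_")
            year = part[n]
            sub(/\.csv$/, "", year)                        # "2012.csv" -> "2012"
            stamp = part[n-2] "_" part[n-1] "_" year      # e.g. 02_Jul_2012
        }
        # The 4th line of each file is the data entry.
        FNR == 4 { print stamp "," $0 }
    ' "$@"
}

# Example call (output file name taken from the earlier post):
# combine_reports Bounce_Mail_Event_Daily_Report_??_Jul_2012.csv > Bounce_Mail_Event_Monthly_Report_Jul_2012.csv
```

Taking the last three "_"-separated pieces, rather than fixed field numbers, keeps the function working when the files are passed with a directory prefix.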

itkamaraj · September 3, 2012, 4:11am

I don't have gawk. I work on Solaris.

 
$ nawk 'BEGIN{print FILENAME}' a.txt | od -c
0000000   a   .   t   x   t  \n
0000006
$ awk 'BEGIN{print FILENAME}' a.txt | od -c
0000000   a   .   t   x   t  \n
0000006

raj_saini20 · September 3, 2012, 6:57am

awk 'FNR==4{split(FILENAME,a,"_");dt=a[6]"_"a[7]"_"substr(a[8],1,4);print dt","$0}' Bounce*

makan · September 4, 2012, 6:04am

All,

Thank you for all your assistance.
My case is solved now.