Use Awk and Array to get total size of files

Hello all,

I need a script that totals the size of files by extension. For example, motion.mov and segmentation.avi belong under the label Media, while info.doc and calc.xls belong under the label Document.

The output should look like this:

count 1
Media,[1],2 GB
count 2
Document,[2],4 GB

My problem is that when I run the script, it seems to total all the files together, like this:

count 1
Media,[1],6 GB
count 2
Document,[2],6 GB

This is the script I wrote:

T[1]="\.avi\$|\.mov\$"
LABEL[1]=Media

T[2]="\.doc\$|\.xls\$"
LABEL[2]=Document

for count in 1 2;
do
     echo count $count
     echo ${LABEL[$count]},`egrep "$T[$count]" test.txt | awk -v l="$LABEL[$count]" '{ SUM += $5} END { SUM=SUM/1073741824 ; print l","SUM" GB " }'`

The contents of test.txt are:

-rwxr--r-- 1 emage users 1073741824 Jun 12  2007 motion.mov
-rwxr--r-- 1 emage users 1073741824 Jun 12  2007 segmentation.avi
-rwxr--r-- 1 emage users 2147483648 Jun 12  2007 info.doc
-rwxr--r-- 1 emage users 2147483648 Jun 12  2007 calc.xls

Can anyone help me with this?

Try this,

#!/bin/sh

T[1]='"\.avi\$|\.mov\$"'
LABEL[1]=Media

T[2]='"\.doc\$|\.xls\$"'
LABEL[2]=Document

for count in 1 2;
do
     echo count $count
     echo ${LABEL[$count]},`eval egrep ${T[$count]} test.txt |awk -v l="$LABEL[$count]" '{ SUM += $5} END { SUM=SUM/1073741824 ; print l","SUM" GB " }'`
done
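
Why this fixes the totals: in the original script, "$T[$count]" has no braces, so the shell expands $T (that is, ${T[0]}, which is unset) and then appends the literal text [1]. egrep therefore receives the bracket expression [1], which matches every line containing the digit 1, and every file gets summed on each pass. The same happens with "$LABEL[$count]", which is where the [1] in the output comes from. Below is a minimal sketch with braces throughout and no eval. It assumes bash (arrays are not part of POSIX sh) and uses a temporary file in place of test.txt:

```shell
#!/bin/bash
# Sketch: per-label totals with bash arrays.

# Sample listing in `ls -l` format, copied from the question.
listing=$(mktemp)
cat > "$listing" <<'EOF'
-rwxr--r-- 1 emage users 1073741824 Jun 12  2007 motion.mov
-rwxr--r-- 1 emage users 1073741824 Jun 12  2007 segmentation.avi
-rwxr--r-- 1 emage users 2147483648 Jun 12  2007 info.doc
-rwxr--r-- 1 emage users 2147483648 Jun 12  2007 calc.xls
EOF

T[1]='\.(avi|mov)$'
LABEL[1]=Media
T[2]='\.(doc|xls)$'
LABEL[2]=Document

# ${T[$count]} (with braces) selects the array element;
# $T[$count] would be ${T[0]} followed by the literal text "[1]".
report=$(
  for count in 1 2; do
    grep -E "${T[$count]}" "$listing" |
      awk -v l="${LABEL[$count]}" \
        '{ sum += $5 } END { print l "," (sum / 1073741824) " GB" }'
  done
)
echo "$report"
rm -f "$listing"
```

With the sample data this prints Media,2 GB and Document,4 GB on separate lines.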

Hi Pravin,

It works now. Thank you very much.

I forgot something: how can I also include the available space on a selected drive, for example /mnt/data/, and write the output to a *.csv file?

Example output:

Type ; Size
Media ; 123
Document ; 123
Others ; 123
Available space ; 123      <-- For /mnt/data

From this data I can make a pie chart in Excel.

Below is the updated script:

#!/bin/sh

ls -lR | egrep ^- > test.txt

Type[1]='"\.avi\$|\.mov\$"'
LABEL[1]=Media

Type[2]='"\.doc\$|\.xls\$"'
LABEL[2]=Document

for count in 1 2;
do
  echo count $count
  echo ${LABEL[$count]},`eval egrep ${Type[$count]} test.txt |awk -v l="$LABEL[$count]" '{ SUM += $5} END { SUM=SUM/1073741824 ; print l","SUM" GB " }'`
done

Note: this script does not yet include the available space.

#!/bin/sh
 
ls -lR | egrep ^- > test.txt
 
Type[1]='"\.avi\$|\.mov\$"'
LABEL[1]=Media
 
Type[2]='"\.doc\$|\.xls\$"'
LABEL[2]=Document
 
for count in 1 2;
do
echo count $count
echo ${LABEL[$count]},`eval egrep ${Type[$count]} test.txt |awk -v l="$LABEL[$count]" '{ SUM += $5} END { SUM=SUM/1073741824 ; print l","SUM" GB " }'`
done
df -k /mnt/data | awk 'NR==2{print "Available space ; " $4 / 1048576}'

You could also do a pie chart with gnuplot straight from your unix box ;)
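
One caveat with the NR==2 approach above: when the device name is long, df -k may wrap the entry onto two lines, so line 2 holds only the device name and $4 is empty, which awk evaluates as 0. df -P (POSIX output format) keeps each filesystem on a single line. Here is a small sketch of the difference, using canned text instead of a live df call. The wrapped sample mirrors the df output quoted further down in this thread, and the one-line variant is an assumed rendering of what df -P would print for the same filesystem:

```shell
#!/bin/bash
# Wrapped layout: the long device name pushes the numbers onto line 3.
wrapped='Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      17093992   5337976  10887664  33% /'

# Line 2 is only the device name, so $4 is empty and the result is 0.
from_wrapped=$(printf '%s\n' "$wrapped" | awk 'NR==2 { print $4 / 1048576 }')

# Assumed `df -P` layout: one line per filesystem.
posix='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/mapper/VolGroup00-LogVol00 17093992 5337976 10887664 33% /'

# Now line 2 carries the numbers; 1K blocks / 1048576 gives GB.
from_posix=$(printf '%s\n' "$posix" | awk 'NR==2 { printf "%.2f\n", $4 / 1048576 }')

echo "wrapped: $from_wrapped GB, posix: $from_posix GB"
```

In the real script the equivalent call would be df -P /mnt/data piped into the same awk command.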


Thanks Chubler_XL, and not forgetting Pravin. The other thing is writing the output as *.csv. Correct me if I'm wrong: is it okay to do it this way, or do you have another way to do this?

Sample:

./myScripts.sh > $`date`.csv

So the output will be like this?

Type ; Size
Media ; 123
Document ; 123
Others ; 123
Available space ; 123 <-- For /mnt/data

From here I can convert it to a chart in Excel.

How about this?

./myScripts.sh >  $(date '+%d%m%Y%H%M%S').csv
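
The earlier attempt with $`date`.csv fails mainly because plain date output contains spaces, so the unquoted redirection target is no longer a single word (bash typically reports an ambiguous redirect). A variation on the fixed version, offered as a suggestion rather than a correction: putting the year first makes the files sort chronologically by name. The report_ prefix is a hypothetical choice:

```shell
#!/bin/bash
# Year-first timestamp so that `ls` lists the reports in date order.
stamp=$(date '+%Y%m%d%H%M%S')
outfile="report_${stamp}.csv"

# The real call would be: ./myScripts.sh > "$outfile"
echo "would write to $outfile"
```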

Thanks Pravin.

It's okay, but the output looks like this:

count 1
Media,[1],0 GB
count 2
Document,[2],1.99303e-07 GB
Available space ; 0

Is there another way, like this:

count 1              <-- need to remove
Media,[1],0 GB    <-- remove [1]
count 2              <-- need to remove
Document,[2],1.99303e-07 GB <--remove [2]
Available space ; 0

For example:

Media,0 GB
Document,1.99303e-07 GB
Others,0 GB
Available space ; 0

One more thing is the available space:

[root@CentOS user]# df -k /home/user
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      17093992   5337976  10887664  33% /
[root@CentOS user]#

[root@CentOS user]# df -h /home/user
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       17G  5.1G   11G  33% /
[root@CentOS user]#

After running the script the output shows 0, but df -h and df -k show that space is still available.

Sorry, I'm totally lost!

---------- Post updated 03-19-11 at 04:08 AM ---------- Previous update was 03-18-11 at 05:44 AM ----------

Hi all,

This is the updated script, but I still need feedback from the experts.

Run the script: ./myScripts.sh > list.txt

This is ./myScripts.sh:

#!/bin/sh
ls -lR | egrep ^- > test.txt
Type[1]='"\.avi\$|\.mov\$|\.mp3\$"'
LABEL[1]=Media
 
Type[2]='"\.doc\$|\.xls\$"'
LABEL[2]=Document
 
Type[3]='"\.png\$"'
LABEL[3]=Image
 
for count in 1 2 3;  
do
  echo count $count
  echo ${LABEL[$count]},`eval egrep ${Type[$count]} test.txt |awk -v l="$LABEL[$count]" '{ SUM += $5} END { SUM=SUM/1073741824 ; print l","SUM" GB"}'`
done
 
# Available space
df -P /home/sheikh | awk 'NR==2{print "Available,0,",$4/1048576 " GB"}'
 
# For remove count [num]
sed -i~ -e 's/count.*$//' list.txt
 
# For remove column no. 2
cat list.txt | awk 'BEGIN{FS=",";OFS=","}{$2=$6="";gsub(FS "+",FS)}1' > output.txt

list.txt

count 1
Media,[1],0 GB
count 2
Document,[2],0.000241002 GB
count 3
Image,[3],0.000567848 GB
Available,0, 10.3802                  <--  Dummy 0 added so that column 2 can be deleted

Output:

[root@CentOS user]# cat output.txt
,                                             <-- Problem no.1
Media,0.0132836 GB,
,                                             <-- Problem no.2  
Document,0.000241002 GB,
,                                             <-- Problem no.3
Image,0.000567848 GB,
Available, 10.3802 GB,          <-- Problem no. 4
[root@CentOS sheikh]#

Problems no. 1 to no. 3 are blank rows. I need the output without them. They are related to this part of the script:

# For remove count [num]
sed -i~ -e 's/count.*$//' list.txt

Reason: I want to remove the lines "count 1" through "count 3". Is this the right command?

Wanted output:

Media,0.0132836 GB,
Document,0.000241002 GB,
Image,0.000567848 GB,
Available, 10.3802 GB,
 
Problem no. 4: I put a 0 in column 2:
Available,0, 10.3802                  <--  Dummy 0 added so that column 2 can be deleted

Without a value in column 2, the figure 10.3802 would be deleted. Is there a command that works without adding the 0?

Please help me.

If I have understood correctly, try this:

#!/bin/sh

ls -lR | egrep ^- > test.txt
Type[1]='"\.avi\$|\.mov\$|\.mp3\$"'
LABEL[1]=Media

Type[2]='"\.doc\$|\.xls\$"'
LABEL[2]=Document

Type[3]='"\.png\$"'
LABEL[3]=Image

for count in 1 2 3;
do
  echo ${LABEL[$count]},`eval egrep ${Type[$count]} test.txt |awk '{ SUM += $5} END { SUM=SUM/1073741824 ; print SUM" GB"}'`
done

# Available space
df -P /home/sheikh | awk 'NR==2{print "Available," $4/1048576 " GB"}'
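
An alternative sketch, not the method used above: classify every file in a single awk pass with an awk array keyed by label. This also yields the "Others" row from the wanted output and avoids scientific notation such as 1.99303e-07 by formatting with printf. The sample sizes and the .png and .txt rows are made up for illustration; in real use the listing would come from ls -lR | egrep ^-:

```shell
#!/bin/bash
# Hypothetical sample listing standing in for test.txt.
listing=$(mktemp)
cat > "$listing" <<'EOF'
-rwxr--r-- 1 emage users 1073741824 Jun 12  2007 motion.mov
-rwxr--r-- 1 emage users 1073741824 Jun 12  2007 segmentation.avi
-rwxr--r-- 1 emage users 2147483648 Jun 12  2007 info.doc
-rwxr--r-- 1 emage users 2147483648 Jun 12  2007 calc.xls
-rwxr--r-- 1 emage users 536870912 Jun 12  2007 logo.png
-rwxr--r-- 1 emage users 268435456 Jun 12  2007 notes.txt
EOF

csv=$(awk '
  {
    # The extension is at the end of the last field; $5 is the size in bytes.
    name = tolower($NF)
    if      (name ~ /\.(avi|mov|mp3)$/) label = "Media"
    else if (name ~ /\.(doc|xls)$/)     label = "Document"
    else if (name ~ /\.png$/)           label = "Image"
    else                                label = "Others"
    total[label] += $5
  }
  END {
    # Fixed row order; printf keeps two decimals instead of e-notation.
    n = split("Media Document Image Others", order, " ")
    for (i = 1; i <= n; i++)
      printf "%s,%.2f GB\n", order[i], total[order[i]] / 1073741824
  }
' "$listing")

echo "$csv"
rm -f "$listing"
```

Because each label is printed exactly once, there are no "count" lines or extra columns to strip afterwards.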

Thanks Pravin,

I would like to send my data and a picture of the graph by email. First, the data is in a file named "profile.csv", as shown below. This "profile.csv" is used to generate the graph with gnuplot.

profile.csv

 
Media 894.45
Document 106.35
Picture 75.52
Archive 72.70
Database 0.06

How can I reprint it in this format in email.txt, adding "-", "GB", and an underlined "Data Profiler" heading?

 
Data Profiler
----------------
Media     - 894.45 GB
Document  - 106.35 GB
Picture   - 75.52 GB
Archive   - 72.70 GB
Database  - 0.06 GB

Sample in email.txt

Dear Administrator,

Attached is Picture of Graph and Data.

 
Data Profiler
----------------
Media     - 894.45 GB
Document  - 106.35 GB
Picture   - 75.52 GB
Archive   - 72.70 GB
Database  - 0.06 GB

Thanks.

From system.

Command to send the email:

 
mutt -s "Automation of Graph" -a picture1 picture2 picture3 picture4 user@mail.com < email.txt

Thanks.

Sheikh

I recommended gnuplot in an earlier post, but gnuplot doesn't have any support for pie charts. You may find R an easier way to go. You will need to install the plotrix package from CRAN.

Then you can generate your graph from profile.csv like this:

#!/bin/bash
Val=
Name=
 
while read name val
do
   Name="${Name},\"$name\""
   Val="${Val},$val"
done < profile.csv
 
Name=$(echo $Name | sed 's/^,//')
Val=$(echo $Val | sed 's/^,//')
 
echo "# 3D Exploded Pie Chart
library(plotrix)
jpeg(\"disk.jpg\",height=600,width=800)
slices <- c($Val)
lbls <- c($Name)
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct) # add percents to labels
lbls <- paste(lbls,\"%  \",sep=\"\") # add % to labels
pie3D(labelcex=0.9,slices,labels=lbls,explode=0.1,
main=\"Storage Profiler \")" > disk${$}.r
 
/usr/local/bin/rscript.exe disk${$}.r
rm disk${$}.r

Result: [pie chart image]

Thanks Chubler_XL,

Before that, may I ask about the CSV structure? Is it like this?

 
Media 894.45
Document 106.35
Picture 75.52
Archive 72.70
Database 0.06

And what do I need to put in here?

 
#!/bin/bash
Val= ?
Name= ?
 

Or can you attach your sample profile.csv here? Please advise.

Thanks.

Sheikh

Yes, my profile.csv was just like the one in your post:

profile.csv

Media 894.45
Document 106.35
Picture 75.52
Archive 72.70
Database 0.06

Thank you very much Chubler_XL. It works!

What if my CSV file looks like this?

 
Media,894.45
Document,106.35
Picture,75.52
Archive,72.70
Database,0.06

There is a comma between the name and the value. What do I need to change in the script?

Thanks.

Just add the IFS lines as shown (the first two lines and the last line):

OIFS="$IFS"
IFS=','
while read name val
do
   Name="${Name},\"$name\""
   Val="${Val},$val"
done < profile.csv
IFS="$OIFS"
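
A possible variation, assuming bash or any POSIX shell: setting IFS only for the read command limits the change to that one command, so there is nothing to save and restore. The sketch below also uses ${Name:+$Name,} to add the separating comma only when the list is already non-empty, which removes the need for the later sed cleanup. The temporary file stands in for profile.csv:

```shell
#!/bin/bash
# Sample comma-separated data, as posted above.
profile=$(mktemp)
cat > "$profile" <<'EOF'
Media,894.45
Document,106.35
Picture,75.52
Archive,72.70
Database,0.06
EOF

Name=
Val=
# IFS=, applies to this read only; the global IFS is untouched.
while IFS=, read -r name val; do
  [ -n "$name" ] || continue          # skip blank lines
  Name="${Name:+$Name,}\"$name\""     # quoted labels for R's c(...)
  Val="${Val:+$Val,}$val"             # plain numbers for R's c(...)
done < "$profile"

echo "lbls:   $Name"
echo "slices: $Val"
rm -f "$profile"
```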

Okay, I will try it later. One more thing: I need to send this data by email from the command line (as part of the script). The email script works with no problem.

Previously my CSV file looked like this:

 
Media 894.45
Document 106.35
Picture 75.52
Archive 72.70
Database 0.06

From there I can feed it into the email script:

 
awk '{ print $1 "  -  " $2,"GB" }' profile.csv >> email.txt

Output: Media  -  894.45 GB (adds "-" and "GB")

That is why my CSV file had no commas from the beginning.

Now, if my data has commas:

 
Media,894.45
Document,106.35
Picture,75.52
Archive,72.70
Database,0.06

How do I remove the commas so that I can reuse my email script:

 
awk '{ print $1 "  -  " $2,"GB" }' profile.csv >> email.txt

Output: Media  -  894.45 GB (adds "-" and "GB")

Thanks.

Add a field separator to awk, as below:

awk -F, '{ print $1 "  -  " $2,"GB" }' inputfile
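
Putting the pieces together, here is a sketch that builds the whole "Data Profiler" block requested earlier in the thread. It accepts either spaces or commas as separators (assuming the names contain neither) and pads the names so the dashes line up. The 10-character width is an arbitrary choice, and the temporary file stands in for profile.csv:

```shell
#!/bin/bash
# Sample data with commas, as posted above.
profile=$(mktemp)
cat > "$profile" <<'EOF'
Media,894.45
Document,106.35
Picture,75.52
Archive,72.70
Database,0.06
EOF

body=$(
  # Heading and underline, then one padded row per record.
  printf 'Data Profiler\n----------------\n'
  # -F'[ ,]+' splits on runs of spaces or commas;
  # the guard skips blank or incomplete lines.
  awk -F'[ ,]+' '$1 != "" && $2 != "" { printf "%-10s - %s GB\n", $1, $2 }' "$profile"
)

echo "$body"
rm -f "$profile"
```

In the real script the same two commands could append to email.txt with >> instead of being captured in a variable.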

Thanks Pravin, a million thanks, and not forgetting Chubler_XL.