Grep/print a test file

dotran · May 18, 2015, 4:49pm

cat abc.txt

Filename: SHA_AED_Monthly_SNR_20150331.txt.gz
Data Format: ASCII with carriage returns and linefeeds
Compression: GZIP
GZIP Bytes: 36893068
Unzipped Bytes : 613794510
Records: 851310
Record Length: 738
Blocksize: 32472


Filename: SHA_AED_SNR_ChangeLog_20150331.txt.gz
Data Format: ASCII with carriage returns and linefeeds
Compression: GZIP
GZIP Bytes: 148288
Unzipped Bytes : 740507
Records: 3877
Record Length: 189
Blocksize: 32697


Filename: SHA_AED_SNR_OutletMaster_20150331.txt.gz
Data Format: ASCII with carriage returns and linefeeds
Compression: GZIP
GZIP Bytes: 8147188
Unzipped Bytes : 31502244
Records: 164837
Record Length: 199
Blocksize: 32636

I would like to get a new output file (newfile.txt) that contains only Filename|Records|Unzipped Bytes for each record:

 
 SHA_AED_Monthly_SNR_20150331.txt.gz|851310|613794510
 SHA_AED_SNR_ChangeLog_20150331.txt.gz|3877|740507
 SHA_AED_SNR_OutletMaster_20150331.txt.gz|164837|31502244

Could someone please help me understand why the script below does not work? Thanks.

 
 #!/bin/ksh
  
 ls -1 abc.txt |while read FILE
 do
 Filename=`cat abc.txt |grep Filename |awk '{print $3}'`
 Record=`cat abc.txt |grep Records |awk '{print $3}'`
 Gunzip=`cat abc.txt |grep Unzipped |awk '{print $4}'`
 echo "$Filename|$Record|$Gunzip" >>  newfile.txt
 done

Skrynesaver · May 18, 2015, 5:06pm

If your records are reliably uniform, you could echo only when a test for a blank line succeeds.

WARNING UNTESTED LATE NIGHT CODE...

egrep '^$' && echo "$Filename|$Record|$Gunzip" >>  newfile.txt

dotran · May 18, 2015, 5:20pm

Thanks Skrynesaver... so should it be like this?

 
 #!/bin/ksh
ls -1 abc.txt |while read FILE
do
Filename=`cat abc.txt |grep Filename |awk '{print $3}'`
Record=`cat abc.txt |grep Records |awk '{print $3}'`
Gunzip=`cat abc.txt |grep Unzipped |awk '{print $4}'`
#echo "$Filename|$Record|$Gunzip" >>  newfile.txt
egrep '^$' && echo "$Filename|$Record|$Gunzip" >>  newfile.txt
done

Skrynesaver · May 18, 2015, 5:51pm

Actually, you should probably step through the file in a loop rather than use your current approach. (The late-night code warning remains in force, but this is closer to working than the approach above.)

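#!/usr/bin/perl
use strict;
use warnings;

my %record;
open(my $data, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (<$data>) {
    # Store "Key: value" pairs; a key may contain spaces (e.g. "Unzipped Bytes").
    $record{$1} = $2 if /^\s*([^:]+?)\s*:\s*(.+?)\s*$/;
    # A blank line (or end of file) ends a record: print it and reset.
    if ((/^\s*$/ || eof($data)) && %record) {
        print join('|', @record{'Filename', 'Records', 'Unzipped Bytes'}), "\n";
        %record = ();
    }
}
close($data);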
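For readers who would rather stay in the shell, the "step through the file in a loop" idea can be sketched as below. This is only an illustration under assumptions: the two-record sample input, the `workdir` temporary directory, and the `flush` helper are all made up for the example, and the labels are assumed to appear exactly as in abc.txt.

```shell
#!/bin/sh
# Hypothetical sample input: two short records in the same layout as abc.txt.
workdir=$(mktemp -d)
cat > "$workdir/abc.txt" <<'EOF'
Filename: first.txt.gz
GZIP Bytes: 100
Unzipped Bytes : 1000
Records: 10

Filename: second.txt.gz
GZIP Bytes: 200
Unzipped Bytes : 2000
Records: 20
EOF

name= records= unzipped=

# Hypothetical helper: print the collected record (if any), then reset.
flush() {
    if [ -n "$name" ]; then
        echo "$name|$records|$unzipped"
    fi
    name= records= unzipped=
}

{
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "Filename:"*)      name=${line#*: } ;;      # text after the first ": "
            "Records:"*)       records=${line#*: } ;;
            "Unzipped Bytes"*) unzipped=${line#*: } ;;
            "")                flush ;;                 # blank line ends a record
        esac
    done
    flush                                               # last record has no trailing blank line
} < "$workdir/abc.txt" > "$workdir/newfile.txt"

cat "$workdir/newfile.txt"
```

Each value is captured once per record instead of once per file, which is what keeps the three columns lined up.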

vgersh99 · May 18, 2015, 6:07pm

awk -f dot.awk abc.txt where dot.awk is:

BEGIN {
  RS=""
  FS=":"
  OFS="|"
  split("Filename|Records|Unzipped Bytes ", t, "|")
  for(i=1; i in t;i++)
    namesA[t[i]]=i
}
{
  for(i=1; i<=NF;i=i+2)
    if ($i in namesA)
     printf("%s", (namesA[$i]==1)?$(i+1):OFS $(i+1))
  print ""
}

dotran · May 19, 2015, 12:02am

Thanks vgersh99. Is this the complete code? Somehow I can't make it work... please help. Thanks.

 
 #!/bin/ksh
 cat abc.txt | BEGIN {
  RS=""
  FS=":"
  OFS="|"
  split("Filename|Records|Unzipped Bytes ", t, "|")
  for(i=1; i in t;i++)
    namesA[t[i]]=i
}
{
  for(i=1; i<=NF;i=i+2)
    if ($i in namesA)
     printf("%s", (namesA[$i]==1)?$(i+1):OFS $(i+1))
  print ""
} >>  newfile.txt

Don_Cragun · May 19, 2015, 1:01am

Hi dotran,
Note that vgersh99 suggested using the awk utility (not the non-existent BEGIN utility).
Also, awk is perfectly capable of reading files itself. Piping the file through cat only doubles the number of processes running and triples the number of bytes read and written just to read your input file.

Please look more closely at what vgersh99 suggested and try what he suggested.
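As a rough illustration of the point above, here are the two usual ways to hand a program to awk, with the input file named as an argument rather than piped in through cat. The sample input, the `workdir` directory, and the `count.awk` program (which just counts blank-line-separated records) are assumptions made up for this sketch; they are not vgersh99's script.

```shell
#!/bin/sh
# Hypothetical sample input: two blank-line-separated records.
workdir=$(mktemp -d)
printf 'Filename: a.gz\nRecords: 1\n\nFilename: b.gz\nRecords: 2\n' > "$workdir/abc.txt"

# Hypothetical program file: count records in paragraph mode (RS="").
cat > "$workdir/count.awk" <<'EOF'
BEGIN { RS = "" }
END   { print NR }
EOF

# Form 1: program in a file, input file given as an argument (no cat needed).
from_file=$(awk -f "$workdir/count.awk" "$workdir/abc.txt")

# Form 2: the same program inline, inside single quotes.
inline=$(awk 'BEGIN { RS = "" } END { print NR }' "$workdir/abc.txt")

echo "$from_file $inline"
```

In both forms the word `awk` comes first on the command line; `BEGIN` is part of the awk program text, not a command the shell can run.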

RudiC · May 19, 2015, 1:35am

vgersh99's proposal does work on my (Free)BSD system, but not quite on my Linux mawk, as the line feeds are not used as field separators there. Try using FS="[:\n]" instead.

dotran · May 19, 2015, 1:56am

Thanks, Mr. Don. I really don't understand what vgersh99 suggested. Could you help me correct this code?

#!/bin/ksh
awk abc.txt | BEGIN {
RS=""
FS=":"
OFS="|"
split("Filename|Records|Unzipped Bytes ", t, "|")
for(i=1; i in t;i++)
namesA[t[i]]=i
}
{
for(i=1; i<=NF;i=i+2)
if ($i in namesA)
printf("%s", (namesA[$i]==1)?$(i+1):OFS $(i+1))
print ""
} >> newfile.txt

RudiC · May 19, 2015, 1:59am

Run it like

awk '
BEGIN   {RS=""
         FS="[:\n]"
         OFS="|"
         for (i=split("Filename|Records|Unzipped Bytes ", t, "|"); i; i--) namesA[t[i]]=i
        }
        {for(i=1; i<=NF;i=i+2)
         if ($i in namesA)
         printf("%s", (namesA[$i]==1)?$(i+1):OFS $(i+1))
         print ""
        }
' abc.txt

Don_Cragun · May 19, 2015, 2:06am

And to append the output from the awk script to newfile.txt, change the last line RudiC suggested to:

' abc.txt >> newfile.txt

Note that I said append, not replace; if you want to replace the contents of newfile.txt, use:

' abc.txt > newfile.txt

Either of the above will create newfile.txt if it did not already exist.
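A small sketch of the difference between the two redirections, using a throwaway file in a temporary directory (the `workdir` and `out` names and the echoed text are assumptions for the example):

```shell
#!/bin/sh
workdir=$(mktemp -d)
out="$workdir/newfile.txt"

echo "first run"  >> "$out"   # file does not exist yet, so >> creates it
echo "second run" >> "$out"   # >> appends: the file now has two lines
lines_after_append=$(wc -l < "$out" | tr -d ' ')

echo "third run"  >  "$out"   # > truncates first: only this line remains
lines_after_replace=$(wc -l < "$out" | tr -d ' ')

echo "$lines_after_append $lines_after_replace"
```

With `>>`, running the script repeatedly keeps adding lines to newfile.txt, so `>` is usually the safer choice when each run should produce a fresh report.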

dotran · May 19, 2015, 2:11am

Thanks Mr. RudiC & Don. I still get a syntax error somehow:
/staging/dotran :cat abctest.ksh

#!/bin/ksh
awk '
BEGIN   {RS=""
         FS="[:\n]"
         OFS="|"
         for (i=split("Filename|Records|Unzipped Bytes ", t, "|"); i; i--) namesA[t[i]]=i
        }
        {for(i=1; i<=NF;i=i+2)
         if ($i in namesA)
         printf("%s", (namesA[$i]==1)?$(i+1):OFS $(i+1))
         print ""
        }
' abc.txt

/staging/dotran :./abctest.ksh
awk: syntax error near line 8
awk: illegal statement near line 8
awk: syntax error near line 9
awk: illegal statement near line 9

RudiC · May 19, 2015, 2:25am

Hey Don, where's your usual line?

If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk. (Don Cragun)
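One way to follow that advice without hard-coding the path is to pick the awk at the top of the script. This is a minimal sketch, assuming the only check needed is whether /usr/xpg4/bin/awk exists and is executable; the `AWK` variable name and the small `split` smoke test are made up for the example.

```shell
#!/bin/sh
# Prefer the XPG4 awk where it exists (as on Solaris);
# otherwise fall back to whatever awk is first on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
else
    AWK=awk
fi

# Quick smoke test: split() on a literal "|" should yield three parts.
result=$("$AWK" 'BEGIN { n = split("a|b|c", parts, "|"); print n }')
echo "$AWK -> $result"
```

The rest of the script would then call `"$AWK"` wherever it previously called `awk`.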

dotran · May 19, 2015, 2:32am

uname -a

SunOS scorpion 5.10 Generic_147147-26 sun4v sparc sun4v

I changed to nawk and get a different error.

/staging/dotran :cat abctest.ksh

#!/bin/ksh
nawk '
BEGIN   {RS=""
         FS="[:\n]"
         OFS="|"
         for (i=split("Filename|Records|Unzipped Bytes ", t, "|"); i; i--) namesA[t[i]]=i
        }
        {for(i=1; i<=NF;i=i+2)
         if ($i in namesA)
         printf("%s", (namesA[$i]==1)?$(i+1):OFS $(i+1))
         print ""
        }
' abc.txt

/staging/dotran :./abctest.ksh
nawk: newline in character class [:
]...
input record number 1, file abc.txt
source line number 7

Don_Cragun · May 19, 2015, 2:38am

I got this thread confused with a different thread where the OS was specified to be a Linux distribution. This thread doesn't specify the OS, so I should have included the warning line.

Don

Don_Cragun · May 19, 2015, 3:47am

Try /usr/xpg4/bin/awk instead of nawk.

Aia · May 19, 2015, 4:05pm

/usr/xpg4/bin/awk 'BEGIN{RS=""; OFS="|"} {print $2, $19, $21}' abc.txt
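Positional fields like these depend on every record having exactly the same number of words before each value. A sketch that looks the values up by label instead, one line at a time, might look like the following. The two-record sample and the `workdir` directory are assumptions for the example; it avoids paragraph mode and newline-containing bracket expressions, but it has not been tested on Solaris, and plain `awk` here stands for whichever awk suits your system.

```shell
#!/bin/sh
# Hypothetical sample input in the same layout as abc.txt.
workdir=$(mktemp -d)
cat > "$workdir/abc.txt" <<'EOF'
Filename: first.txt.gz
GZIP Bytes: 100
Unzipped Bytes : 1000
Records: 10

Filename: second.txt.gz
GZIP Bytes: 200
Unzipped Bytes : 2000
Records: 20
EOF

awk '
    BEGIN { FS = " *: *" }                      # label and value split on the colon
    $1 == "Filename"       { name = $2 }
    $1 == "Records"        { records = $2 }
    $1 == "Unzipped Bytes" { unzipped = $2 }
    # A blank line closes a record: print it once and reset.
    NF == 0 && name != ""  { print name "|" records "|" unzipped; name = "" }
    # The last record may not be followed by a blank line.
    END { if (name != "") print name "|" records "|" unzipped }
' "$workdir/abc.txt" > "$workdir/newfile.txt"

cat "$workdir/newfile.txt"
```

Because the output order is fixed in the `print` statements, it stays Filename|Records|Unzipped Bytes even though Records comes after Unzipped Bytes in the input.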