Hi, is there a way to write a small shell or Perl script that opens the file "abc.txt" and creates a new file "new_abc.txt" in the output format shown below? Thanks.
cat abc.txt
###########################Readme###############################
Contained with this README.TXT file are all of the
file specs for your
directory abt.
###########################Readme###############################
Filename : SW_PP_CTRL_20170505.txt.gz
Data Format : ASCII with carriage returns and linefeeds
Compression : GZIP
GZIP Bytes : 2019064
Unzipped Bytes : 11413730
Records : 95788
Record Length : 157
Filename : SW_PP_DATA_20170505.txt.gz
Data Format : ASCII with carriage returns and linefeeds
Compression : GZIP
GZIP Bytes : 691778058
Unzipped Bytes : 8316153069
Records : 60400481
Record Length : 158
Filename : SW_PP_DEMO_20170505.txt.gz
Data Format : ASCII with carriage returns and linefeeds
Compression : GZIP
GZIP Bytes : 26240709
Unzipped Bytes : 77053000
Records : 543250
Record Length : 227
Filename : SW_PP_PLANXREF_20170505.txt.gz
Data Format : ASCII with carriage returns and linefeeds
Compression : GZIP
GZIP Bytes : 557904
Unzipped Bytes : 3061930
Records : 16262
Record Length : 310
Filename : SW_PP_PRODUCT_20170505.txt.gz
Data Format : ASCII with carriage returns and linefeeds
Compression : GZIP
GZIP Bytes : 21375
Unzipped Bytes : 229431
Records : 1264
Record Length : 211
Filename : SW_PP_REASSIGN_20170505.txt.gz
Data Format : ASCII with carriage returns and linefeeds
Compression : GZIP
GZIP Bytes : 32681
Unzipped Bytes : 69399
Records : 802
Record Length : 130
Output: cat new_abc.txt
FILE_NAME,Filename|Data Format|Compression|GZIP Bytes|Unzipped Bytes|Records|Record Length
SW_PP_CTRL_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|2019064|11413730|95788|157
SW_PP_DATA_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|691778058|8316153069|60400481|158
SW_PP_DEMO_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|26240709|77053000|543250|227
SW_PP_PLANXREF_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|557904|3061930|16262|310
SW_PP_PRODUCT_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|21375|229431|1264|211
SW_PP_REASSIGN_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|32681|69399|802|130
---------- Post updated at 08:39 PM ---------- Previous update was at 05:33 PM ----------
I have this Perl code, but when I run it I see an extra "|" at the end of each line. I'm not sure how to remove it, or whether a shell script would be easier. Can someone help? Thanks.
#!/usr/bin/perl
my $filename = 'abc.txt';
open(my $fh, '<:encoding(UTF-8)', $filename) or die "Could not open file '$filename' $!";
print "FILE_NAME,Filename|Data Format|Compression|GZIP Bytes|Unzipped Bytes|Records|Record Length";
while (my $row = <$fh>) {
    chomp $row;
    my ($label, $value) = split /: /, $row;
    if ($row eq '') {
        print "\n";
    }
    else
    {
        print "$value|";
    }
}
/test1> ./test.pl
FILE_NAME,Filename|Data Format|Compression|GZIP Bytes|Unzipped Bytes|Records|Record Length|||||
SW_PP_CTRL_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|2019064|11413730|95788|157|
SW_PP_DATA_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|691778058|8316153069|60400481|158|
SW_PP_DEMO_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|26240709|77053000|543250|227|
SW_PP_PLANXREF_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|557904|3061930|16262|310|
SW_PP_PRODUCT_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|21375|229431|1264|211|
SW_PP_REASSIGN_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|32681|69399|802|130|
Except for the strange field 1 header in your output, the following awk script seems to produce the output you requested:
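awk -F ' : ' '
NF == 2 {
	# Print the first 7 labels once, as the header line.
	if(h < 7)
		printf("%s%s", $1, (++h == 7) ? ORS : OFS)
	# Append this value to the current output record.
	o = f++ ? (o OFS $2) : $2
	# After 7 values the record is complete: print it and reset.
	if(f == 7) {
		print o
		o = ""
		f = 0
	}
}' OFS='|' file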
which prints:
Filename|Data Format|Compression|GZIP Bytes|Unzipped Bytes|Records|Record Length
SW_PP_CTRL_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|2019064|11413730|95788|157
SW_PP_DATA_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|691778058|8316153069|60400481|158
SW_PP_DEMO_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|26240709|77053000|543250|227
SW_PP_PLANXREF_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|557904|3061930|16262|310
SW_PP_PRODUCT_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|21375|229431|1264|211
SW_PP_REASSIGN_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|32681|69399|802|130
Thank you for your help, Mr. Don. I ran the code and got the output below. The header line should have no spaces, and it needs an extra 'FILE_NAME,' at the front. Could you please help out again?
Filename | Data Format | Compression | GZIP Bytes | Unzipped Bytes | Records | Record Length
SW_PP_CTRL_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|2019064|11413730|95788|157
SW_PP_DATA_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|691778058|8316153069|60400481|158
SW_PP_DEMO_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|26240709|77053000|543250|227
SW_PP_PLANXREF_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|557904|3061930|16262|310
SW_PP_PRODUCT_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|21375|229431|1264|211
SW_PP_REASSIGN_20170505.txt.gz|ASCII with carriage returns and linefeeds|GZIP|32681|69399|802|130
If you copied the code I gave you and you executed that code as given, the output you showed us is not the output that would have been produced unless your sample input file format is significantly different from the input file you used when you ran my script.
With a very simple script like the one I suggested, it should be easy for you to modify it to print a constant header line that doesn't try to use the field headings found in the data being read. Why don't you try modifying the code I suggested and let us know where you run into problems if you can't make it work?
Thank you, Mr. Don. Yes, the input had extra spaces. Anyway, I got it to work. Thank you for your input.
Filename : SW_PP_PRODUCT_20170505.txt.gz
Data Format : ASCII with carriage returns and linefeeds
Compression : GZIP
GZIP Bytes : 21375
Unzipped Bytes : 229431
Records : 1264
Record Length : 211
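For anyone reading this later, here is a rough sketch of one way to get a constant header and cope with padded labels. It is not the script that was actually used in this thread. The function name readme_to_psv is made up for illustration, the header text is copied verbatim from the requested output, and the sketch assumes that every record starts with a "Filename" line and that values never contain a colon.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): readme_to_psv INPUT_FILE
# Prints a fixed header, then one pipe-separated line per record.
readme_to_psv() {
    awk -F ' *: *' '
    BEGIN {
        # Constant header, copied from the requested output.
        print "FILE_NAME,Filename|Data Format|Compression|GZIP Bytes|Unzipped Bytes|Records|Record Length"
    }
    # Skip the README banner, blank lines, anything that is not "label : value".
    NF != 2 { next }
    {
        sub(/^[ \t]+/, "", $1)      # drop leading blanks from the label
        sub(/[ \t\r]+$/, "", $2)    # drop trailing blanks or a CR from the value
    }
    # Assumption: a "Filename" label always starts a new record.
    $1 == "Filename" && rec != "" { print rec; rec = "" }
    # Put the separator between values, never after the last one.
    { rec = (rec == "" ? $2 : rec "|" $2) }
    END { if (rec != "") print rec }
    ' "$1"
}
```

It would be called as: readme_to_psv abc.txt > new_abc.txt. Because the field separator is the regular expression ' *: *', it should not matter whether the labels are padded with one space or several before the colon.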
(1) Your input file has a "readme" block at the start. You process it as if it were a normal line, which results in the vertical bars added to your header line.
(2) Instead of
if ($row eq '')
, I would reverse the test and ask whether a line contains a colon, and only then split it:
if ($row =~ /:/) {
    # .... split
}
else {
    # .... process other lines
}
Actually, you can even get rid of the split:
if ($row =~ /:(.*)/) {
    # ..... The part after the colon (including any leading space) is now stored in $1
}
else {
    # .... process non-colon lines
}
(3) Note that, if you find a value, you always print it as "$value|". This means that every line has a vertical bar at the end.
(4) If your input has a run of several consecutive lines without a colon, you would also produce the same number of empty lines in the output.
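Putting points (1) to (4) together, a minimal sketch could look like the one below. It is only one possible arrangement, not the original poster's final code. The Perl is wrapped in a shell function whose name, readme_to_psv_perl, is invented for this example. It assumes that each record begins with a "Filename" label and that banner lines contain no colon. The values are collected in an array and joined with "|", so no bar can end up at the end of a line.

```shell
#!/bin/sh
# Hypothetical wrapper (name is an assumption): readme_to_psv_perl INPUT_FILE
readme_to_psv_perl() {
    perl -e '
        use strict;
        use warnings;

        my @values;   # values of the record currently being collected
        print "FILE_NAME,Filename|Data Format|Compression|GZIP Bytes|Unzipped Bytes|Records|Record Length\n";

        while (my $row = <>) {
            $row =~ s/\s+\z//;   # strip the newline and any trailing blanks or CR
            # Points (1), (2), (4): ignore every line that is not "label : value".
            next unless $row =~ /^\s*([^:]*?)\s*:\s*(.*)$/;
            # Assumption: a "Filename" label starts a new record.
            if ($1 eq "Filename" && @values) {
                print join("|", @values), "\n";
                @values = ();
            }
            push @values, $2;
        }
        # Point (3): join puts "|" between values only, so there is no trailing bar.
        print join("|", @values), "\n" if @values;
    ' "$1"
}
```

Usage would be: readme_to_psv_perl abc.txt > new_abc.txt. Keying on the "Filename" label rather than on empty lines means the README block and any stray blank lines are simply skipped.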