Hi,
I am using gawk (--posix) to extract some information from lines like the following (in a text file):
sms_snath_hp_C/CORE BUILD PREREQUISITE:
total 1556
drwxrwxrwx 2 sn sn 4096 2008-06-27 08:31 ./
drwxrwxrwx 13 sn sn 4096 2009-07-22 14:48 ../
-rwxrwxrwx 1 sn sn 15348 2007-05-11 08:37 This is a file name with seven spaces.jar*
-rwxrwxrwx 1 sn sn 22395 2007-05-11 08:37 This is a file name with eight spaces.jar*
-rwxrwxrwx 1 sn sn 73687 2007-05-11 08:37 ibmjcefw.jar*
-rwxrwxrwx 1 sn sn 767101 2007-05-11 08:37 ibmjceprovider.jar*
Using regular expressions (pattern matching), I ignore every line except the long-listing entries that are NOT directories.
So I consider only:
-rwxrwxrwx 1 sn sn 15348 2007-05-11 08:37 This is a file name with seven spaces.jar*
-rwxrwxrwx 1 sn sn 22395 2007-05-11 08:37 This is a file name with eighteen spaces.jar*
-rwxrwxrwx 1 sn sn 73687 2007-05-11 08:37 ibmjcefw.jar*
-rwxrwxrwx 1 sn sn 767101 2007-05-11 08:37 ibmjceprovider.jar*
The question is: how do I get the file names while preserving the white space inside them?
Note that if the file name has no embedded FS character, then it is just $8 and the problem is over. If the file name has several embedded FS characters, I do not want to just concatenate $8 FS $9 FS $10 (and so on, in a loop); I also want the multiplicity of the FS characters preserved.
(Something like "read v1 v2 v3 v4 v5 v6 v7 fileName" would do.)
-rwxrwxrwx 1 sn sn 22395 2007-05-11 08:37 This is a file name with eighteen spaces.jar*
in the original had multiple spaces in the file name (in the HTML posting here on the forum they were collapsed into single spaces :))
It is something like this, with each X standing for one space:
ThisXisXXaXXXfileXXXXnameXXXXwithXXXeighteen spaces.jar*
---------- Post updated at 11:20 PM ---------- Previous update was at 10:58 PM ----------
Hi ryandegreat25
Your quick and correct answer is appreciated. Yes, I could use "cut", "read", etc., but all of these are shell (external/internal) commands.
But could we solve this inside the gawk script itself (I mean without calling other shell commands/scripts)? I already have a gawk script in place that does some other things too. If it cannot be done, then I will have to perform "surgery" on the script and split it into possibly many scripts with "read" or "cut" piped in between.
You know how to use [size] and [color] tags; now you have to learn to use [code] tags when you post sample data. That way your multiple spaces will not "collapse into a single space".
All the above commands will work.
The only thing is that you will have to quote the argument to echo.
E.g.:
echo "$x" | cut -d' ' -f8-
Have you tried:
ls -l +d
and
ls +d
Newer versions accept them.
Maybe this will also help you:
xx='-rwxrwxrwx 1 sn sn 22395 2007-05-11 08:37 This is a file name with eighteen spaces.jar*'
echo "$xx" | sed 's/^.*[0-9] \(.*\)\*$/\1/'
Output:
This is a file name with eighteen spaces.jar
I have also removed the trailing * for you.
---------- Post updated at 03:34 AM ---------- Previous update was at 03:20 AM ----------
zsh-4.3.10[t]% cat infile
sms_snath_hp_C/CORE BUILD PREREQUISITE:
total 1556
drwxrwxrwx 2 sn sn 4096 2008-06-27 08:31 ./
drwxrwxrwx 13 sn sn 4096 2009-07-22 14:48 ../
-rwxrwxrwx 1 sn sn 15348 2007-05-11 08:37 This is a file name with seven spaces.jar*
-rwxrwxrwx 1 sn sn 22395 2007-05-11 08:37 This is a file name with eight spaces.jar*
-rwxrwxrwx 1 sn sn 73687 2007-05-11 08:37 ibmjcefw.jar*
-rwxrwxrwx 1 sn sn 767101 2007-05-11 08:37 ibmjceprovider.jar*
zsh-4.3.10[t]% gawk --posix 'NR > 2 && !/\/$/ {
sub(/([^ \t]+[ \t]+){7}/,"")
print
}' infile
This is a file name with seven spaces.jar*
This is a file name with eight spaces.jar*
ibmjcefw.jar*
ibmjceprovider.jar*
If you don't want to modify the current record you can save it in a variable and then manipulate the saved record:
gawk --posix 'END {
  # print the filenames
  while (++i <= c) print fn[i]
}
{
  # build an array to hold the filenames
  if (NR > 2 && !/\/$/) {
    rec = $0; sub(/([^ \t]+[ \t]+){7}/, "", rec)
    fn[++c] = rec
  }
}' infile
Or, making use of the fact that the timestamp always appears just before the file name and has a fixed length, another alternative would be:
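A minimal sketch of that idea, assuming every entry carries a timestamp of the form YYYY-MM-DD HH:MM followed by exactly one space, as in the sample listing. The function name name_after_timestamp is made up for illustration, and the digits are spelled out without interval expressions, so it should run under any POSIX awk (gawk --posix included):

```shell
# Hypothetical helper: print whatever follows the first "YYYY-MM-DD HH:MM " on each file line.
name_after_timestamp() {
    awk '
    # skip directory entries (they end in "/") and any line without a timestamp
    !/\/$/ && match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9] /) {
        # match() sets RSTART and RLENGTH; the name starts right after the match
        print substr($0, RSTART + RLENGTH)
    }' "$@"
}
```

It could be called as name_after_timestamp infile. Because match() returns the leftmost match, a date-like string inside the file name itself should not confuse it. Everything after the timestamp, including runs of spaces, is printed untouched.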
This is the best answer. I had arrived at the same pattern too, albeit separately for each of the first seven fields. Your answer is even better. I am going to change it a little bit as follows:
for the obvious reason that fileName itself could start with a white space!
Could you suggest a pattern that would also get rid of the very last character (zero or one occurrence) when it is one of these: />*|@= ?
My attempt at doing it in a single pattern did not work, so I am removing the leftmost seven fields first and then using another gsub to remove the trailing character (zero or one of those).
Thanks.
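One possible way to handle both points, sketched under the assumption that exactly one blank separates the timestamp from the file name, so any further blanks belong to the name. The function name file_names is hypothetical, and the NR > 2 filter is taken from the answer above. The first six fields are removed one at a time, which avoids interval expressions:

```shell
# Hypothetical helper: drop the first seven fields, keep the name's own blanks,
# then remove at most one trailing indicator character.
file_names() {
    awk '
    NR > 2 && !/\/$/ {
        rec = $0
        # remove six "field + blanks" pairs, one per iteration
        for (i = 1; i <= 6; i++)
            sub(/^[^ \t]+[ \t]+/, "", rec)
        # seventh field: remove it plus exactly one separator,
        # so leading blanks of the file name survive
        sub(/^[^ \t]+[ \t]/, "", rec)
        # strip zero or one trailing indicator character
        sub(/[\/>*|@=]$/, "", rec)
        print rec
    }' "$@"
}
```

Called as file_names infile, this works on a copy of the record, so $0 stays available to the rest of the script. A file name that genuinely ends in one of those characters would lose it as well, which a pattern alone cannot tell apart.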
use strict;
use warnings;

while (<DATA>) {
    # split on whitespace into at most 8 fields; the 8th keeps the rest of the line intact
    my @tmp = split(" ", $_, 8);
    print $tmp[7];
}
__DATA__
-rwxrwxrwx 1 sn sn 15348 2007-05-11 08:37 This is a file name with seven spaces.jar*
-rwxrwxrwx 1 sn sn 22395 2007-05-11 08:37 This is a file name with eighteen spaces.jar*
-rwxrwxrwx 1 sn sn 73687 2007-05-11 08:37 ibmjcefw.jar*
-rwxrwxrwx 1 sn sn 767101 2007-05-11 08:37 ibmjceprovider.jar*
Notice that I'm using the --re-interval option rather than --posix: the gensub extension is disabled in compatibility (posix) mode, and we still need the interval-expression functionality.