Omit the numbers or any characters in the 8th column after the dot (.).

How can I omit the numbers and any other characters after the dot (.) in the 7th or 8th column? The column number varies because the group name may itself contain a space (for example "domain users").
Thanks

-rw-r--r--. 1 user1   domain users           619 2017-04-13 16:16:50.284598383 +0000  aa
drwxr-xr-x. 2 root    root             6 2017-05-08 12:40:33.182976407  +0000 aaa
-rw-r--r--. 1 root    root         13883 2017-03-31 17:07:35.821185258  +0000 aa.sh
-rw-r--r--. 1 root    root             0 2017-05-08 12:40:36.310976557  +0000 ab

Output:

-rw-r--r--. 1 user1 domain users           619 2017-04-13 16:16:50  aa
drwxr-xr-x. 2 root    root             6 2017-05-08 12:40:33  aaa
-rw-r--r--. 1 root    root         13883 2017-03-31 17:07:35  aa.sh
-rw-r--r--. 1 root    root             0 2017-05-08 12:40:36  ab

Hi.

Using perl, and taking advantage of the unique context of each date-related chunk:

#!/usr/bin/env bash

# @(#) s1       Demonstrate string manipulation, based on context.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C perl

FILE=${1-data1}
E=expected-output.txt

pl " Input data file $FILE:"
cat $FILE

pl " Expected output:"
cat $E

pl " Results:"
perl -wpe 's/(:\d\d)([.]\d+\s\s?[+]\d+)/$1/' $FILE |
tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C || ( pe; pe " Results cannot be verified." ) >&2

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.7 (jessie) 
bash GNU bash 4.3.30
perl 5.20.2

-----
 Input data file data1:
-rw-r--r--. 1 user1   domain users           619 2017-04-13 16:16:50.284598383 +0000  aa
drwxr-xr-x. 2 root    root             6 2017-05-08 12:40:33.182976407  +0000 aaa
-rw-r--r--. 1 root    root         13883 2017-03-31 17:07:35.821185258  +0000 aa.sh
-rw-r--r--. 1 root    root             0 2017-05-08 12:40:36.310976557  +0000 ab

-----
 Expected output:
-rw-r--r--. 1 user1 domain users           619 2017-04-13 16:16:50  aa
drwxr-xr-x. 2 root    root             6 2017-05-08 12:40:33  aaa
-rw-r--r--. 1 root    root         13883 2017-03-31 17:07:35  aa.sh
-rw-r--r--. 1 root    root             0 2017-05-08 12:40:36  ab

-----
 Results:
-rw-r--r--. 1 user1   domain users           619 2017-04-13 16:16:50  aa
drwxr-xr-x. 2 root    root             6 2017-05-08 12:40:33 aaa
-rw-r--r--. 1 root    root         13883 2017-03-31 17:07:35 aa.sh
-rw-r--r--. 1 root    root             0 2017-05-08 12:40:36 ab

-----
 Verify results if possible:

-----
 Comparison of 4 created lines with 4 lines of desired results:
f1 expected-output.txt differ: char 21, line 1
 Failed -- files f1 and expected-output.txt not identical -- detailed comparison follows.
 Succeeded by ignoring whitespace differences.

The decreased space in the expected output appears to be a typo, so no attempt was made to correct it.

Best wishes ... cheers, drl

Thank you so much, I appreciate it. But I was hoping for awk or sed; I just need a one-line command.

Hi.

To get the best advice on transforming, extracting, manipulating data,
please supply the environment and context:

1. OS and shell, preferably with versions
2. Representative input
3. Output desired that corresponds to the input
4. Logic to obtain output from input
5. Attempts at a solution that you have tried

If you need to use or to avoid certain tools, you may need
to explain why, especially if your post count is low. Once the
problem is identified, responders often can choose the most
appropriate tool, not necessarily the one you might want.

These guidelines allow responders to create solutions without
ambiguity and to avoid creating sample data sets.

Note that:

perl -wpe 's/(:\d\d)([.]\d+\s\s?[+]\d+)/$1/' $FILE

is a single line. The rest is identifying, documenting, and reporting code.

Best wishes ... cheers, drl

how about - could probably be improved:

sed 's/\(.*\)\(\..*\+.*\)  *\(.*\)$/\1 \3/g' myFile

Hi Drl,
Can you please explain this command to me, character by character?

perl -wpe 's/(:\d\d)([.]\d+\s\s?[+]\d+)/$1/'

Thanks

---------- Post updated at 10:00 PM ---------- Previous update was at 09:53 PM ----------

Sir,
Can you please explain this letter by letter, word by word, so that next time I can do it on my own?

ls --full-time -rt | sed 's/\(.*\)\(\..*\+.*\)  *\(.*\)$/\1 \3/g'

Thanks

Hi.

\d   a numeric character
\s   a space (actually any whitespace)
[p]  a literal "p", in this case period, fullstop (usually "." is a meta-character matching anything), [+] matches a plus sign, 
      (usually + is one or more of the previous)
?    a quantifier, either 0 or 1 of the previous item
()   capturing parens; the first pair saves its match to variable $1 (the second pair, $2, is not used in the replacement)

See perldoc perlre for more details ... cheers, drl

NAME
    perlre - Perl regular expressions

DESCRIPTION
    This page describes the syntax of regular expressions in Perl.

    If you haven't used regular expressions before, a quick-start introduction
    is available in perlrequick, and a longer tutorial introduction is
    available in perlretut.

...

If you cannot find a way of telling ls to format its output differently, you may try the following:

perl -pe 's/\.\d{9}\W+\d{4}//' example.output

Output:

-rw-r--r--. 1 user1   domain users           619 2017-04-13 16:16:50  aa
drwxr-xr-x. 2 root    root             6 2017-05-08 12:40:33 aaa
-rw-r--r--. 1 root    root         13883 2017-03-31 17:07:35 aa.sh
-rw-r--r--. 1 root    root             0 2017-05-08 12:40:36 ab

s/regex//  # substitute an empty string for whatever the regex matches (delete it)
\.         # match the period
\d{9}      # match nine digits (284598383)
\W+        # match one or more non-word characters (here the spaces and the plus sign)
\d{4}      # match four more digits (0000)
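
If sed is preferred, the same idea can probably be carried over almost unchanged. This is only a sketch: it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed do), that the fraction always has nine digits, and the variable names sample and trimmed are made up for the example.

```shell
# Two sample lines from this thread; the gap before +0000 is one or two spaces.
sample='-rw-r--r--. 1 user1   domain users           619 2017-04-13 16:16:50.284598383 +0000  aa
drwxr-xr-x. 2 root    root             6 2017-05-08 12:40:33.182976407  +0000 aaa'

# \.        a literal period
# [0-9]{9}  nine digits
# " +"      one or more spaces
# \+        a literal plus sign (in extended syntax the backslash makes it literal)
# [0-9]{4}  four digits
trimmed=$(printf '%s\n' "$sample" | sed -E 's/\.[0-9]{9} +\+[0-9]{4}//')

printf '%s\n' "$trimmed"
```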

Of course, but we would really appreciate it if you could obey the forum rules and post code (any code, data and output) in CODE-tags or - if they appear in running text, like commands - in ICODE-tags. For instance write the command ls --full-time -rt |sed 's/\(.*\)\(\..*\+.*\) *\(.*\)$/\1 \3/g' like this.

Back to your question:

sed 's/\(.*\)\(\..*\+.*\)  *\(.*\)$/\1 \3/g'

First, the basic command:

sed 's/<something>/\1 \3/g'

We replace something (in fact every instance of something, because of the "g" at the end) by \1 \3 . \1 and \3 are so-called "back-references". They work like variables: you search for something in the search part (the "<something>") and whatever you have found is put into the variable. The "first" and the "third" such found things will be put into the result, effectively deleting the second.
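
A small sketch of back-references on made-up input (the words keep, drop and last, and the variable name kept, are only for illustration):

```shell
# Three groups are captured; only the first and third are written back,
# so the middle word disappears.
kept=$(printf 'keep drop last\n' | sed 's/\(keep\) \(drop\) \(last\)/\1 \3/')

printf '%s\n' "$kept"
```

This prints "keep last".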

Now, let's have a look at the "something" which the input line is broken up into:

\(.*\)\(\..*\+.*\)  *\(.*\)$

Whatever is between "\(" and "\)" is put into such a back-reference, hence we see three such pairs and a few characters in between, shown here one piece per line:

\(.*\)
\(\..*\+.*\)
  *
\(.*\)
$

Let us first deal with the things outside the bracket pairs: "  *" is a space, followed by zero or more spaces. The asterisk means "zero or more of the character before" (in fact "of the regex before", but in this case the regex is only a single character), hence "one or more of this character" is expressed by writing the character once, then the same character again with the asterisk:

x*       # zero or more x'es, hence even no x at all
xx*      # one or more x'es, hence at least one x
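
The difference can be tried out with a few made-up lines (the names words, zero_or_more and one_or_more are hypothetical, chosen just for this sketch):

```shell
words='ab
a b
a   b'

# " *" allows zero spaces, so all three lines are rewritten.
zero_or_more=$(printf '%s\n' "$words" | sed 's/a *b/HIT/')

# "  *" demands at least one space, so "ab" is left alone.
one_or_more=$(printf '%s\n' "$words" | sed 's/a  *b/HIT/')

printf '%s\n' "$zero_or_more" "$one_or_more"
```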

The $ means "end of line" and is a way of "anchoring" a regular expression. If you search for a group of characters they could appear anywhere in a line. If you want to specifically search for a word appearing at the beginning or the end of a line these anchors (there is ^ for "beginning of line" and $ for "end of line") are the means to express that.
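
A quick sketch of both anchors on a made-up line (the variable names at_end and at_start are only illustrative):

```shell
# "ab$" can only match the "ab" that ends the line.
at_end=$(printf 'ab ab\n' | sed 's/ab$/X/')

# "^ab" can only match the "ab" that starts the line.
at_start=$(printf 'ab ab\n' | sed 's/^ab/X/')

printf '%s\n' "$at_end" "$at_start"
```

The first command prints "ab X", the second "X ab".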

To sum up so far, the search expression means:

\(??\)\(??\)<one or more spaces>\(??\)<end-of-line>

For the "??" parts:

\(.*\)\(\..*\+.*\)  *\(.*\)$

The dot ( . ) means "any character". In conjunction with the asterisk, which means "any number of what precedes me", it therefore means "any number of any character": the first bracket pair pretty much matches everything, in any length.

If this were the whole regex it would match the complete line. But because it isn't, the second bracket pair is in fact limiting it:

\(\..*\+.*\)

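This matches a literal dot character: because the dot has a special meaning to sed, if you want to match only a real literal dot you need to "escape" it, that is, precede it with a backslash ("." = "any character", "\." = "a literal dot character"). The "\+" is meant analogously, as an escaped "+" character, hence the intended meaning of the regexp inside the bracket pair is: a literal dot, followed by anything, followed by a literal "+", followed by anything. One caveat: in basic regular expressions a plain "+" is already literal, and GNU sed treats "\+" as a quantifier ("one or more of what precedes me"), so on GNU sed this part does not actually insist on a "+" sign.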
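
The effect of escaping the dot can be checked on a made-up string (the variable names any_char and literal_dot are hypothetical):

```shell
# Unescaped, the dot matches any character, so both "a.b" and "axb" are hit.
any_char=$(printf 'a.b axb\n' | sed 's/a.b/X/g')

# Escaped, it matches only a real dot, so "axb" survives.
literal_dot=$(printf 'a.b axb\n' | sed 's/a\.b/X/g')

printf '%s\n' "$any_char" "$literal_dot"
```

The first command prints "X X", the second "X axb".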

You should now be able to decipher the rest and put together what it means in context. One thing you need to know, though: regexps are always "greedy", meaning that if there are several ways to match something, the longest possible match is always used. For example, here is some input and a regexp. The matched part is everything from the first "a" up to and including the last "B":

aBxyzBbla-foo-BsomethingBandsomemore
a.*B

Notice that "aB" would also have been a valid match for a.*B, but the longest possible match, aBxyzBbla-foo-BsomethingB, is the one that is used. Therefore the first regexp part, \(.*\), will skip over the first literal dot (after the filemode field: "drwxrwxrwx.") and only stop at the second one.
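
Greediness can be made visible by wrapping the match in square brackets; in the replacement, & stands for the whole matched text. This is just a sketch, and the variable name greedy is made up:

```shell
# The match starts at the first "a" and, being greedy, runs to the LAST "B".
greedy=$(printf 'aBxyzBbla-foo-BsomethingBandsomemore\n' | sed 's/a.*B/[&]/')

printf '%s\n' "$greedy"
```

This prints "[aBxyzBbla-foo-BsomethingB]andsomemore".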

@drl: I think you could forego the "g" at the end, because you anchor the regexp at the end-of-line anyway.

I hope this helps.

bakunin
