I wrote an awk script to filter "uninteresting" commands out of my ~/.bash_history. (I know about HISTIGNORE, but I don't want to exclude these commands from my current session's history; I just want to avoid persisting them across sessions.)
The history file can contain multi-line entries with embedded newlines, and entries are separated by timestamps. Given an input file like:
#1501304269
git stash
#1501304270
ls
#1501304318
ls | while IFS= read line; do
echo 'line is: ' $line
done
the script filters out single-line ls, man, and cat commands, producing:
#1501304269
git stash
#1501304318
ls | while IFS= read line; do
echo 'line is: ' $line
done
Notice that multi-line entries are unfiltered -- I figure if they're interesting enough to warrant multiple lines, they're worth remembering.
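The original awk script isn't shown in this thread, so the following is only a hypothetical sketch of the behaviour just described, not the author's code. It buffers each entry from one timestamp to the next and drops the entry only when it has exactly one command line starting with ls (or l), cat, or man. The function name `filter_history` is made up for illustration, and the regexes deliberately avoid interval expressions and classes such as `[[:digit:]]`.

```shell
#!/bin/sh
# Hypothetical sketch, not the original script: drop single-line history
# entries whose command starts with ls, l, cat or man; keep everything else.
filter_history() {
  awk '
    # Print the buffered entry unless it is exactly one filtered command line.
    function flush() {
      if (buf == "") return
      if (!(lines == 1 && first ~ /^(ls?|cat|man)([^A-Za-z0-9]|$)/))
        printf "%s", buf
      buf = ""; lines = 0; first = ""
    }
    # A timestamp line ("#" plus digits) starts a new entry.
    /^#[0-9]+$/ { flush(); buf = $0 "\n"; next }
    # Any other line belongs to the current entry.
    { if (lines == 0) first = $0; lines++; buf = buf $0 "\n" }
    END { flush() }
  ' "$@"
}

# Usage on the sample history from the question
printf '%s\n' '#1501304269' 'git stash' '#1501304270' 'ls' \
  '#1501304318' 'ls | while IFS= read line; do' \
  "echo 'line is: ' \$line" 'done' | filter_history
```

Note that `catalog` would be kept, because the command word must be followed by a non-alphanumeric character or the end of the line.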
I've been reading about sed's multiline capabilities, and I'm curious how its hold space and pattern space might be manipulated to achieve the same filtering as my awk script. Rather than use GNU sed's -z flag to treat the whole file as a single massive pattern space, I'm looking for a solution that uses commands such as h, H, x, G, N, etc. to accumulate lines in the hold space and swap or delete lines as necessary.
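As a minimal illustration of the accumulate-then-swap idiom (not specific to the history problem): H appends each line to the hold space, and on the last line x brings the accumulated text back into the pattern space. The function name `join_lines` is hypothetical.

```shell
#!/bin/sh
# Accumulate every input line in the hold space, then swap it back at the end.
join_lines() {
  sed -n '
    # Append the current line to the hold space (a newline is added first).
    H
    # On the last line, swap hold and pattern space, then tidy and print.
    $ {
      x
      # H on the first line left a leading newline behind; drop it.
      s/^\n//
      s/\n/,/g
      p
    }
  '
}

printf '%s\n' a b c | join_lines   # prints: a,b,c
```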
Hm... peeking ahead one line won't let me distinguish a single-line command (which should be excluded if it contains ls, cat, or man) from the beginning of a multi-line command (which should be kept even if it contains ls, cat, or man).
For example, if the exclusion pattern was "xxx", the following input,
The second record should have passed through unmodified since it has multiple lines, but instead its head was removed and the rest got tacked onto the previous record.
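The snippet that produced this isn't quoted here, so the following is only a hypothetical reconstruction of a one-line lookahead (N) filter, run against the sample history from the question rather than the "xxx" example. It shows the same failure: N pulls in only the first line of the multi-line entry, the delete removes the timestamp together with that line, and the remaining lines end up attached to the previous record. GNU sed with -E is assumed; `naive_filter` is a made-up name.

```shell
#!/bin/sh
# Hypothetical one-line-lookahead filter (GNU sed with -E assumed).
naive_filter() {
  sed -E '
    /^#[0-9]{10}$/ {
      # Join the timestamp with the next line only.
      N
      # Deletes the pair even when more lines of the same entry follow.
      /\n(ls?|cat|man)([^[:alnum:]].*)?$/ d
    }
  '
}

printf '%s\n' '#1501304269' 'git stash' '#1501304270' 'ls' \
  '#1501304318' 'ls | while IFS= read line; do' \
  "echo 'line is: ' \$line" 'done' | naive_filter
# The last two lines now follow "git stash" without their timestamp.
```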
I was thinking of something like this: when you reach a timestamp, exchange the pattern space with the hold space (x). Now the hold space is ready to start accumulating the upcoming entry, and the pattern space holds the entry that was previously accumulated. Since the pattern space now contains the complete entry, I should be able to perform whatever substitution is necessary to filter out the commands I'm not interested in. Correctly handling the first and last lines of the file complicates this a bit.
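A small diagnostic sketch of that idea might look like this. It does no filtering; it only prints what lands in the pattern space after each x, with embedded newlines shown as a literal \n so every entry fits on one line, and it flushes the last entry with a $ block. It assumes GNU sed, that the file starts with a timestamp, and it ignores edge cases such as a file ending in a bare timestamp. `show_entries` is a hypothetical name.

```shell
#!/bin/sh
# Diagnostic only: print each accumulated entry on one line, newlines shown as \n.
show_entries() {
  sed -E -n '
    /^#[[:digit:]]{10}$/ {
      # Swap: pattern space = previous entry, hold space = new timestamp.
      x
      # The first timestamp swaps in an empty hold space; skip it.
      /./ {
        s/\n/\\n/g
        p
      }
      d
    }
    # Any other line extends the entry being accumulated in the hold space.
    H
    # At end of input the last entry is still in the hold space.
    $ {
      x
      s/\n/\\n/g
      p
    }
  '
}

printf '%s\n' '#1501304269' 'git stash' '#1501304270' 'ls' \
  '#1501304318' 'ls | while IFS= read line; do' \
  "echo 'line is: ' \$line" 'done' | show_entries
```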
My latest failed attempt:
1,/^#[[:digit:]]{10}$/ {
  /^#[[:digit:]]{10}$/! {
    p
    d
  }
}
/^#[[:digit:]]{10}$/ {
  x
  /^$/ d
  /\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
  p
}
/^#[[:digit:]]{10}$/ !{
  H
  d
}
$ {
  x
  /\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
  p
}
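One possible reading of why this attempt goes wrong (a diagnosis, not something confirmed in the thread, and assuming it was run as `sed -E -n -f`): first, in a range like `1,/re/` the closing regex is only tried from line 2 onward, so when line 1 is itself a timestamp the range runs to the second timestamp, and `git stash` gets printed before its own timestamp. Second, the non-timestamp block ends with `d`, which starts the next cycle before the `$` block is reached, so the final entry is never flushed. The two hypothetical mini-scripts below isolate each effect.

```shell
#!/bin/sh
# 1) The end regex of 1,/re/ is only tested from line 2 on, so a timestamp
#    on line 1 does not close the range: lines 1-3 are selected here.
range_demo() {
  printf '%s\n' '#1501304269' 'git stash' '#1501304270' 'ls' |
    sed -E -n '1,/^#[[:digit:]]{10}$/p'
}

# 2) d ends the cycle, so a $ block placed after it never sees the last
#    line: this prints nothing at all.
d_before_last_demo() {
  printf '%s\n' 'a' 'b' 'c' |
    sed -n '
      /./ {
        H
        d
      }
      $ {
        x
        p
      }
    '
}

range_demo
d_before_last_demo
```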
Apparently mawk doesn't support regex interval expressions (repetition counts such as {10}), and maybe not POSIX character classes such as [[:digit:]] either.
I couldn't get the desired results from your sed snippet. Not sure why though.
---------- Post updated at 08:20 PM ---------- Previous update was at 08:12 PM ----------
I finally came up with something that works. It's nasty, and I don't doubt there's a better way, but it was satisfying to at least get something working.
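# Last line: add it to the current entry, swap that entry into the
# pattern space, and print it unless it is a filtered one-liner.
$ {
  1 h
  1!H
  x
  /^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
  p
}
# Non-timestamp line: append it to the entry in the hold space (h on line 1
# avoids a leading newline) and start the next cycle without printing.
/^#[[:digit:]]{10}$/ !{
  1 h
  1!H
  d
}
# Timestamp line: swap, so the pattern space holds the previous complete
# entry and the hold space starts the new one.
/^#[[:digit:]]{10}$/ {
  x
  # Drop: nothing accumulated yet, a bare timestamp, or a filtered one-liner.
  /^$/ d
  /^#[[:digit:]]{10}$/ d
  /^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
}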
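To try the script above, a minimal harness might look like this. It embeds the script verbatim in a shell function (the name `filter_history_sed` is hypothetical) and assumes GNU sed. It needs -E for the {10} interval and the | alternation, and it must be run without -n: entries that survive the timestamp branch are emitted by sed's automatic end-of-cycle output, not by an explicit p.

```shell
#!/bin/sh
# Harness for the working script above (GNU sed assumed). No -n: surviving
# entries are printed by the automatic end-of-cycle output.
filter_history_sed() {
  sed -E '
$ {
1 h
1!H
x
/^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
p
}
/^#[[:digit:]]{10}$/ !{
1 h
1!H
d
}
/^#[[:digit:]]{10}$/ {
x
/^$/ d
/^#[[:digit:]]{10}$/ d
/^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
}
' "$@"
}

printf '%s\n' '#1501304269' 'git stash' '#1501304270' 'ls' \
  '#1501304318' 'ls | while IFS= read line; do' \
  "echo 'line is: ' \$line" 'done' | filter_history_sed
```

Running it with -n instead would suppress that automatic output, leaving only the final entry printed by the $ block.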
I benchmarked this sed script against my original awk script, as well as against the following gsed script:
Indeed, the mawk version that distributions install supports neither. I think the latest version does, but you would need to get the source and compile it yourself.
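If a script has to run on such an older mawk, one workaround (an assumption, not checked against a particular mawk release) is to spell the repetition out and use a plain range instead of a bracket class. The helper name `count_timestamps` is hypothetical; it counts lines consisting of # followed by exactly ten digits.

```shell
#!/bin/sh
# Ten explicit [0-9] atoms stand in for [[:digit:]]{10}, so neither an
# interval expression nor a POSIX character class is needed.
count_timestamps() {
  awk '/^#[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/ { n++ }
       END { print n + 0 }' "$@"
}

# Prints 1: the second "timestamp" has only nine digits.
printf '%s\n' '#1501304269' 'git stash' '#150130427' 'ls' | count_timestamps
```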
--
Your approach also seems to leave out one-line commands that do not contain ls, man, or cat.
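Because d immediately starts the next cycle, and the negated branch does not modify the pattern space, any line that gets past that branch is a timestamp. The block after it therefore does not need its own /^#[[:digit:]]{10}$/ address, and the code can be shortened to: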
/^#[[:digit:]]{10}$/ !{
  1 h
  1!H
  d
}
x
/^$/ d
/^#[[:digit:]]{10}$/ d
/^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
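Combining that simplification with the unchanged $ block from the working script gives the complete version below. It is a sketch assuming GNU sed, invoked without -n as before; the function name `filter_history_simplified` is hypothetical. On the sample history from the question it produces the expected output shown there.

```shell
#!/bin/sh
# Simplified variant: the timestamp block no longer needs its own address,
# because every non-timestamp line is deleted by the negated block first.
filter_history_simplified() {
  sed -E '
    # Last line: add it to the entry, swap, and print unless filtered.
    $ {
      1 h
      1!H
      x
      /^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
      p
    }
    # Non-timestamp line: accumulate in the hold space, start the next cycle.
    /^#[[:digit:]]{10}$/ !{
      1 h
      1!H
      d
    }
    # Only a bare timestamp reaches this point: swap in the previous entry.
    x
    /^$/ d
    /^#[[:digit:]]{10}$/ d
    /^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
  ' "$@"
}

printf '%s\n' '#1501304269' 'git stash' '#1501304270' 'ls' \
  '#1501304318' 'ls | while IFS= read line; do' \
  "echo 'line is: ' \$line" 'done' | filter_history_simplified
```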