I am able to grep multiple patterns that are stored in a file. However, how can I replace the whole line with either the pattern or a new string?
For example:
pattern_file: *The info in the () is not part of the pattern file. It is the intended name to replace the whole line with once the pattern is found, listed here for reference.
How can I replace them with either the new name or the pattern name? The reason I want to replace them is that later I need to count how many patterns have been found, maybe using sort -u | wc.
I am stuck after grepping all the matches: I do not know how many patterns have been found.
OK, first: if you want to change something, grep is not the right tool for it. You should use sed. grep is for finding things, but only finding, not changing them.
Second: before you start on a solution you should define your problem correctly. For instance, your sample input file has seven lines, your expected output has five. Were the two missing lines left out on purpose? If yes, say so. If not, how should they be handled? Maybe left unchanged?
So, let us first rephrase your task. I will make some assumptions here which might as well be wrong. Don't hesitate to correct them:
You have an input file containing certain text patterns and a pattern file which you want to apply to the input. When a pattern is matched, you want to replace the whole line in the input with a certain marker, which is defined distinctly for each pattern. Lines not matched by any pattern should be deleted from the result set. In a final step you want to count how many markers of each kind are found in the result set.
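The task as rephrased above could be sketched roughly as follows. This is only an illustration under assumptions: the sample input lines are made up, the label/regex layout is borrowed from the pattern_grp format shown later in the thread, patterns are assumed to contain no whitespace, and a line is credited to the first pattern that matches it.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/pattern_grp" <<'EOF'
H_A hot.*aaa.*
C_B cold.*bbb.*
C_A cold.*aaa.*
EOF
cat > "$tmp/input_file" <<'EOF'
hot day aaa 1
cold night bbb 2
nothing to see here
cold morning aaa 3
hot noon aaa 4
cold evening aaa 5
EOF

# First file: remember the label and regex of each pattern.
# Second file: print the label of the first matching pattern;
# lines matched by no pattern are dropped.
markers=$(awk '
    NR == FNR { label[++n] = $1; regex[n] = $2; next }
    { for (i = 1; i <= n; i++) if ($0 ~ regex[i]) { print label[i]; next } }
' "$tmp/pattern_grp" "$tmp/input_file")

# Count how often each marker occurs, printed as "marker count".
counts=$(printf '%s\n' "$markers" | sort | uniq -c | awk '{ print $2, $1 }')
printf '%s\n' "$counts"

rm -rf "$tmp"
```

Nothing is written back to the input file; the markers only exist in the pipeline, and appending `| wc -l` to the last pipeline would give the number of distinct markers seen.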
krishmaths, thank you very much for the input.
That is a useful command that combines grouping and counting. After that I can filter out the groups that are not in pattern_file and achieve my goal.
But the grouping seems to be limited to a certain input format. The input file might have a format like the one below, which is quite random:
bakunin, thank you very much for sorting this out.
My initial idea is to find out how many of the patterns occur in an input file.
Let's say I have 50 lines of patterns and 1000 lines of input. How many patterns are there in these 1000 lines? Maybe 400 lines match, but only 30 patterns. These 400 lines are all different, so my idea is to group them and count. That is how I arrived at the grep-and-replace-line workflow.
The focus is not on overwriting the input. I do not need an output file either. Ideally everything is done in a pipe and I just get the count.
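If only the number of patterns that occur at least once is needed, no line has to be replaced at all. A minimal sketch, assuming one regular expression per line in pattern_file (the sample patterns and input lines here are invented for illustration):

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'hot.*aaa' 'cold.*bbb' 'warm.*ccc' > "$tmp/pattern_file"
printf '%s\n' 'hot day aaa' 'hot noon aaa' 'cold night bbb' 'unrelated line' > "$tmp/input_file"

found=0
total=0
while IFS= read -r pat; do
    total=$((total + 1))
    # grep -q only reports whether there is a match and stops at the first one;
    # -e keeps a pattern that starts with "-" from being read as an option.
    if grep -q -e "$pat" "$tmp/input_file"; then
        found=$((found + 1))
    fi
done < "$tmp/pattern_file"

printf '%d of %d patterns found\n' "$found" "$total"
rm -rf "$tmp"
```

This runs grep once per pattern, which should be fine for something like 50 patterns and 1000 input lines; for much larger files a single awk pass may be preferable.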
--- Post updated at 10:30 AM ---
MadeInGermany, thank you very much for this. This suits what I want to do.
For patterns that have a new label to assign, below is my thinking:
while IFS= read pat; do printf "%s match %s times\n" $(grep "$pat" pattern_grp | awk '{print $1}') $(grep -c "$pat" input_file); done < pattern_file
Format of pattern_grp:
H_A hot.*aaa.*
C_B cold.*bbb.*
C_A cold.*aaa.*
Output:
H_A match 2 times
C_B match 1 times
C_A match 2 times
I use grep one more time to count the patterns that had no match:
while IFS= read pat; do printf "%s match %s times\n" $(grep "$pat" pattern_grp | awk '{print $1}') $(grep -c "$pat" input_file); done < pattern_file | grep -c ' 0 times$'
*I am not a programmer and have very limited knowledge, so I try to use what I have.
Looks too complicated.
Why 3 input files?
What does your pattern_grp file look like?
Say it looks like
H_A hot.*aaa.*
C_B cold.*bbb.*
C_A cold.*aaa.*
The value pairs seem to be related?
Then you can read both whitespace-separated columns into two variables:
while read -r sp pat; do printf "%s alias %s match %s times\n" "$sp" "$pat" "$(grep -c "$pat" input_file)"; done < pattern_grp
But why do you do all the printing with aliases when at the end you throw the output away, in favor of the number of non-matches?
--
BTW each variable or command substitution in command arguments should be in "quotes", so that the shell does not attempt word splitting or filename expansion on the result.
So there should be quotes around the $pat argument of the grep command, and another pair around the $( ) argument of the printf command.
The $( ) runs a subshell, so the quotes inside and outside do not conflict. I forgot the outer quotes in my previous post.
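A small sketch of what the missing quotes can do. The helper count_args and the sample values are made up for this demonstration; the default IFS is assumed.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

pat='hot aaa'
unquoted=$(count_args $pat)     # word splitting yields two arguments
quoted=$(count_args "$pat")     # quoting keeps it as one argument

# An empty, unquoted expansion vanishes entirely, so the remaining
# printf arguments shift one position to the left.
empty=''
bad=$(printf '%s match %s times' $empty 2)
good=$(printf '%s match %s times' "$empty" 2)

printf 'unquoted=%s quoted=%s\n' "$unquoted" "$quoted"
printf 'bad=[%s]\ngood=[%s]\n' "$bad" "$good"
```

The second case is the one that matters for the printf loops in this thread: when the label lookup returns nothing, the unquoted version prints the count where the label should be.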