I am facing a problem while using the grep command in a shell script. I have a file (PCF_STARHUB_20130625_1) that contains the following records:
SH_5.55916.00.00.100029_20130601_0001_NUC.csv.gz|438|3556691115
SH_5.55916.00.00.100029_20130601_0001_Summary.csv.gz|275|3919504621
SH_5.55916.00.00.100029_20130601_0001_UI.csv.gz|226|593316831
SH_5.55916.00.00.100029_20130601_0001_US.csv.gz|349|1700116234
SH_5.55916.00.00.100038_20130601_0001_NUC.csv.gz|368|3553014997
SH_5.55916.00.00.100038_20130601_0001_Summary.csv.gz|276|2625719449
SH_5.55916.00.00.100038_20130601_0001_UI.csv.gz|226|3825232121
SH_5.55916.00.00.100038_20130601_0001_US.csv.gz|199|2099616349
SH_5.75470.00.00.100015_20130601_0001_NUC.csv.gz|425|1627227450
I also have a pattern stored in a variable (INPUT_FILE_T), and I want to search the file PCF_STARHUB_20130625_1 for that pattern. For that I used the following command:
INPUT_FILE_T="SH?*???????????????US.*"
grep -h ${INPUT_FILE_T} PCF_STARHUB_20130625_1
The output of the above command is:
SH_5.55916.00.00.100029_20130601_0001_US.csv.gz|349|1700116234
The problem is that only one entry is shown in the output, but there should be two. The output should look like this:
SH_5.55916.00.00.100029_20130601_0001_US.csv.gz|349|1700116234
SH_5.55916.00.00.100038_20130601_0001_US.csv.gz|199|2099616349
Is there any technique other than grep?
Please help me with this issue.
The grep utility evaluates basic regular expressions (BREs). Unfortunately, SH?*???????????????US.* is a filename matching pattern, not a BRE.
To search for lines in a file that match a pattern matching expression, try the following shell script using any shell that recognizes basic Bourne shell syntax (such as ksh and bash):
INPUT_FILE_T="SH?*???????????????US.*"
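# Read each line verbatim (no whitespace trimming, no backslash processing)
while IFS='' read -r f
do case "$f" in
# $INPUT_FILE_T is left unquoted so it is used as a pattern, not a literal string
($INPUT_FILE_T) printf "%s\n" "$f";;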
esac
done < PCF_STARHUB_20130625_1
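To try the loop without the real data file, here is a minimal sketch that writes a few of the records above to a temporary file (assuming mktemp is available) and runs the same case-based filter. The function name match_glob_lines is hypothetical, used only for this demo.

```shell
#!/bin/sh
# Sketch: print the lines of a file that match a shell glob pattern,
# using case instead of grep. match_glob_lines is a hypothetical helper.
match_glob_lines() {
    pattern=$1                      # $1 = glob pattern
    while IFS='' read -r line       # read each line of $2 verbatim
    do
        case "$line" in
            ($pattern) printf '%s\n' "$line" ;;   # unquoted: used as a pattern
        esac
    done < "$2"
}

sample=$(mktemp) || exit 1
trap 'rm -f "$sample"' EXIT         # clean up the temporary file on exit

# A subset of the records from the question.
cat > "$sample" <<'EOF'
SH_5.55916.00.00.100029_20130601_0001_UI.csv.gz|226|593316831
SH_5.55916.00.00.100029_20130601_0001_US.csv.gz|349|1700116234
SH_5.55916.00.00.100038_20130601_0001_UI.csv.gz|226|3825232121
SH_5.55916.00.00.100038_20130601_0001_US.csv.gz|199|2099616349
EOF

match_glob_lines 'SH?*???????????????US.*' "$sample"
```

This should print both US records and skip the UI ones, because a case pattern has to match the whole line and only the US records contain "US." after at least 16 characters following SH.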
Furthermore, since you didn't quote the expansion of $INPUT_FILE_T in your grep command, the shell expanded that variable into a list of matching filenames in the current directory before calling grep. So (assuming that the file PCF_STARHUB_20130625_1 lists some of the files in the current directory) the command that you ran was expanded by the shell to:
grep -h SH_5.55916.00.00.100029_20130601_0001_US.csv.gz SH_5.55916.00.00.100038_20130601_0001_US.csv.gz PCF_STARHUB_20130625_1
which treated SH_5.55916.00.00.100029_20130601_0001_US.csv.gz as a basic regular expression that happens to match its own record (each unescaped . matches any character) when searching the file PCF_STARHUB_20130625_1 and, fortunately, doesn't seem to have matched any lines in the file named SH_5.55916.00.00.100038_20130601_0001_US.csv.gz.
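To see this expansion for yourself, the following sketch creates a scratch directory (assuming mktemp -d is available) containing two file names that match the glob and one that doesn't, then prints each argument the shell produces from the unquoted and the quoted variable.

```shell
#!/bin/sh
# Sketch: an unquoted variable holding a glob is expanded by the shell
# before the command (here printf, in the question grep) ever sees it.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT      # remove the scratch directory on exit
cd "$demo_dir" || exit 1

# Demo files: the two US names match the glob, the UI name does not.
touch SH_5.55916.00.00.100029_20130601_0001_US.csv.gz \
      SH_5.55916.00.00.100038_20130601_0001_US.csv.gz \
      SH_5.55916.00.00.100029_20130601_0001_UI.csv.gz

INPUT_FILE_T="SH?*???????????????US.*"

# Unquoted: pathname expansion replaces the pattern with matching file names.
printf 'unquoted arg: %s\n' $INPUT_FILE_T
# Quoted: the pattern is passed through as one literal argument.
printf 'quoted arg:   %s\n' "$INPUT_FILE_T"
```

The unquoted form should yield two arguments (the two US file names), while the quoted form yields the single literal pattern. With grep, the first of those file names becomes the regular expression and the rest become files to search.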
To use grep instead of a loop in the shell, you could translate the filename matching pattern SH?*???????????????US.* to a corresponding BRE (SH..*...............US[.].* or, more succinctly, SH.\{16,\}US[.].*) and use:
INPUT_FILE_T_BRE="SH.\{16,\}US[.].*"
grep "$INPUT_FILE_T_BRE" PCF_STARHUB_20130625_1
Note that the double quotes in the above grep command are crucial: they keep the shell from trying to expand the BRE as a filename matching pattern.
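As a quick check, this sketch runs the translated pattern against a temporary copy of some of the records, both as a BRE with plain grep and, as an alternative not in the answer above, as the equivalent extended regular expression with grep -E (where the interval is written without backslashes).

```shell
#!/bin/sh
# Sketch: the translated pattern as a BRE (plain grep) and as an ERE (grep -E).
sample=$(mktemp) || exit 1
trap 'rm -f "$sample"' EXIT         # clean up the temporary file on exit

cat > "$sample" <<'EOF'
SH_5.55916.00.00.100029_20130601_0001_Summary.csv.gz|275|3919504621
SH_5.55916.00.00.100029_20130601_0001_US.csv.gz|349|1700116234
SH_5.55916.00.00.100038_20130601_0001_UI.csv.gz|226|3825232121
SH_5.55916.00.00.100038_20130601_0001_US.csv.gz|199|2099616349
EOF

INPUT_FILE_T_BRE='SH.\{16,\}US[.].*'   # BRE: interval written as \{16,\}
INPUT_FILE_T_ERE='SH.{16,}US[.]'       # ERE: interval written as {16,}

# Quoted so the shell passes each pattern to grep unchanged.
grep "$INPUT_FILE_T_BRE" "$sample"
grep -E "$INPUT_FILE_T_ERE" "$sample"
```

Each command should print the two US records. Unlike a glob, grep matches anywhere in the line, so prefixing either pattern with ^ would anchor it at the start of the line the way the glob is anchored.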
It could be as simple as quoting your search string:-
grep -h "${INPUT_FILE_T}" PCF_STARHUB_20130625_1
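It's an odd search string though. The grep man page I have on RHEL 6.1 says the following (these operators apply to extended regular expressions, i.e. grep -E; in plain grep's basic regular expressions ?, + and { lose their special meaning, and GNU grep uses \?, \+ and \{n,m\} instead):-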
A regular expression may be followed by one of several repetition operators:
? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{,m} The preceding item is matched at most m times.
{n,m} The preceding item is matched at least n times, but not more than m times.
So, read that way, it means you are looking for a record that contains an S (not necessarily at the beginning of the line), then an optional H, and then I get confused.
Are you trying to use each ? to stand for a single character?
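The BRE/ERE distinction decides how the question's pattern is read. This sketch, assuming GNU grep or another POSIX grep, prints match counts with grep -c: first the question's pattern used (quoted) as a plain grep BRE, then SH? as an ERE.

```shell
#!/bin/sh
# Sketch: how grep reads '?' depending on BRE vs ERE mode.
line='SH_5.55916.00.00.100029_20130601_0001_US.csv.gz|349|1700116234'

# Plain grep (BRE): each '?' is a literal question mark, so the quoted
# glob pattern only matches "SH" followed by 15 or more literal '?'.
printf '%s\n' "$line" | grep -c 'SH?*???????????????US.*'

# grep -E (ERE): '?' makes the preceding item optional, as in the man page,
# so 'SH?' matches an "S" with or without a following "H".
printf '%s\n' "$line" | grep -cE 'SH?'
printf '%s\n' 'S_only' | grep -cE 'SH?'
```

The three counts should be 0, 1 and 1: quoting alone does not turn the glob into a useful regular expression, and the "optional H" reading only applies with grep -E.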
I would think a better search string would be more like:-
INPUT_FILE_T="^SH.\{36\}US"
to represent the start of the line, SH, then any 36 characters, then US. The remainder of the line can be ignored.
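Here is a quick check of the anchored approach against some of the sample records. It assumes the 36-character gap counted from these file names (the length of _5.55916.00.00.100029_20130601_0001_) and writes it with the BRE interval \{36\} rather than a run of dots.

```shell
#!/bin/sh
# Sketch: anchored, fixed-width search.
# Assumes exactly 36 characters between "SH" and the record type in every name.
sample=$(mktemp) || exit 1
trap 'rm -f "$sample"' EXIT         # clean up the temporary file on exit

cat > "$sample" <<'EOF'
SH_5.55916.00.00.100029_20130601_0001_NUC.csv.gz|438|3556691115
SH_5.55916.00.00.100029_20130601_0001_US.csv.gz|349|1700116234
SH_5.55916.00.00.100038_20130601_0001_US.csv.gz|199|2099616349
SH_5.75470.00.00.100015_20130601_0001_NUC.csv.gz|425|1627227450
EOF

INPUT_FILE_T='^SH.\{36\}US'   # start of line, SH, any 36 characters, US
grep "$INPUT_FILE_T" "$sample"
```

This should print the two US records. The trade-off is that a fixed width breaks if a file name has a different length; the open-ended interval \{16,\} from the earlier answer tolerates that.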
Does either of these meet your needs?
Robin
Liverpool/Blackburn
UK