Complex grep command

Hello Team,

I need your help, and it's rather urgent.
I have a file with thousands of lines. Here is a sample below:

Sample1.txt

BW235045560121114833444044@196.35.130.5
BW235106757121114-574455394@196.35.130.5
BW2349514941211141077771352@196.35.130.5
BW2353526471211141758512647@196.35.130.5
BW235328223121114-1191893748@196.35.130.5
BW235101704121114-533816579@196.35.130.5
BW235258139121114-1086573027@196.35.130.5
BW235030487121114-1509234107@196.35.130.5
BW234902738121114-471354714@196.35.130.5
BW235128727121114-1903901078@196.35.130.5
BW2349569391211141739226705@196.35.130.5
BW235412196121114-480323232@196.35.130.5
BW2350437861211141154590213@196.35.130.5

I need to grep for each pattern in the file. For example, I took the first pattern in Sample1.txt to show what I would like to achieve:

grep "BW235045560121114833444044@196.35.130.5" BW*20141112*.csv >> /home/paxk1/test.csv

I hope this is clear. Let me know if you need more explanation.

Regards and thanking you in advance,

Pax

Quite a basic task, isn't it?

(untested)

SRC=Sample1.txt
PARSE='BW*20141112*.csv'    # glob expands when grep runs ($PARSE is left unquoted below)
OUT=/home/paxk1/test.csv

# Read one pattern per line from $SRC and append all matches to $OUT
while read -r pattern; do
    grep "$pattern" $PARSE >> "$OUT"
done < "$SRC"
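
Incidentally, grep can read all the patterns from a file at once with -f, which turns the whole loop into a single pass. Since the patterns contain literal dots, -F keeps them from being treated as regex wildcards. A sketch using the same variables, equally untested:

# One pass: -f reads the patterns from $SRC, -F matches them as fixed strings
grep -F -f "$SRC" $PARSE >> "$OUT"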

Hope this helps


Sea, I guess I will have to initialize $pattern, right?

So should the code be like:

SRC=Sample1.txt
PARSE='BW*20141112*.csv'
OUT=/home/paxk1/test.csv
PATTERN=BW

while read pattern; do
    grep "$PATTERN" $PARSE >> "$OUT"
done < "$SRC"

?

No, pattern is being read from $SRC, which points to your Sample1.txt. What's the result of running Sea's code as is?
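
A minimal illustration (hypothetical, just to show the mechanism) of how the loop fills the variable on each pass:

# Each iteration reads the next line of Sample1.txt into "pattern";
# no prior initialization is needed.
while read -r pattern; do
    echo "searching for: $pattern"
done < Sample1.txt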


Reading your requirement: would it be fair to say that you want to split the original files into N new files, one for each unique line (pattern) of the original? If so, do you realize that each file will contain one or more copies of the same line (pattern)? Perhaps:

sort file | uniq -c

might be useful? However, if you have to split the files:

sort -u "${FILE}" | while read -r pattern; do
  # -F: treat the pattern as a fixed string (the lines contain "."),
  # -x: match the whole line exactly (replaces the ^...$ anchors)
  grep -Fx "${pattern}" "${FILE}" > "${pattern}.csv"
done

(untested)

which seems clunky, since the file is read N times, once for each pattern. How about:

sort "${FILE}" | uniq -c | while read -r N pattern; do
  # uniq -c emits "count line"; re-emit the line N times into its own file
  while [[ 0 -lt ${N} ]]; do
    echo "${pattern}"
    (( N = N - 1 ))
  done > "${pattern}.csv"
done

(untested)
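
If awk is an option, the whole split can be done in a single pass with no counting loop at all; a sketch, untested like the above:

# Append each input line to a file named after the line itself; close()
# avoids running out of file descriptors when there are many unique lines
awk '{ f = $0 ".csv"; print >> f; close(f) }' "${FILE}"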

Kekanap, have you solved it?
What I didn't get is: where are you searching for the patterns? In another file, or in several other files? The grep output will be different!

If you are searching for exactly the same lines in two different files (since the pattern is the full line taken from file1), it might be much easier to just merge the two files and look for duplicated lines. That's a one-liner:

cat file1.txt file2.txt | sort | uniq -c | grep " 2 "
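
One caveat: grep " 2 " will also match data lines that merely contain " 2 ", and it misses lines that appear more than twice. uniq -d, which prints one copy of each repeated line, sidesteps the count entirely; a sketch:

# -d prints one copy of each line that occurs more than once
sort file1.txt file2.txt | uniq -d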

Please let us know if you solved it!

BTW, one last comment: I'm not very comfortable using grep with patterns that have "." in them, since in a regular expression "." matches any character. grep -F, which matches fixed strings, avoids that.
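
For instance, the command from the first post could be written as (same files assumed):

# -F treats the pattern as a literal string, so "." only matches a real dot
grep -F "BW235045560121114833444044@196.35.130.5" BW*20141112*.csv >> /home/paxk1/test.csv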