Extract lines that have entries in VI file

Dear experts,

I have a UNIX file that contains 4 million lines, and I need to extract all lines that have entries saved in a VI file. I have the command below, but it takes far too long:

for i in `cat file1.csv`; do cat dump | grep -i  $i >> file2.csv; done

where:

file1.csv = the VI file containing the entries to search for.
file2.csv = the output file.
dump = the name of the dump file.

Example of entries in the dump:

IP101MURB-I:1-1-4-13,11-409-0687,capCoupling,2016-12-11 19:01:00,50
IP101MURB-I:1-1-4-13,11-409-0687,degraded,2016-12-11 19:01:00,50
IP101MURB-I:1-1-5-27,11-406-7432,capCoupling,2016-12-11 19:01:00,50

Example of an entry in the VI file:

IP101MURB-I:1-1-4-13

Example of what the output file should contain:

IP101MURB-I:1-1-4-13,11-409-0687,capCoupling,2016-12-11 19:01:00,50
IP101MURB-I:1-1-4-13,11-409-0687,degraded,2016-12-11 19:01:00,50

Note that I need all lines that contain this entry.

Thanks in advance

How about using the VI file as a pattern file with grep's -f option:

grep -if file1.csv dump > file2.csv

Hi RudiC,

I got this error:

-bash-3.00$ grep -if dec18PND.csv EP-ports-faults_20161218.csv  >  file2.csv
grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . . .
-bash-3.00$

It is always helpful to support a request with system info like OS and shell, preferred tools, and adequate sample input and output data to avoid ambiguities and keep people from guessing.

So please provide your OS and grep version.

On Solaris, use /usr/xpg4/bin/grep rather than grep.
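
For example, assuming the same file names as in the earlier suggestion, the full command would be:

/usr/xpg4/bin/grep -if file1.csv dump > file2.csv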


Here is the OS:

-bash-3.00$ uname -a

SunOS ALU_STC_GW 5.10 Generic sun4u sparc SUNW,Sun-Blade-100
-bash-3.00$ 

I don't know how to get the grep version; however, I found fgrep, and when I use it the command is accepted but produces an output file with zero bytes:

fgrep infile2.csv EP-ports-faults_20161218.csv > outfile2.csv

The output file shows 0 bytes:

-rw-r--r--   1 motive   other          0 Dec 20 19:06 outfile2.csv
-rw-r--r--   1 motive   other      47763 Dec 20 19:07 outfile.csv
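
Note: without the -f option, fgrep takes its first argument (infile2.csv) as the literal string to search for, not as a pattern file, which would explain the zero-byte output. Assuming this fgrep supports -f, the intended call would look like:

fgrep -f infile2.csv EP-ports-faults_20161218.csv > outfile2.csv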


Hello Scrutinizer,

I did, but it copied the whole file as it is:

-rw-r--r--   1 motive   other    223682851 Dec 18 08:09 EP-ports-faults_20161218.csv

-rw-r--r--   1 motive   other    223682851 Dec 20 19:24 outfile3.csv

Check man grep: does /usr/xpg4/bin/grep offer the -f option? If yes, create small but representative samples of the dump and VI files, run the command, and post the results.

Hi RudiC,

When I made a sample with only 3 entries in the VI file it worked OK, but when I made a bigger sample (165 entries in the VI file) it again copied all the dump lines (about 4M lines for about 180K entries):

-bash-3.00$ vi infile3.csv
"infile3.csv" [New file] 
LKHYA_AMDR2:1-0-6-22,
MFYSLKHYA_AMDR2:1-0-6-23,
MFYSLKHYA_AMDR2:1-0-7-1,
~
~
~
-bash-3.00$ 
-bash-3.00$ /usr/xpg4/bin/grep -f infile3.csv EP-ports-faults_20161218.csv > outfile3.csv

-rw-r--r--   1 motive   other       4012 Dec 20 20:02 outfile3.csv
-rw-r--r--   1 motive   other       4691 Dec 20 20:09 infile4.csv
-rw-r--r--   1 motive   other    223682851 Dec 20 20:10 outfile4.csv
-bash-3.00$ 

Well, is it possible that all lines are correctly selected, i.e. all of them are represented in the VI file? I recommend doing some intermediate steps, e.g. 20 and then 50 entries in the VI file.
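
For example, a quick intermediate test could look like this (assuming infile4.csv is the 165-entry VI file):

head -20 infile4.csv > sample20.csv
/usr/xpg4/bin/grep -f sample20.csv EP-ports-faults_20161218.csv | wc -l

If the line count already equals the full dump, the offending pattern is among those 20 entries.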


Hi RudiC,

I did, and it works fine now with all 165 entries. The first line in the VI file was empty; after I removed it, it works fine. I wonder if this empty line made the command skip the search and take a full copy of the dump, as it finished in no time and copied the whole big file in no time.
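
For future runs, empty lines can be stripped from the pattern file beforehand, for example (writing to a new file so the original stays untouched; the file names are just placeholders):

grep -v '^$' infile4.csv > infile4.clean.csv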

Do you have Perl on your system?

Copy and paste the script below and save it as searcher.pl.
Run it as

perl searcher.pl EP-ports-faults_20161218.csv

or

perl searcher.pl EP-ports-faults_201612*.csv

if you want to process multiple files.

The file with the search strings is infile2.csv; you can change it in the code.
The file with the results is outfile2.csv; you can change it in the code.

#!/usr/bin/perl

use strict;
use warnings;

# Build one alternation pattern from all search strings in the filehandle,
# skipping empty lines so a blank line cannot match everything.
sub pattern {
    my $f = shift;
    chomp(my @pat = <$f>);
    join "|", grep { length } @pat;
}

my $patterns_file = "infile2.csv";
open my $fh, '<', $patterns_file or die "Cannot open $patterns_file: $!";
my $search_pattern = pattern($fh);
close $fh;

my $match_searches = "outfile2.csv";
open my $wfh, '>', $match_searches or die "Cannot open $match_searches: $!";

# Print every input line that matches any of the search strings.
while(<>) {
    print $wfh $_ if /$search_pattern/;
}
close $wfh;

An empty line when using -f means that one of the patterns tried is the empty string, which is always a positive hit, i.e. it matches everything...

$ printf "%s\n" aa bbbb ccc | grep ''
aa
bbbb
ccc

The /usr/xpg4/bin/grep is quite slow with "-f bigfile".
Better to use /usr/xpg4/bin/awk or nawk with a hashed array lookup:

nawk -F, 'NR==FNR { A[$1]; next } ($1 in A)' file1.csv dump
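
This reads file1.csv first (NR==FNR) and stores its first comma-separated field as an array key, then prints every dump line whose first field is present in the array. To write the result to a file, and as a case-insensitive variant (only needed if the entries and the dump can differ in case, which is an assumption here), something like this should work:

nawk -F, 'NR==FNR { A[tolower($1)]; next } (tolower($1) in A)' file1.csv dump > file2.csv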