I have updated the first post so that my intentions are easier to understand, and also attached sample files (post #18).
I have over 500 text files in a directory. Over 1 GB of data. The data in those files is organised in lines:
My intention is to return one line per parameter match across all files.
The first parameter is: '4=[1 to 2000]'
The second parameter is: '3078='
So when grep, awk etc. finds a line that contains both '4=1' and '3078=', it prints the line and starts looking for a line that contains '4=2' and '3078='.
This should happen across all 500 files (-m 1 does not work in this case, as 4=1 and 4=2 might be contained in 1 file and not in the 499 others).
Please also note that '4=[1 to 2000]' and '3078=' are not always at the same position in a line.
Can you please please please help me? I am at a loss as to what to do.
The values in the file are "line" separated: each value has its own line.
Perhaps I do not understand how the pattern file works.
Does it look for '4=745' and '3078=', then for '4=746' and '3078=', then for '4=747' and '3078=' etc.?
Or for all those 4=745 4=746 4=747 etc. on the same line?
How can I write a file (or use a command) that looks for the values successively? ('4=745' and '3078=', then '4=746' and '3078=', then '4=747' and '3078=' etc.)
Hi clippertm,
I am confused by "The values in the file are "line" separated: each value has its own line." Isn't the data file separated by '|'? Or are you talking about the pattern file?
With my limited knowledge of shell, I believe it looks for a line that contains any of 4=745, 4=746, 4=747 etc., then sends each matched line to grep "3078".
Of course you could prepare a pattern file like this:
does not work, it stalls (I used 745-755 to simplify and make things faster, I actually run it from 1 to 2000!). It also returns "grep: invalid range" sometimes.
Your first problem is that you need to match the patterns in your pattern file with the second field. You cannot just use grep -f list.txt . You really need to use awk or perl. You can then pattern-match on the two fields you are interested in without a stray 254=47587 in, say, the last field, matching your patterns file.
Something I was not sure about. Are you after the first instance of 4=475 and 3078= AND the first instance of 4=476 and 3078= etc etc rather than the first instance of ANY 4=xxx and 3078= ? In which case you will have to loop through your patterns file anyway.
I am not going to offer any code because I don't really know awk and my perl is rusty. Good luck
Sorry clippertm,
I think the reason it outputs all occurrences is that we used the wildcard *.*
so maybe we need to use sort after grep.
Or use perl or awk as apmcd47 says; either could solve the problem.
With the wildcard *.*, grep treats the first match in every file as a first occurrence. cat *.* | grep might work, but I cannot test it while I am on a bus.
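The difference between per-file and single-stream matching can be sketched as follows. This is only an illustration: the two sample files and their pipe-separated layout are assumptions, and the pattern '|4=745|' assumes the 4= field is neither the first nor the last field of a line.

```shell
# Hypothetical sample data; the field layout is an assumption for illustration.
tmp=$(mktemp -d)
printf '%s\n' '1=a|4=745|3078=x' '1=b|4=745|3078=y' > "$tmp/f1.txt"
printf '%s\n' '1=c|4=745|3078=z' > "$tmp/f2.txt"

# -m 1 applies to each file separately, so every file contributes its own first match.
per_file=$(grep -h -m 1 '|4=745|' "$tmp"/f1.txt "$tmp"/f2.txt | grep -c '3078=')

# Concatenating first makes one input stream, so -m 1 stops after one line overall.
one_stream=$(cat "$tmp"/f1.txt "$tmp"/f2.txt | grep '3078=' | grep -m 1 -c '|4=745|')

echo "per file: $per_file, single stream: $one_stream"
rm -rf "$tmp"
```

This handles a single value only; repeating it for 2000 values would mean reading the whole data set 2000 times, which is one reason the awk approach may be preferable.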
And sorry for my not very good English; let me check again what your desired output is.
a. line has 4=745 and 3078=
b. line only has 4=745 not 3078=
c. line only has 3078= not 4=745
For your additional question:
\([1-9]\|[1-9][0-9]\|[1-9][0-9][0-9]\|1[0-9][0-9][0-9]\|2000\)
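As a rough check of that alternation, here is the same 1 to 2000 range written as an extended regular expression for grep -E. The sample lines and the field-boundary parts, (^|[|]) and ([|]|$), are assumptions added for illustration. Note that a bracket expression such as [745-755] is read character by character rather than as a number range, which may be where the "invalid range" message comes from.

```shell
# ERE form of the 1-2000 alternation; (^|[|]) and ([|]|$) are assumed field boundaries.
re='(^|[|])4=([1-9][0-9]{0,2}|1[0-9]{3}|2000)([|]|$)'

# Hypothetical lines: two in range, then out of range, wrong tag, and zero.
matches=$(printf '%s\n' \
  '1=a|4=7|3078=x' \
  '1=b|4=2000|3078=y' \
  '1=c|4=2001|3078=z' \
  '1=d|254=12|3078=w' \
  '1=e|4=0|3078=v' | grep -E -c "$re")

echo "$matches"
```

The boundaries keep a field like 254=12 from being mistaken for 4=12.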
awk -F "|" '
# hash the search list
NR==FNR {L[$1]=0; next}
# now proceed with the data files
# print if the following is true
($4~/^3078=/ && ($2 in L) && L[$2]++==0)
' searchlist.txt datafile1.txt datafile2.txt
Search the 3078= everywhere:
awk -F "|" '
# hash the search list
NR==FNR {L[$1]=0; next}
# now proceed with the data files
# print if the following is true
(/[|]3078=/ && ($2 in L) && L[$2]++==0)
' searchlist.txt datafile1.txt datafile2.txt
---------- Post updated at 06:51 AM ---------- Previous update was at 05:35 AM ----------
grep -m1 exits at every 1st match per file.
awk is much more flexible:
awk -F "|" -v low=745 -v high=755 '
# build the Lookup hash
BEGIN {for (i=low; i<=high; i++) L["4="i]}
# main loop
# if in Lookup hash and if a field begins with 3078=
($2 in L) && /[|]3078=/ {
print
# delete from the Lookup hash
delete L[$2]
}
' datafile*.txt
The output I am looking for is a.: a line that has 4=745 and 3078=.
Thanks again for your help!
---------- Post updated at 09:35 PM ---------- Previous update was at 09:30 PM ----------
Hi MadeInGermany,
Thank you for your awk samples, but they do not produce the output I am looking for.
If I change the last one to:
awk -F "|" -v low=1 -v high=2000 '
# build the Lookup hash
BEGIN {for (i=low; i<=high; i++) L["4="i]}
# main loop
# if in Lookup hash and if a field begins with 3078=
($2 in L) && /|3078=/ {
print
# delete from the Lookup hash
delete L[$2]
}
' *.txt
It only returns 4 results and there should be 100s.
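One possible reason for the short output is that ($2 in L) only tests the second field, while the first post says that '4=N' and '3078=' are not always at the same position. Below is a minimal sketch that checks every field instead. The sample files and their pipe-separated layout are assumptions, and the variable names key and has3078 are made up for illustration.

```shell
# Hypothetical sample files; the pipe-separated tag=value layout is assumed.
tmp=$(mktemp -d)
printf '%s\n' '1=a|4=1|9=q|3078=x' '3078=y|1=b|4=2' '1=c|4=1|3078=z' > "$tmp/a.txt"
printf '%s\n' '1=d|254=3|3078=w' '1=e|7=k|4=3' '4=3|1=f|3078=v' > "$tmp/b.txt"

result=$(awk -F '|' -v low=1 -v high=2000 '
# build the lookup hash of wanted 4=N fields
BEGIN { for (i = low; i <= high; i++) L["4=" i] }
{
    key = ""; has3078 = 0
    # check every field, since the positions vary from line to line
    for (f = 1; f <= NF; f++) {
        if ($f in L) key = $f
        if (index($f, "3078=") == 1) has3078 = 1
    }
    # print only the first line per 4=N, then stop looking for that N
    if (key != "" && has3078) { print; delete L[key] }
}
' "$tmp"/*.txt)

echo "$result"
rm -rf "$tmp"
```

Because whole fields are compared against the hash, a field such as 254=3 is not taken for 4=3, and each 4=N is printed at most once across all files.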