Help me - how to grep on multiple servers

Hi All,

I need help with a keyword search across multiple servers at once. We are designing a search website. We have multiple servers, and each server has 3 instances containing compressed files. Our requirement is to search for a particular keyword, e.g. "Prasad", on all the servers. Is there any way to search for the keyword on all servers and all instances at the same time?

Exact requirement: we have servers located in Mumbai, Bangalore, Chennai, Delhi and Hyderabad. Each location (e.g. Mumbai) has 3 instances, each with a file named voter.gz. I want to search for the keyword Prasad in voter.gz on every instance of every server, and display the servers and instances where the name Prasad appears. We have the username and password for connecting to the servers, and the proxy password as well.

Could anybody please help me reduce the time it takes to search the files?

With Regards,
Prasad .

Are the keyword and the set of files always the same? You could set up a cron job and send the results to a central server.

If you're designing a _real_ search website, I humbly think you have to revise your whole strategy, because searching this way is completely inefficient and wastes resources.

Having said that, I can suggest a few possibilities that work with what you have.

1) If you want to search all the servers from a workstation through SSH, I first suggest you set up public-key authentication (a private/public key pair). That way you avoid typing the remote server password each time. On the client you need ssh-agent running so it can log in on behalf of your own user.

After that you could use something like this:

for server in mumbai bangalore chennai delhi hyderabad other-hostnames; do echo "$server"; ssh "$server" zcat /path/to/voter.gz | grep -l "keyword"; done

You will have to adapt the one-liner above to your exact needs.

As a quick step toward a better solution, you could write a script that parses those compressed files, extracts all the keywords and populates a MySQL (or any other) database. Your searches will be much faster and more consistent because all the information is in a single repository.
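The central-repository idea can be sketched in a few lines of shell. This is a hedged sketch, not Leandro's script: a plain tab-separated text file stands in for the MySQL database (a deliberate simplification), build_index and search_index are hypothetical names, and SSH_CMD is a hypothetical override hook that defaults to plain ssh. The servers are contacted once to build the index; after that, every search runs locally and puts no load on production.

```shell
# Hypothetical helpers: a tab-separated text file stands in for the database.
# Each index line is "server<TAB>one record from that server's file".

# build_index INDEX_FILE REMOTE_PATH SERVER...
build_index() {
  index=$1; remote_path=$2; shift 2
  : > "$index"                              # create or truncate the index
  for server in "$@"; do
    # SSH_CMD is an assumed override hook; it defaults to plain ssh.
    ${SSH_CMD:-ssh} "$server" zcat "$remote_path" < /dev/null |
      awk -v s="$server" '{ print s "\t" $0 }' >> "$index"
  done
}

# search_index INDEX_FILE KEYWORD
# Prints each server that has at least one record containing KEYWORD.
search_index() {
  # Look only at the record part (after "server<TAB>"), not the server name.
  awk -F '\t' -v k="$2" \
    'index(substr($0, length($1) + 2), k) { print $1 }' "$1" | sort -u
}
```

Usage: run `build_index /tmp/voters.idx /path/to/voter.gz mumbai bangalore chennai delhi hyderabad` once (for example from cron), then `search_index /tmp/voters.idx Prasad` as often as you like.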

Regards,
Leandro.

Thanks Leandro, I have already set up SSH with a private/public key pair. I need the exact command to grep simultaneously on all the servers.


Thanks a lot, Leandro. These are production servers and they hold the complete data; we don't currently have any contingency servers. We need to search this data without affecting the production environment.
Could you please help me with the command you provided: using it, can we create a shell script that lets the user enter the servers and the path at the console?

With Regards,
Prasad G.

You can try:

printf '%s\n' server1 server2 server3 ... |
xargs -P 10 -I % ssh user@% grep "'regex'" remote-file-name
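Otheus's xargs idea can be adapted to the compressed voter.gz files. A hedged sketch: parallel_search is a hypothetical function name, SSH_CMD is a hypothetical override hook (default: ssh), and the keyword and path are assumed to contain no spaces or shell metacharacters, because they are pasted into the remote command line. Both zcat and grep run on the server, so only the names of matching servers come back over the network.

```shell
# parallel_search KEYWORD REMOTE_PATH SERVER...   (hypothetical helper)
# Prints the name of every server whose file contains KEYWORD.
parallel_search() {
  keyword=$1; remote_path=$2; shift 2
  # One server per line: with -I, xargs treats each input line as one item.
  # -P 10 runs up to 10 ssh sessions at the same time.
  # Inside sh -c: $1 = server, $2 = remote path, $3 = keyword.
  printf '%s\n' "$@" |
    xargs -P 10 -I {} sh -c \
      'if ${SSH_CMD:-ssh} "$1" "zcat $2 | grep -q $3"; then echo "$1"; fi' \
      sh {} "$remote_path" "$keyword"
}
```

Usage: `parallel_search Prasad /path/to/voter.gz mumbai bangalore chennai delhi hyderabad`. Results arrive in whatever order the servers finish, so pipe through `sort` if you need a stable listing.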

Otheus: a plain grep won't find anything if the file is compressed; that's why I suggested using zcat.

Oh, right. Well, given that he's got .gz files, he should find that "zgrep" works just fine.

Well, the one-liner I wrote before is pretty close to what you want.

I shall explain it in more detail:

for server in mumbai bangalore chennai delhi hyderabad other-hostnames; do echo "$server"; ssh "$server" zcat /path/to/voter.gz | grep -l "keyword"; done

The

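for server in mumbai bangalore chennai delhi hyderabad other-hostnames

executes the commands following the reserved word "do" once for each hostname in the list. The list is simply whitespace-separated words: square brackets are not part of the syntax, and if you type them the shell treats "[" and "]" as two more hostnames.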
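To see why the square brackets should go, here is a tiny local demo. The function names are hypothetical and nothing is contacted over the network; the point is only that the shell treats "[" and "]" as ordinary words, so a loop written with brackets would try to ssh to hosts literally named "[" and "]".

```shell
# Hypothetical demo functions; the "hostnames" are only printed, never contacted.
with_brackets() {
  for server in [ mumbai delhi ]; do echo "$server"; done
}
without_brackets() {
  for server in mumbai delhi; do echo "$server"; done
}

with_brackets      # prints [, mumbai, delhi, ] one per line: 4 "hosts"
without_brackets   # prints mumbai, delhi: 2 hosts
```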

The

echo "$server"

just prints the hostname to stdout, so the output shows which server each result belongs to.

The

ssh "$server" zcat /path/to/voter.gz | grep -l "keyword"

does this: it connects to the $server set by the for loop and runs zcat on the file there, which writes the entire decompressed contents to stdout. That output travels back over the SSH connection and is piped (|) into grep on your workstation, which searches for the keyword. The "-l" option prints only the name of the file when the keyword is found; because grep is reading from a pipe, that name is "(standard input)" rather than voter.gz, so the preceding echo is what tells you which server matched. If you omit -l, you'll get the matching lines themselves.
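Because grep reads from the pipe, -l cannot name the server that matched. A hedged alternative is to test with grep -q and print the server name yourself. report_match is a hypothetical helper; below it is demonstrated on made-up local text standing in for the zcat output.

```shell
# report_match LABEL KEYWORD   (hypothetical helper)
# Reads standard input; prints LABEL once if KEYWORD occurs anywhere in it.
report_match() {
  label=$1
  keyword=$2
  # grep -q prints nothing; exit status 0 means at least one line matched.
  if grep -q -- "$keyword"; then
    printf '%s\n' "$label"
  fi
}

# Local demo: made-up lines stand in for the output of "zcat voter.gz".
printf 'Ramesh Kumar\nPrasad G\n' | report_match mumbai Prasad   # prints: mumbai
printf 'Ramesh Kumar\n' | report_match delhi Prasad             # prints nothing
```

In the real loop this would be used as `ssh "$server" zcat /path/to/voter.gz | report_match "$server" "keyword"`, which prints only the servers that contain the keyword.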

If your user name on the workstation is not the same as on the servers, replace ssh "$server" with ssh "user-name@$server", where user-name is your login on the servers.
If you have different logins on different servers, put user@host entries in the list instead of bare hostnames, for example: user@mumbai user@bangalore user@chennai user@delhi user@hyderabad user@other-hostnames. The user can be completely different for each server, e.g. root@mumbai prasad@chennai user2345@delhi, etc.

With "for" you search the servers one at a time. To avoid that, you could run all the ssh commands in the background, log each one's output to a file, and then look for the results in those files.

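for server in mumbai bangalore chennai delhi hyderabad other-hostnames; do ssh "$server" zcat /path/to/voter.gz | grep "keyword" > "result-$server.txt" & done; wait

Note the ampersand replacing the last semicolon: it sends each ssh/grep pipeline to the background, so all servers are searched at the same time. The final wait blocks until every pipeline has finished; afterwards, a non-empty result-<server>.txt means the keyword was found on that server.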
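The same background-and-log idea can be wrapped in a function that also reports which servers matched. A hedged sketch: background_search is a hypothetical name, the logs go to a fresh mktemp directory, and SSH_CMD is a hypothetical override hook that defaults to plain ssh. As in the one-liner, zcat runs on the server and grep runs on the workstation.

```shell
# background_search KEYWORD REMOTE_PATH SERVER...   (hypothetical helper)
background_search() {
  keyword=$1; remote_path=$2; shift 2
  logdir=$(mktemp -d)                       # one result file per server here
  for server in "$@"; do
    # SSH_CMD is an assumed override hook; by default this is plain ssh.
    # zcat runs remotely; grep runs locally and writes to the log file.
    ${SSH_CMD:-ssh} "$server" zcat "$remote_path" < /dev/null |
      grep -- "$keyword" > "$logdir/$server.log" &
  done
  wait                                      # block until every job has finished
  for server in "$@"; do
    # A non-empty log means the keyword was found on that server.
    if [ -s "$logdir/$server.log" ]; then
      printf '%s\n' "$server"
    fi
  done
  printf 'logs kept in %s\n' "$logdir" >&2
}
```

Usage: `background_search Prasad /path/to/voter.gz mumbai bangalore chennai delhi hyderabad`. The matching servers are printed in the order you listed them, and the full matching lines stay in the log directory for inspection.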

You can put this in a script that would look something like this:

#!/bin/bash

for server in mumbai bangalore chennai delhi hyderabad other-hostnames; do ssh "$server" zcat /path/to/voter.gz | grep "$1" > "result-$server.txt" & done
wait

Then, when you invoke the script, do it like this:

./your-script.sh keyword
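Prasad also asked whether the user can enter the servers and the path at the console. One possible sketch, assuming the keyword and path contain no single quotes: interactive_search and its prompt texts are hypothetical, SSH_CMD is a hypothetical override hook that defaults to plain ssh, and both zcat and grep run on the server so only the exit status comes back.

```shell
# interactive_search (hypothetical helper): prompts go to stderr so that
# stdout carries only the results.
interactive_search() {
  printf 'Keyword: ' >&2;                    read -r keyword
  printf 'Remote path to .gz file: ' >&2;    read -r remote_path
  printf 'Servers (space-separated): ' >&2;  read -r servers
  for server in $servers; do      # left unquoted on purpose: split on spaces
    # </dev/null stops ssh from swallowing the rest of the console input.
    if ${SSH_CMD:-ssh} "$server" "zcat '$remote_path' | grep -q '$keyword'" < /dev/null; then
      printf '%s: found\n' "$server"
    else
      printf '%s: not found\n' "$server"
    fi
  done
}
```

One design caveat: a server that cannot be reached also ends up in the "not found" branch, since ssh then exits with a non-zero status; check ssh's error output if a result looks suspicious.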

One more thing: if you have multiple compressed files to search, a faster way is to run find on the server and do the whole search there, so only the results travel over the network. The line should be something like this:

for server in mumbai bangalore chennai delhi hyderabad other-hostnames; do ssh "$server" "find /path/to/file/*.gz -exec zcat {} \; | grep -l '$1'" > "result-$server.txt" & done; wait

I don't actually have a *nix machine to test this on, and the quoting in particular is easy to get wrong, so experiment with it yourself.
If you have any questions, feel free to ask, but keep the latter in mind (you have to try it yourself!).
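For many files, the zgrep suggested elsewhere in this thread can also report which file matched, something the zcat pipe cannot. A hedged sketch: find_matching_gz is a hypothetical name, and it is shown running locally so it can be tried without a server; the same find command would be wrapped in ssh on each server.

```shell
# find_matching_gz DIR KEYWORD   (hypothetical helper)
# Prints every *.gz file under DIR whose decompressed text contains KEYWORD.
find_matching_gz() {
  dir=$1; keyword=$2
  # "{} +" hands many files to one zgrep call; -l prints only file names.
  find "$dir" -name '*.gz' -exec zgrep -l "$keyword" {} +
}
```

On a server this becomes, for example, `ssh "$server" "find /path/to/file -name '*.gz' -exec zgrep -l 'Prasad' {} +"`, which runs entirely remotely, so only the names of matching files cross the network.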

Good luck,

Leandro.


Yep! If he has that command installed on all the servers, it'll do just fine. In fact, I'd recommend it over "zcat | grep".