Search Engine

Hey guys. I have a quick question. My friends and I are working on a search engine project that will hopefully be up and running by December 2011. Here's my concern: what programs should I use to create the search engine? [Note: PHP has already been recommended to me.] Thanks guys!

PHP is just part of the web front end. What sort of data do you have to search? I made a front end for CScope so I could search files and source code, for instance. You could run grep -E or fgrep over the files, display the hits, and display the file behind whichever hit gets clicked. Is there an open source Google out there? Probably not, or just a primitive early version!

Thanks for replying. I don't actually want to search "files"; I want to search the internet. [Nothing complex. Just being able to find links works for me!] Any ideas?

I'd say you need (at least) 3 components:

  1. A crawler that downloads pages, and follows links on those pages.
  2. An indexer that builds a list of words used on each page (maybe in relation to other words nearby), and saves that to a database.
  3. A front-end to query the database.
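To make component 1 a bit more concrete, here is a minimal sketch of the link-following logic, written in Python only because it keeps the example short. It is not a real crawler: to stay self-contained it walks a made-up in-memory dictionary (`FAKE_WEB`) instead of the network, and `LinkParser`, `crawl` and the example.test URLs are all hypothetical names. A real crawler would also have to fetch over HTTP, honour robots.txt and throttle itself.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-in for the network: URL -> HTML body.
FAKE_WEB = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="/c.html">C</a>',
    "http://example.test/b.html": "<p>no links here</p>",
    "http://example.test/c.html": '<a href="/a.html">back to A</a>',
}


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: html} for every page reached."""
    seen = {start_url}          # URLs already queued, so nothing is fetched twice
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # page could not be downloaded
            continue
        pages[url] = html
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


pages = crawl("http://example.test/", FAKE_WEB.get)
print(sorted(pages))
```

The `seen` set is what stops the crawler from looping forever when pages link back to each other, which they always do.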

For the crawler you can use just about any language, since the main limitation is the network speed. For the indexer I'd recommend either C/C++ (for speed) or a language geared towards text processing (like Perl). For the front-end you can again choose whatever language you're comfortable with.
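For components 2 and 3, a rough sketch of the idea, again in Python for brevity rather than in C/C++ or Perl. It assumes the pages have already been downloaded and reduced to plain text, keeps the "database" as an in-memory dictionary, and ignores word proximity and ranking entirely. `DOCS`, `tokenize`, `build_index` and `search` are names made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical, already-downloaded pages: URL -> plain text.
DOCS = {
    "http://example.test/1": "Cheap flights to Paris",
    "http://example.test/2": "Paris hotels and cheap hostels",
    "http://example.test/3": "History of the search engine",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexer: map each word to the set of URLs it appears on."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Front-end query: URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(DOCS)
print(sorted(search(index, "cheap paris")))
print(sorted(search(index, "search engine")))
```

This word-to-pages mapping is usually called an inverted index. Everything else (ranking, phrases, storage on disk) is built on top of it.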

Why write when you can download (and if necessary modify and contribute)?

List of search engines - Wikipedia, the free encyclopedia

Search Tools with Open Source Code

A Comparison of Free Search Engine Software by Yiling Chen on SearchTools.com

Thank you to Pludi and DG Pickett. I want to write, though.

Can anyone link the software I need?

What do you mean by "I want to write though"? And you have to decide what software you'll need. If you want to use a scripting language you'll need the interpreter for that. If you want to use C/C++ you'll need a compiler for that.

Let me ask you a question: have you, as of now, written a program more complex than a Fibonacci number calculator?

  1. I want to write the program myself. I don't want to download, modify, and contribute.
  2. I think I'll go with the interpreter.
  3. I have not written a program more complex than a Fibonacci number calculator before.

Well, it is a project. You have to have an acquisition engine to find target documents. You need a repository that is friendly to search. You need a user interface to submit searches and present finds. You need an admin interface for submitting new target areas to the acquisition engine. You need a computer, usually on a network, and usually lots of storage.

You want a data structure that can be expanded, updated, and pruned in a way that is invisible to other users, such as leaf-to-root modification. Users coming down the old tree are not bothered by the new trees, or new subtrees, that you build to replace it.
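One way to read "leaf-to-root modification" is path copying (copy-on-write): an update builds fresh nodes from the changed spot up to a new root and shares every untouched subtree, so a reader still holding the old root keeps seeing the old tree. The sketch below is my interpretation, not a prescribed design. It uses an unbalanced binary search tree in Python, and `Node`, `insert` and `lookup` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable tree node: a word and the tuple of URLs it appears on."""
    key: str
    value: tuple
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, value):
    """Return a NEW root; nodes off the search path are shared, not copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return root._replace(left=insert(root.left, key, value))
    if key > root.key:
        return root._replace(right=insert(root.right, key, value))
    return root._replace(value=value)   # same key: replace the value


def lookup(root, key):
    """Ordinary binary-search-tree lookup; returns None if the key is absent."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


old_root = None
for word, urls in [("search", ("u1",)), ("engine", ("u2",)), ("web", ("u3",))]:
    old_root = insert(old_root, word, urls)

# A writer publishes an update; readers still holding old_root are unaffected.
new_root = insert(old_root, "engine", ("u2", "u4"))
print(lookup(old_root, "engine"))   # ('u2',)
print(lookup(new_root, "engine"))   # ('u2', 'u4')
```

Publishing the update is then just swapping one reference from `old_root` to `new_root`; the old nodes can be discarded once no reader is using them.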

A lot of code and thought goes into dealing with kill-words: words and phrases that occur so often you never want to index them. You can discover them as they hit a threshold, or just trim them as needed for space.
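The threshold approach can be sketched in a few lines. This is only an illustration in Python: the 50% cut-off, the sample documents and the name `find_kill_words` are all assumptions picked for the example, and it handles single words only, not phrases.

```python
from collections import Counter


def find_kill_words(docs, max_fraction=0.5):
    """Return the words that appear in more than max_fraction of the documents."""
    doc_freq = Counter()
    for text in docs:
        # set() so a word counts once per document, however often it repeats
        doc_freq.update(set(text.lower().split()))
    limit = max_fraction * len(docs)
    return {word for word, count in doc_freq.items() if count > limit}


docs = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "a bird sang in the tree",
    "the end",
]
print(find_kill_words(docs))   # {'the'}
```

Counting documents rather than raw occurrences keeps one very long page from turning an otherwise rare word into a kill-word.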

Java using persistent objects may work well for this. You might want to make your own persistent objects out of memory-mapped flat files.
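As a rough idea of what a home-made persistent object over a memory-mapped flat file can look like, here is a small sketch. It is in Python rather than Java purely to keep it short (Java offers the same mechanism through `java.nio.MappedByteBuffer`). The record layout (a 16-byte word plus a 32-bit count), the class name `CountTable` and the file name are assumptions made up for the example.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout: 16-byte word (NUL-padded) followed by an unsigned 32-bit count.
RECORD = struct.Struct("<16sI")


class CountTable:
    """Fixed-size records stored in a memory-mapped flat file."""

    def __init__(self, path, capacity):
        size = RECORD.size * capacity
        # Create (or reset) the file so it is exactly the size we map.
        if not os.path.exists(path) or os.path.getsize(path) != size:
            with open(path, "wb") as f:
                f.write(b"\0" * size)
        self._file = open(path, "r+b")
        self._map = mmap.mmap(self._file.fileno(), size)
        self.capacity = capacity

    def put(self, slot, word, count):
        """Write one record straight into the mapped file."""
        RECORD.pack_into(self._map, slot * RECORD.size, word.encode("utf-8"), count)

    def get(self, slot):
        """Read one record back as (word, count)."""
        raw, count = RECORD.unpack_from(self._map, slot * RECORD.size)
        return raw.rstrip(b"\0").decode("utf-8"), count

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "counts.dat")
table = CountTable(path, capacity=4)
table.put(0, "search", 3)
table.close()

table = CountTable(path, capacity=4)   # reopen: the data survived the close
print(table.get(0))
table.close()
```

Fixed-size records are what make this simple: slot N always lives at byte offset N times the record size, so no parsing is needed to find it. Words longer than 16 bytes would be silently truncated in this layout.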