Monday, March 7, 2011

webspider


PHP web spider
This tutorial shows how to build a custom search engine with the Yahoo BOSS API. If you want a search engine you can customize, building your own is worth looking into. Here we use an API provided by the Yahoo search service. Search APIs are nothing new, but they have typically come with rate limits and strict terms of service on re-ordering and presenting results, and they have offered little or no opportunity for monetization.

These constraints have limited the innovation and commercial viability of new search solutions. The API we use here is called BOSS.
BOSS (Build your Own Search Service) is different; it's a truly open API with as few rules and limitations as possible. With BOSS, developers and start-ups now have the technology and infrastructure to build next generation search solutions that can compete head-to-head with the principals in the search industry.
Now let's go through the code.
First, create an HTML web search page as shown below:

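<html>
<head>
<title>pravysoft</title>
</head>
<body>
<form method="post" action="">
<input type="text" name="search" />
<input type="submit" name="submit" value="search" />
</form>
</body>
</html>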
You will get a text box and a button, as shown below:
pravysoft web search
[text box] [search]
Here I created a text box named "search" and a submit button. The form uses the POST method to send its variables. For simplicity the action is left empty (action=""), which posts the information back to the same page.
Now let's look at the main code for the web search engine.
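<?php
if (isset($_POST['submit'])) {
    // URL-encode the search text so spaces and symbols are safe in the URL path.
    $search = rawurlencode($_POST['search']);
    // Replace the appid value with your own BOSS ID.
    $request = "http://boss.yahooapis.com/ysearch/web/v1/" . $search . "?format=xml&appid=Uz.......................";
    $ch = curl_init($request);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string
    curl_setopt($ch, CURLOPT_HEADER, 0);         // leave HTTP headers out of the response
    $response = curl_exec($ch);
    curl_close($ch);
    $xml = ($response !== false) ? simplexml_load_string($response) : false;
    if ($xml === false) {
        print 'The search request failed.';
    } else {
        // Display search results: each title as a link to its click URL.
        // (The clickurl field name is assumed from the description in the text.)
        foreach ($xml->resultset_web->result as $result) {
            print '<a href="' . htmlspecialchars((string) $result->clickurl) . '">'
                . htmlspecialchars((string) $result->title) . '</a><br />';
        }
    }
}
?>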

First comes the isset($_POST['submit']) check, which tests whether the user clicked the submit button. If it returns true, the form was submitted. The script then reads the contents of the text box from the $_POST array. In the next step, replace the appid value with your own BOSS ID.


You can get an API ID from the Yahoo BOSS web site.
After replacing the pravysoft appid with your own BOSS ID, you can easily make your own web spider.
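The request URL described above is built from two pieces of outside input: the search text and the appid. As a minimal sketch, assuming a hypothetical helper named build_boss_request_url and a placeholder ID, both pieces can be encoded before they are joined so that spaces and symbols in the query do not break the URL:

```php
<?php
// Hypothetical helper (not part of the original tutorial): builds the
// BOSS web-search request URL from the user's query and an application ID.
function build_boss_request_url(string $query, string $appId): string
{
    $base = 'http://boss.yahooapis.com/ysearch/web/v1/';
    // rawurlencode() encodes spaces as %20, which is safe inside a URL path.
    $path = rawurlencode(trim($query));
    // http_build_query() encodes the query-string parameters.
    $params = http_build_query(['format' => 'xml', 'appid' => $appId]);
    return $base . $path . '?' . $params;
}

// 'YOUR_APP_ID' is a placeholder; use the ID issued to you.
echo build_boss_request_url('php curl', 'YOUR_APP_ID'), "\n";
```

Keeping the URL construction in one function means the encoding rules live in one place, whichever form field the query comes from.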
Now let's look at the PHP cURL functions. cURL is a library that lets you connect to and communicate with many different types of servers over many different protocols. Using cURL you can:
  • Implement payment gateways’ payment notification scripts.
  • Download and upload files from remote servers.
  • Login to other websites and access members only sections.
The PHP cURL library is definitely the odd man out. Unlike other PHP libraries, where a whole plethora of functions is made available, PHP cURL wraps up major parts of its functionality in just four functions.
Typical PHP cURL usage follows this sequence of steps:
curl_init – Initializes the session and returns a cURL handle which can be passed to other cURL functions.
curl_setopt – This is the main workhorse of the cURL library. This function is called multiple times and specifies what we want the cURL library to do.
curl_exec – Executes a cURL session.
curl_close – Closes the current cURL session.
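The four steps above can be sketched as one small function. This is only an illustration: the helper name fetch_url and the 10-second timeout are assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: fetches a URL with cURL and returns the response
// body, or null if the transfer fails.
function fetch_url(string $url): ?string
{
    $ch = curl_init($url);                           // 1. initialize the session
    if ($ch === false) {
        return null;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // 2. return the body instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, false);         //    leave HTTP headers out of the body
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           //    assumed limit, in seconds
    $body = curl_exec($ch);                          // 3. execute the request
    if ($body === false) {
        error_log('cURL error: ' . curl_error($ch));
        $body = null;
    }
    curl_close($ch);                                 // 4. close (has no effect as of PHP 8.0)
    return $body;
}
```

Checking the return value of curl_exec matters because it returns false on failure, and passing that straight to an XML parser would only hide the real error.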
Please note that the BOSS API returns its output as XML, which contains information such as the click URL and title of each result. We therefore have to parse the XML before we can access those fields.
Here simplexml_load_string converts the response into an object, stored in $xml. The foreach loop then walks through the results one at a time. Inside the loop, $result holds the current result, so its title and click URL can be read and placed where they belong in the HTML page.
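To make the parsing step concrete, here is a minimal sketch that runs without calling the API. The sample XML is a hypothetical, simplified stand-in for a real response, the clickurl field name is an assumption based on the description above, and extract_results is a made-up helper name.

```php
<?php
// Hypothetical, simplified sample; a real response carries more fields.
$sampleXml = <<<XML
<ysearchresponse>
  <resultset_web>
    <result>
      <title>First result</title>
      <clickurl>http://example.com/one</clickurl>
    </result>
    <result>
      <title>Second result</title>
      <clickurl>http://example.com/two</clickurl>
    </result>
  </resultset_web>
</ysearchresponse>
XML;

// Hypothetical helper: parses the XML text and returns a plain array of
// ['title' => ..., 'url' => ...] rows, or an empty array on a parse error.
function extract_results(string $xmlText): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $xml = simplexml_load_string($xmlText);
    if ($xml === false) {
        return [];
    }
    $results = [];
    foreach ($xml->resultset_web->result as $result) {
        // Cast to string: each field is a SimpleXMLElement, not plain text.
        $results[] = [
            'title' => (string) $result->title,
            'url'   => (string) $result->clickurl,
        ];
    }
    return $results;
}

foreach (extract_results($sampleXml) as $row) {
    echo $row['title'], ' -> ', $row['url'], "\n";
}
```

Copying the fields into a plain array separates parsing from display, so the HTML part of the page only deals with ordinary strings.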
The complete code is below:
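<?php
if (isset($_POST['submit'])) {
    // URL-encode the search text so spaces and symbols are safe in the URL path.
    $search = rawurlencode($_POST['search']);
    // Replace the appid value with your own BOSS ID.
    $request = "http://boss.yahooapis.com/ysearch/web/v1/" . $search . "?format=xml&appid=Uz.I................";
    $ch = curl_init($request);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string
    curl_setopt($ch, CURLOPT_HEADER, 0);         // leave HTTP headers out of the response
    $response = curl_exec($ch);
    curl_close($ch);
    $xml = ($response !== false) ? simplexml_load_string($response) : false;
    if ($xml === false) {
        print 'The search request failed.';
    } else {
        // Display search results: each title as a link to its click URL.
        // (The clickurl field name is assumed from the description in the text.)
        foreach ($xml->resultset_web->result as $result) {
            print '<a href="' . htmlspecialchars((string) $result->clickurl) . '">'
                . htmlspecialchars((string) $result->title) . '</a><br />';
        }
    }
}
?>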

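<html>
<head>
<title>pravysoft</title>
</head>
<body>
<form method="post" action="">
<input type="text" name="search" />
<input type="submit" name="submit" value="search" />
</form>
</body>
</html>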