Thursday, February 28, 2013

List of Indian Cities and States

I wanted to display a list of states on my website and when a user chooses a state he may be able to see the cities in that state. I googled to get a database of Indian cities grouped by their state, but in vain. Almost all websites had this feature built in their pages, I tried contacting a few of them to know where they obtained the data. But nobody replied. The data can be obtained through geodatasource but one needs to get the paid version to get the list of all cities. I was not ready to pay for getting the list of cities which should have been available freely. A few weeks back and having gone through enough frustration decided to obtain the data myself, I just needed the data to be available at a single place, so that I can parse it according to my needs. Now my work was

1. Obtain a list of city and states at a single place.

This step was easy. I obtained the list of Indian cities and states at Wikipedia

2. Parse the html and save the data in the required format.

For this I just needed to parse the html page and save the cities and state in my database. I wasn't new to crawling, but this time I decided to do something fresh. I decided to write javascript to parse the data and send it to my server where it could be saved in database for future use. It was to my advantage that&nvar i = 0; window.states = {}; bsp;Wikipedia uses jQuery. I browsed the page using Google Chrome and then opened Console (those who don't know the shortcut to open Console is Ctrl + Shit + I) and pasted the following code



parseData = function() {
    var i = 0;
    window.states = {};

    $("tbody").each(function(index,body) {
            i++
            if(i == 1 || i > 26) 
               return;
            $("tr",body).each( function(index, row) {
               var city = $("td",row).eq(0).text().
                                            toLowerCase().
                                            replace(/[^a-z ]/g, '');
               var state = $("td a",row).eq(1).text().
                                               toLowerCase().
                                               replace(/[^a-z ]/g, '');
               states[state] = states[state] || {cities: []}
               states[state].cities.push(city); 
            })
       })
}
parseData()

Explanation 
The above code creates a JavaScript Object states whose keys are the states of India and value is an object containing the cities of that State. What I have done in the code is

if(i == 1 || i > 26) 
    return;

By looking at the html of the page you can find that the first table and tables after index 26 do not contain the list of cities and state, so I choose to ignore them


$("tr",body).each( function(index, row) {
          var city = $("td",row).eq(0).text().toLowerCase().
                                              replace(/[^a-z ]/g, '');
          var state = $("td a",row).eq(1).text().toLowerCase().
                                                 replace(/[^a-z ]/g, '');
          states[state] = states[state] || {cities: []}
          states[state].cities.push(city); 
})


The code then traverses rest of tables, and for each row in the table, first column determines the name of the city, second column provides the state. Since names can only have characters a to z and spaces, the code replaces all the special characters from the city as well as state name. This name is then used to fill the object.

3. Saving the data in the database

Now comes the difficult part, how to save this data in a database, say MySQL. Using AJAX you can send the data to your server.  But the problem with AJAX is that cross origin AJAX requests are not allowed by servers. For that you need to modify the Access-Control-Allow-Origin policy in your WebServer. The steps to do that in apache are provided in this link.

jQuery code to send the data to server is as follows. After creating the saveCities.php file in your webserver copy paste this code in chrome console.

Options = { url:"http://yourserver/path/to/saveCities.php, type: 'POST', }
ajax = function() { 
         for(state in states) { 
            Options.data = {} 
            Options.data[state]=states[state]; 
            $.ajax(Options); 
         } 
       }

In my database I have two tables, city and states, the structure of those is as follows

CREATE TABLE IF NOT EXISTS `city` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `sid` int(11) NOT NULL,
  `name` varchar(500) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `city_FI_1` (`sid`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1641 ;

CREATE TABLE IF NOT EXISTS `state` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(200) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=38 ;

On the server side the data is populated through the saveCities.php file. The code is far from optimum, but since I was not calling it again I kept it simple.

//make a connection
$db = mysql_connect($mysql_server,$myusername,$password,$datbase) or 
      die "error connecting to database server";
foreach ($_POST as $key => $value) {   
     $name = ucwords(str_replace("_"," ",$key));   
     $sql = "INSERT INTO state (name) VALUES ($name)";
     mysql_query($sql);
     $state_id = mysql_insert_id();
     $cities = $value["cities"];   
     foreach ($cities as $i => $city) {       
        $name = ucwords(str_replace("_"," ",$city));       
        $sql = "INSERT INTO state (sid,name) VALUES ($state_id,$name)";
        mysql_query($sql);
     } 
  }