Crime Mapping

I devised this crime mapping system after being inspired by SpatialKey, and I was spurred into action by the excitement of using Python in some way.

Through the process, though, the hardest part proved not to be the programming or the workflow but rather gathering enough data, clean and useful data that is. I think that discussion is best left for another post 🙂

Getting the data
It has always proven quite difficult to get information in Cayman, as I imagine it is elsewhere in the Caribbean and perhaps in some other countries. Even though we have an FOI (Freedom of Information) law in place, the information isn’t always published willingly; it’s usually a case of requesting it.

So with that said, I made a request, well, actually more than one request. To be fair, it wasn’t that the data was being withheld from me, but rather that my request needed clarification and there was only one person to deal with it (initial date of request: October 31st 2012; date the data was received: December 10th 2012). After receiving confirmation that my request was being processed, I promptly received two Excel documents with the 2010 and 2011 crime data. I was certainly excited, but after viewing the data I started to realize that it might need a bit of cleaning up to remove useless data.

My initial email to request the crime data

One of my follow up emails to clarify my request

Email with the data from RCIPS

RCIPS raw crime data 2010

Preparing the data
Before I continue, you can view my thoughts on data liberation, where I discuss some issues surrounding access to data and ensuring that data is published in a usable format.

Process
– requested the data from RCIPS
– split the data into two files for easier processing (because of the limit of 2,500 queries per day on Google’s Geocoding API)
– before running the full data, ran a test batch of 10 locations first
– removed duplicates and blank fields (I later discovered a quicker and more precise way of removing duplicates)
– used the Google Geocoding API to get the GPS coordinates for each event based on its street address, with a delay added between requests, otherwise Google issues a timeout because of too many queries in quick succession
– wrote the cleaned data to a .csv file
– processed the other file and then merged the two files together
– cleaned up and added any missing coordinates that Google wasn’t able to get (this really only applies here in Cayman, as not all locations are within their system)
– saved the merged file of crimes for 2010 and 2011 as .csv
– from here I thought everything was good, but…
– converted the final merged .csv file to .json for use on the interactive map; the .csv is otherwise used for producing the static maps
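
As a rough illustration of the "remove duplicates and blank fields" and ".csv to .json" steps above, here is a minimal Python sketch. It is not the original script: the column names (date, offence, address), the sample rows, and the helper names clean_rows and csv_to_json are all made up for the example.

```python
import csv
import io
import json


def clean_rows(rows, key_fields):
    """Drop rows with a blank key field, then drop repeats of the same key."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalise so "Example Road " and "example road" count as the same value.
        key = tuple((row.get(field) or "").strip().lower() for field in key_fields)
        if not all(key):
            continue  # at least one key field is blank
        if key in seen:
            continue  # duplicate of an earlier row
        seen.add(key)
        cleaned.append(row)
    return cleaned


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


# Hypothetical sample data; the real RCIPS column names are not shown in this post.
SAMPLE = """date,offence,address
2010-01-04,Burglary,1 Example Road
2010-01-04,Burglary,1 Example Road
2010-02-11,Theft,
2011-03-09,Theft,2 Example Road
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = clean_rows(rows, key_fields=("date", "offence", "address"))
print(len(rows), "rows in,", len(cleaned), "rows kept")
```

Keeping a set of keys already seen means the file is only read once, and the choice of key_fields decides what counts as "the same" event.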

Getting location coordinates from Google maps
At this point there was a lot to learn about the Google Maps API. For one thing, the Google Geocoding API limits queries to 2,500 geolocation requests per day. So I had to have a test batch of around ten locations that I could use to check I was retrieving the correct information, after which I could process the entire set of data.

The script that cleans the data, removes duplicates and gets the coordinates from the Google Maps API
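
For readers who can't make out the screenshot, the general idea of a small test batch plus a delay between requests might look something like the sketch below. This is my own illustration rather than the script pictured: the function names, the one-second delay and the api_key parameter are assumptions (the current Geocoding API expects a key), and the fetch step is passed in as a function so the batching logic can be tried without touching the network.

```python
import json
import time
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def fetch_geocode(address, api_key):
    """Call the Geocoding API over HTTP and return the decoded JSON reply."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    with urllib.request.urlopen(f"{GEOCODE_URL}?{query}") as response:
        return json.load(response)


def parse_location(payload):
    """Return (lat, lng) from a geocoding reply, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode_batch(addresses, fetch, delay_seconds=1.0, limit=None):
    """Geocode addresses one at a time, pausing between requests.

    fetch is any callable that takes an address and returns the decoded reply.
    limit caps the batch size, e.g. 10 for a trial run before the full file.
    """
    coordinates = {}
    for index, address in enumerate(addresses[:limit]):
        if index:
            time.sleep(delay_seconds)  # space requests out to avoid being throttled
        coordinates[address] = parse_location(fetch(address))
    return coordinates
```

A trial run would then be something like geocode_batch(addresses, lambda a: fetch_geocode(a, api_key), limit=10), and dropping the limit processes the whole file. Addresses that Google cannot place come back as None, so they are easy to pick out for manual cleanup later.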

Creating the static area maps
After getting the location information back from Google, I was left with another issue: Cayman is not a big country, and a lot of our roads and areas aren’t within the Google Maps API. In most cases, about 95% of the time, Google had a coordinate and district name for the location in their system. To be sure I had good, usable data, I had to combine the two files in Numbers (Excel) and then either add the district name of the area based on the coordinates (this was done manually by copying and pasting the coordinates into Google Maps and seeing which district they were in) or just remove the crime event because the coordinates weren’t there. This minor cleanup ensured the data was complete and would be useful.
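
I did this cleanup by hand, but the first half of it (finding the rows that still need attention) is easy to sketch in Python. The field names lat, lng and district below are hypothetical, as is the helper name split_for_review; the idea is only to separate complete rows from the ones that need a manual lookup or removal.

```python
def split_for_review(rows, required=("lat", "lng", "district")):
    """Separate complete rows from rows missing any required field."""
    complete, needs_review = [], []
    for row in rows:
        missing = [
            field
            for field in required
            if row.get(field) is None or not str(row.get(field)).strip()
        ]
        if missing:
            needs_review.append(row)
        else:
            complete.append(row)
    return complete, needs_review


# Made-up rows for illustration only.
sample_rows = [
    {"address": "1 Example Road", "lat": "19.30", "lng": "-81.38", "district": "Example District"},
    {"address": "2 Example Road", "lat": "19.31", "lng": "-81.25", "district": ""},
    {"address": "3 Example Road", "lat": "", "lng": "", "district": ""},
]

complete, needs_review = split_for_review(sample_rows)
print(len(complete), "complete,", len(needs_review), "to check by hand")
```

The rows in needs_review are the ones I would then look up in Google Maps, or drop if there were no coordinates at all.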

The script that generates the static maps

Crime map webpage
It’s funny: as you start to review your process, you start to see areas where you can make things more efficient and ways to make it better. That’s what happened when I started to work on an interactive version of the static maps. For one thing, I realized that I could easily have one script to manage all aspects of creating the maps. At this point I still wasn’t sure where to host the map: create a new domain, add it to the Local project, or just keep it on my blog domain. In the end I decided to go with my blog domain. It would be easier to maintain here, plus I think it goes well to see the fruits of my labour here rather than somewhere else; it gives this long write-up some context. Here is the final output of the Crime Mapping system.