Geospatial plots

 

animate_map_header_r_1600x285

 

Get the Data

The data in this example may be obtained from Kaggle.  It is the TalkingData data set that was contributed by Turi.

https://www.kaggle.com/c/talkingdata-mobile-user-demographics

> head(events,3)
  event_id     device_id           timestamp longitude latitude
1        1  2.918269e+16 2016-05-01 00:55:25    121.38    31.24
2        2 -6.401643e+18 2016-05-01 00:54:12    103.65    30.97
3        3 -4.833982e+18 2016-05-01 00:08:05    106.60    29.70

The events table contains latitude, longitude, and date time information.  If this data contained addresses instead of latitude, longitude coordinates I would need to geocode the address to get coordinates.  The ggmap library function geocode accepts a string that describes a location and usually returns coordinates as longitude and latitude.

> geocode("Shenzen North Subway Station")
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Shenzen%20North%20Subway%20Station&sensor=false
       lon      lat
1 113.9546 22.57566

Density Plot of Events Table Coordinates for ZhongGuo

When the number of coordinates plotted increases I need to use a more expressive method of communicating a variable.  I need a method that allows me to discern between one marker and many.  I can communicate the amount or intensity of information using things such as the area of a circular marker, the thickness of a line or logarithmic intensity.  For this data set I am estimating the density of the event table counts by latitude and longitude.

ggplot2 stat_density2d and stat_contour functions

http://docs.ggplot2.org/0.9.2.1/stat_density2d.html

http://docs.ggplot2.org/0.9.2.1/stat_contour.html

I use ggmap to obtain Google maps raster tiles that have a mercator projection and a Stamen toner theme.  I use the stat_density2d function to overlay a two dimensional kernel density estimation.

 

Get Bounding Box Coordinates

The bounding box coordinates were approximated using the export function from Open Street Map.

https://www.openstreetmap.org/export#map=4/37.72/100.37&layers=T

osm_export_webpage


library(ggplot2)
library(ggmap)
library(gridExtra)

setwd('/home/stoney/Desktop/talkingdata')
events <- read.csv('data/events.csv')

##
## Define bounding boxes for cities w/ high density of
## events data
##

# Zhongguo bounding box
lon_min <- 73
lon_max <- 135
lat_min <- 17.80
lat_max <- 53.64

# Beijing bounding box
bei_lon_min <- 115.6
bei_lon_max <- 117.8
bei_lat_min <- 38.8
bei_lat_max <- 40.36

# Shanghai bounding box
shang_lon_min <- 119.96
shang_lon_max <- 122.12
shang_lat_min <- 30.16
shang_lat_max <- 31.91

# Guangzhou, Shenzen, Xianggang
guang_lon_min <- 112.88
guang_lon_max <- 115.24
guang_lat_min <- 21.80
guang_lat_max <- 23.74

##
## Get subsets of events data within cities with high density
## events data
##

# Get subset of event data in Zhongguo bounding box
zhongguo_events <- subset(events,
lon_min <= longitude & longitude <= lon_max &
lat_min <= latitude & latitude <= lat_max)

beijing_events <- subset(zhongguo_events,
bei_lon_min <= longitude & longitude <= bei_lon_max &
bei_lat_min <= latitude & latitude <= bei_lat_max)

shanghai_events <- subset(zhongguo_events,
shang_lon_min <= longitude & longitude <= shang_lon_max &
shang_lat_min <= latitude & latitude <= shang_lat_max)

guangzhou_events <- subset(zhongguo_events,
guang_lon_min <= longitude & longitude <= guang_lon_max &
guang_lat_min <= latitude & latitude <= guang_lat_max)

##
## Get map tiles for cities with high density of event data
##

# Get Stamen toner themed map tiles for Beijing bbox
beijing_map <- get_stamenmap(bbox = c(left=bei_lon_min, bottom=bei_lat_min,
right=bei_lon_max,top=bei_lat_max),
zoom=8, maptype="toner-lite", crop=TRUE,
force=TRUE)
# Get Stamen toner themed map tiles for Guangzhou bbox
guangzhou_map