BIG CYCLING DATA PROCESSING: FROM PERSONAL DATA TO URBAN APPLICATIONS

July 2016

Conference: 2016 ISPRS Conference, International Society for Photogrammetry and Remote Sensing
At: Prague, Czech Republic

Authors:

Christopher J. Pettit

UNSW Sydney

Scott N Lieske

The University of Queensland

Simone Leao

UNSW Sydney

Understanding the flows of people moving through the built environment is a vital source of information for the planners and policy makers who shape our cities. Smart phone applications enable people to trace themselves through the city and these data can then potentially be aggregated and visualised to show hot spots and trajectories of macro urban movement. In this paper we present some preliminary findings using cycle data collected from a smart phone application known as RiderLog. We focus on the RiderLog application in the context of Sydney, Australia, and discuss the procedures and challenges in processing and cleaning this data before any analysis can be made. We then present some preliminary map results visualised with the CartoDB online mapping platform. We conclude the paper by highlighting some of the key challenges in working with such data and outline some next steps in processing the data and conducting higher volume and more rigorous analysis.

BIG BICYCLE DATA PROCESSING:

FROM PERSONAL DATA TO URBAN APPLICATIONS

C. J. Pettit, S. N. Lieske, S. Z. Leao

City Futures Research Centre, The University of New South Wales:

c.pettit@unsw.edu.au, s.lieske@unsw.edu.au, s.zarpelonleao@unsw.edu.au.

Commission II: Theme Session 17 Smart Cities

KEY WORDS: Big data, little data, data processing, data visualisation

ABSTRACT:

Understanding the flows of people moving through the built environment is a vital source of information for the planners and policy

makers who shape our cities. Smart phone applications enable people to trace themselves through the city and these data can

then potentially be aggregated and visualised to show hot spots and trajectories of macro urban movement. In this paper our aim is to

develop procedures for cleaning, aggregating and visualising human movement data and translating this into policy relevant

information. In conducting this research we explore using bicycle data collected from a smart phone application known as RiderLog.

We focus on the RiderLog application initially in the context of Sydney, Australia and discuss the procedures and challenges in

processing and cleaning this data before any analysis can be made. We then present some preliminary map results using the CartoDB

online mapping platform where data are aggregated and visualised to show hot spots and trajectories of macro urban movement. We

conclude the paper by highlighting some of the key challenges in working with such data and outline some next steps in processing

the data and conducting higher volume and more extensive analysis.

1. INTRODUCTION

1.1 Big Data and Smart Cities

In an age of big data, little data and smart cities (Batty, 2015)

there is an imperative to provide more accessible evidence to

planners and policy makers to better shape our cities. Our

research aim is to develop procedures for cleaning, aggregating

and visualising human movement data and translating this into

policy relevant information. In this paper we focus on

movement data of bicyclists who are using the RiderLog

application to capture data about their individual bicycle

journeys across the City of Sydney, Australia. This little data

(individually recorded bicycle journeys) can then be aggregated

across a city, which then becomes big data. With big data there

are challenges in processing and cleaning which we outline in

this paper. Next, we present some preliminary findings

visualised using the CartoDB online mapping platform. Finally,

we conclude the paper by highlighting some of the key

challenges in working with such data and outline next steps in

the research.

1.2 From personal to urban applications: the need for Big

Data pre- and post-processing

The growing volumes of data available from sensors, social

media, and other digital interconnected systems are seen as

‘remarkable opportunities for researchers and policy analysts’

(Shneiderman and Plaisant 2015, p. 1). Indeed, the idea of

‘smart cities’ heavily relies on the possibility of integrating and

understanding big (geospatial) data and turning it into

knowledge and intelligence which is used to shape better and

more effective urban environments (Batty, 2013; Li et al.,

2015).

In this study we are focused on data produced by individuals

motivated by personal goals. This type of data is becoming

more prominent with the ubiquity of smart phones with multiple

sensors, and the increasing use of mobile phone applications for

daily routine activities in society (Lane et al. 2010). When

combined in a crowd scale these data may have the capacity to

reveal macro behavioural patterns which are of interest for city

planners and policy makers alike. The transposition of these big

datasets into urban research or urban planning, however, is not

a simple exercise. The different purposes between data

production and application, together with issues associated with

privacy, human inconsistencies, and device inaccuracies, pose

challenges to its practical use.

Contemporary datasets, according to Laney (2001), are

characterised by their volume (data size is large), their velocity

(data is created rapidly and continuously), and variety (data

comprises multiple types and is captured from different

sources), also known as the 3Vs of big data. IBM estimates that

2.5 quintillion bytes of data are generated every day, and that 90

percent of today’s data has been created in the last two years

alone (Zhang et al. 2012). However, more data does not

necessarily mean more useful information, since big data is also

highly heterogeneous, complex, unstructured, incomplete, and

noisy (Ma et al. 2014), and most current information systems or

methods are unable to handle and process big data (Tsai et al.

2015).

Knowledge discovery in databases has always required a

number of operations and processes to turn data from a raw

state into a more appropriate format for analysis and

visualisation (Fayyad et al. 1996), even when datasets were smaller

and less complex. Big data brings some additional

challenges. Pettit et al. (2012) introduced a number of

visualisation techniques for representing urban space and place;

however, these are not specifically focused on the application of

big data.

Tsai et al. (2015) presented a comprehensive review on efforts

attempting to produce new methods that are able to handle big

data during the input, analysis and output stages of knowledge

discovery. They identified that most of the recent literature is

focused on innovative methods for data mining and analysis,

with much less attention to the pre- and post-analysis

processing methods.

Data detection, selection, cleaning, filtering, correction,

completion, and transformation are some of the pre-processing

methods applied to prepare databases with the objective to

obtain more accurate, complete and compatible information sets

(Fayyad et al. 1996). In the context of big data, new approaches

are combining those traditional methods with strategies to

reduce its size and complexity. These can include the extraction

of relevant records, event types, or key events; folding data to

make cyclic patterns such as days or weeks clear; and pattern

simplification strategies to simplify complex sequences of events

(Shneiderman and Plaisant 2015). The question that remains is

to what extent reduction can be made before losing important

meanings. Pre-processing of big data can be so demanding that

Kandel et al. (2011) refer to it as ‘data wrangling’.

Similarly, big data also raises new challenges for the post-

processing analysis stage of knowledge discovery in databases,

which in most cases are associated with the visualisation of

patterns and processes. ProfitBricks (2015) presented a brief

review of 39 data visualisation tools for big data currently

available, including open-source, free, and commercial

platforms. Some examples with geographic mapping

capabilities include GoogleMaps1, CartoDB2, ProcessingGIS3,

and Leaflet4. With varied levels of sophistication, these

developments demonstrate that this is a field in expansion.

Interestingly, Kandel et al. (2011) argue that ‘analysts might

more effectively wrangle data through new interactive systems

that integrate data verification, transformation, and

visualization’ (p. 1), thereby bringing pre- and post-processing

closer together. This is the underlying approach for

the research undertaken in our focus on bicycling data.

1.3 Big Cycling Data from participatory sensing via mobile

phones

There is a growing trend to use mobile devices and applications

to collect data relating to fitness activities (Clarke and Steele,

2014). Some mobile phone applications currently available for

bicycling include MapMyRide5, iBike6, Cycle Meter7, Strava8,

and RiderLog9. They vary in their format, purpose and

functionalities; some save routes and monitor progress of

ordinary riders, some are designed for professional riders, while

others are more focused on bicycling for transport. What they

all have in common is that they produce large amounts of

complex data that document riding journeys. Individually, each

application comprises data records with locations, time and

intervals, and other attributes, which are organised into the

specific application’s format and purpose. Daily, new records

are captured from registered users, new users join the system,

and some previous users may disconnect from the system.

1 https://developers.google.com/maps/

2 https://cartodb.com/

3 http://processingjs.org/

4 http://leafletjs.com/

5 http://www.mapmyride.com/

6 https://itunes.apple.com/au/app/ibike/id369550718?mt=8

7 http://itunes.apple.com/us/app/cyclemeter-gps-bikecomputer/id330595774?mt=8

8 www.strava.com

9 https://www.bicyclenetwork.com.au/general/programs/1006/

At the same time, there is an increasing interest from city

planners and policy makers in evidence-based research in

active transportation. This is due to contemporary issues

associated with health (chronic disease associated with physical

inactivity) (Pratt et al. 2014), and the environment (transition to

less carbon intensive cities) (Haines and Wilkinson 2014).

Recent research has been focused on the development and

evaluation of mobile applications associated with bicycling, such

as BikeNet (Eisenman et al. 2009), Biketastic (Reddy et al.

2010), and SocialCycle (Navarro et al. 2013). However, most of

this research describes the mobile applications within the

context of individual use, making only brief mention of their

potential implications for aggregation both spatially and

temporally to assist with city planning and policy making across

urban geographies. In fact we could not find any reported study

directly concerned with the transposition of data produced by

individuals with a personal goal, into a database useful for the

wider purpose of urban planning analysis.

Many factors can cause noise, inconsistencies, errors,

inaccuracies, and incompleteness in the personal tracking data

collected by people using their mobile phones via a specific

application for bicycling. Inaccuracies can come from weak

signal (i.e. in urban canyons in the CBD or in underground

areas); incomplete data can occur if the signal is completely

lost, if the batteries of the phone go flat, or if an incoming call

interrupts the app (depending on the app). Incompleteness is

also related to the fact that some people do not record all of

their rides, or riders use different mobile applications, or do not

change options when undertaking different types of journeys.

The rider may have multiple purposes in a trip, simplifying its

answer to the system with a single purpose. Some riders may

also do their cycling as part of a multi-modal journey, placing

the bike, for example, temporarily inside a train. All these

sources of noise in the data are not a concern for individual

users of the mobile application for the purpose of monitoring

their fitness progress. However, these varied sources of

inconsistencies, accumulated to millions or billions of records,

can have a great adverse impact when this data is intended to be

used for city planning or policy making.

2. RESEARCH DESIGN AND METHODS

Bicycle Network have developed the RiderLog application10

which is a free smart phone application and captures the

location of a bicyclist every forty seconds. For this research

project we have bicycling route data covering all of Australia

from 2010 to 2014. This includes 148,769 bicycle journeys

undertaken by 9,727 cyclists. In this study we focus on 26,242

routes completed by 1,923 unique cyclists from the year 2010 to

year 2014 in New South Wales (NSW). In this paper we focus

specifically on its application for Sydney, the capital city for the

State of New South Wales.

Data processing steps and flow are summarized in Figure 1.

Original data are a 421 MB text file. In order to structure and

clean the data as well as begin the process of fixing errors we

brought the data into Microsoft Excel (Text Import Wizard,

delimited with the character “|”). We then separated the data into

smaller files based on Australian state or territory: New South

Wales, Queensland, Northern Territory, South Australia,

Tasmania, Victoria and Western Australia. These files were

saved individually as “rider_STATE.xlsx”. The New South

Wales (NSW) data rider_NSW.xlsx file at this stage was 58.1

MB.

10 https://www.bicyclenetwork.com.au/general/programs/1006/
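
The import and split-by-state step described above was carried out in Excel. As a rough illustration only, the same step could be sketched in Python with the standard csv module. The column names (RiderID, State, Route) and the sample rows are hypothetical, since the layout of the original text file is not given here.

```python
import csv
import io
from collections import defaultdict


def split_by_state(lines, state_field="State"):
    """Group pipe-delimited rows by state.

    `state_field` is a hypothetical column name used for illustration.
    """
    reader = csv.DictReader(lines, delimiter="|")
    groups = defaultdict(list)
    for row in reader:
        groups[row[state_field]].append(row)
    return dict(groups)


# Invented sample standing in for the pipe-delimited source file.
sample = io.StringIO(
    "RiderID|State|Route\n"
    "1|NSW|-33.87,151.21\n"
    "2|VIC|-37.81,144.96\n"
    "3|NSW|-33.89,151.19\n"
)
by_state = split_by_state(sample)
for state, rows in sorted(by_state.items()):
    print(f"rider_{state}: {len(rows)} rows")
```

Each group could then be written to its own file, mirroring the one-file-per-state layout described in the text.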

Figure 1. Data Processing Steps and Flow

Extracting geographic data from these tables was a challenge

due to data size and formatting. In the Excel files geographic

data are contained in a text formatted column, ‘Route’. For

NSW, individual route records contain between 62 and 32,753

characters with latitude longitude pairs separated by commas

and irregularly interspersed with miscellaneous words and other

characters. Cleaning the data required removing words and

characters using Excel’s find and replace capabilities. For NSW

68,153 unneeded strings of varying length were removed from

the Route column.
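
The cleaning above was done with find and replace in Excel. One alternative, sketched here under assumptions, is to keep only the decimal numbers in a route string and pair them up. The sample string is invented, and the pattern assumes every coordinate is written with a decimal point; real records may need a different pattern.

```python
import re

# Signed decimal numbers such as -33.8688 or 151.2093 (assumed coordinate format).
NUMBER = re.compile(r"-?\d+\.\d+")


def extract_pairs(route_text):
    """Return (lat, long) tuples from a route string, ignoring stray words."""
    values = [float(m) for m in NUMBER.findall(route_text)]
    # Drop a trailing unpaired value rather than guess its partner.
    usable = len(values) - len(values) % 2
    return [(values[i], values[i + 1]) for i in range(0, usable, 2)]


# Hypothetical messy record with extra words between coordinates.
messy = "start -33.8688,151.2093, pause -33.8700,151.2100,end"
pairs = extract_pairs(messy)
print(pairs)
```

This approach silently discards anything that is not a decimal number, so it would be sensible to log how many characters are dropped per record before relying on it.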

Separating the often long strings of latitude and longitude

information into Lat/Long pairs and saving them within

individual data columns was accomplished with two

Excel functions, LEFT and MID. These functions enable

extraction of a subset of a string from the left-most character or

from the middle of a string using an indicated position,

respectively. LEFT and MID were used in two separate

formulas where the LEFT equation extracts a Lat/Long pair

from the long string and the MID equation copies the original

string less the Lat/Long pair in the first formula. Together, these

formulas allow all Lat/Long pairs to be extracted to single

columns from the original string in a recursive fashion. The

32,753 characters in the longest NSW cycling routes result in

915 Lat/Long pairs. If developed as a single data table these 915

pairs multiplied by the 26,243 data records along with the 27

columns already in the spreadsheet would result in a

spreadsheet containing over 1,856 columns and 48,707,000

cells. For a small dataset, extraction would be an easy series of

iterations within a single data table; this large data volume

instead required us to proceed in a stepwise fashion. The first step in processing this

large volume of data was to determine the length of the

geographic data (route) strings in each data record then sort

from smallest to largest. For the first 125 Lat/Long pairs we

were only able to process 25 pairs at a time. For each grouping

of 25 Lat/Long pairs we used the LEFT and MID formulas

above to extract the geographic data then used a VBA script to

replace formulas with data values in the worksheet. We then

deleted ‘no data’ values (where the long string had been fully

parsed). Next, the intermediate data generated by the MID

equations were deleted which substantially reduced the file size.

In the case of the second iteration (the first 50 Lat/Long pairs)

this stepwise process of replacing formulas with values and

deleting intermediate data reduced the data set from 186 MB to

85.1 MB. After having completed five of these iterations, for

the first 125 columns, the vast majority of the data (22,000 of

26,243 routes) including nearly all route records of 4,000

characters or less had been processed (Figure 2). The remaining

routes, although the strings were longer, were able to be

processed considerably more quickly and were therefore

processed in groups of 100 Lat/Long pairs.
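
The LEFT and MID pairing can be read as repeatedly peeling the leading Lat/Long pair off a string and carrying the remainder forward. The sketch below expresses that idea in Python; it is an analogue of the Excel formulas, not a transcription of them, and it assumes a cleaned string of comma-separated values. The second function reproduces the size estimate under the assumption that each pair needs one LEFT column and one MID column, which is our reading of the column count quoted in the text.

```python
def peel_pairs(route, sep=","):
    """Split a cleaned 'lat,long,lat,long,...' string into 'lat,long' pairs."""
    pairs = []
    rest = route
    while rest:
        first = rest.find(sep)
        if first == -1:
            break  # leftover value with no partner; stop rather than guess
        second = rest.find(sep, first + 1)
        if second == -1:
            pairs.append(rest)  # the final pair has no trailing separator
            rest = ""
        else:
            pairs.append(rest[:second])  # analogue of LEFT: the leading pair
            rest = rest[second + 1:]     # analogue of MID: the remaining string
    return pairs


def worst_case_sheet_size(max_pairs, records, existing_columns):
    """Columns and cells if every pair gets a LEFT and a MID column (assumption)."""
    columns = existing_columns + 2 * max_pairs
    return columns, columns * records


print(peel_pairs("-33.87,151.21,-33.88,151.22,-33.89,151.23"))
print(worst_case_sheet_size(915, 26243, 27))
```

With the figures quoted above (915 pairs, 26,243 records, 27 existing columns) the estimate comes to 1,857 columns and about 48.7 million cells, in line with the "over 1,856 columns" stated in the text.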

Figure 2. Length of Geographic Data Strings

Another part of data structuring and cleaning involved working

with time stamp data. The original data for ride start and finish

time were amalgamated with date and year in the format:

“10/1/2011 6:40:16 AM”. From these data we generated several

time variables through a process of extracting a subset of a

string from a string in a process similar to that described above.

We generated a Year variable, a Month variable as an integer

indicating one through 12, a Date variable as a combination of

month and day where, for example, 10/31 indicates 31 October,

and day of the week. We also separated start and finish time

from the original data resulting in HH:MM:SS format, e.g.

“06:40:16” that excluded the day, month and year included in

the original data. The duration variable included with the

original data was incorrect so we re-calculated trip duration by

subtracting start time from finish time.
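
For comparison with the string-extraction approach, the same time variables can be derived by parsing the stamp as a date. This is a minimal sketch that assumes the original stamps are in month/day/year order (consistent with the 10/31 example above, but not confirmed by the source); the finish time in the example is invented.

```python
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumes month/day/year order
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def time_variables(started, finished):
    """Derive the time variables described in the text from two stamps."""
    start = datetime.strptime(started, FMT)
    finish = datetime.strptime(finished, FMT)
    return {
        "Year": start.year,
        "Month": start.month,
        "Date": f"{start.month}/{start.day}",
        "DayOfWeek": DAYS[start.weekday()],
        "StartTime": start.strftime("%H:%M:%S"),
        "FinishTime": finish.strftime("%H:%M:%S"),
        "DurationSeconds": int((finish - start).total_seconds()),
    }


# Start stamp from the text; the finish stamp is hypothetical.
row = time_variables("10/1/2011 6:40:16 AM", "10/1/2011 7:05:46 AM")
print(row)
```

Because the duration is computed from full date-time values, it does not depend on the duration field supplied with the original data.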

The next step in cleaning the data was to delete several columns

that became redundant or are otherwise not necessary:

DateStarted, DateFinished, TimePaused, Route, and year of

birth. TimePaused contained no data. Year of birth data are

captured by rider age and date of ride which are both retained in

the data set.

After the data structuring, cleaning and preliminary processing

described above, we transposed the data from records based on

routes to geographic data where records are based on Lat/Long

coordinates suitable for CartoDB and GIS import. We

restructured the data from routes to geographic coordinates with

a pivot table type data summarization using Visual Basic for

applications (VBA) within Microsoft Excel. Data were first

saved in a new worksheet as values rather than formulas. The

VBA script then transposed the data from records based on

route to a more standard spatial data format where data indicate

point locations of a bicycling route. Along with the pivot of the

Lat/Long data the VBA script associated attributes originally

linked with individual cycling routes to each Lat/Long pair.

These attributes include rider ID, top speed, ride purpose, the

aforementioned time variables, start local government area

(LGA), end LGA, age and gender.
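
The route-to-point restructuring was implemented in VBA. As an illustration of the same reshaping, the following Python sketch turns one record per route into one record per Lat/Long pair and copies the route attributes onto every point. The field names (RouteID, RiderID, Purpose, pairs, Sequence) and the sample values are hypothetical.

```python
def routes_to_points(routes, attribute_fields):
    """Turn one record per route into one record per Lat/Long pair."""
    points = []
    for route in routes:
        # Attributes recorded once per route are repeated on each point.
        attributes = {name: route[name] for name in attribute_fields}
        for sequence, pair in enumerate(route["pairs"], start=1):
            points.append({**attributes, "Sequence": sequence, "LatLong": pair})
    return points


# Invented sample routes with already-extracted Lat/Long pairs.
routes = [
    {"RouteID": 1, "RiderID": 17, "Purpose": "Transport",
     "pairs": ["-33.87,151.21", "-33.88,151.22"]},
    {"RouteID": 2, "RiderID": 42, "Purpose": "Recreation",
     "pairs": ["-33.90,151.24"]},
]
points = routes_to_points(routes, ["RouteID", "RiderID", "Purpose"])
for point in points:
    print(point)
```

Keeping an explicit sequence number per point preserves the order of travel, which is needed later when points are joined back into lines.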

The challenges, solutions and processing volumes in working

with the NSW RiderLog data are summarized in Table 1. The

solutions used to address the problems of extra text in the

Lat/Long strings, correcting the start and finish time, fixing the

incorrect rider duration and splitting the Lat/Long data from one

to two columns, although sometimes time consuming to

execute, were easily accomplished with standard MS Excel

commands that could be applied simultaneously to data

columns or entire worksheets.

Other errors required more sophisticated techniques including

occasions where rides took place across two days resulting in

duration errors, the lack of a time stamp on Lat/Long pairs and

the overall volume of data. Fixing duration errors associated

with multi-day rides required replacing the simple ride duration

formula (finish time minus start time) with a separate formula

that is able to calculate duration while accounting for multiple

days. There were 68 of these errors in the NSW data.
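
The source does not give the replacement formula. One way to account for rides that cross midnight, shown here as a sketch under that caveat, is to subtract full date-time values rather than times of day. The dates and times in the example are invented.

```python
from datetime import date, datetime, time


def ride_duration_seconds(start_date, start_time, finish_date, finish_time):
    """Duration from full date-times, so rides crossing midnight stay positive."""
    start = datetime.combine(start_date, start_time)
    finish = datetime.combine(finish_date, finish_time)
    return int((finish - start).total_seconds())


def naive_duration_seconds(start_time, finish_time):
    """Time-of-day subtraction only, which goes negative across midnight."""
    anchor = date(2000, 1, 1)  # arbitrary shared date
    delta = datetime.combine(anchor, finish_time) - datetime.combine(anchor, start_time)
    return int(delta.total_seconds())


# Hypothetical ride starting at 23:50 and ending at 00:20 the next day.
correct = ride_duration_seconds(date(2011, 10, 1), time(23, 50, 0),
                                date(2011, 10, 2), time(0, 20, 0))
naive = naive_duration_seconds(time(23, 50, 0), time(0, 20, 0))
print(correct, naive)
```

A negative value from the naive calculation is also a convenient flag for finding the affected records in the first place.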

The solution to the lack of time stamps associated with

Lat/Long observations required estimating the time of each

location observation. Calculation of a new data field,

“TimeModel” was also embedded within the pivot script.

TimeModel was calculated by dividing trip duration by number

of segments, summing this value for the segment number in

question and adding this value to start time in order to get an

approximate time stamp for each point in a rider’s route.
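
The TimeModel calculation amounts to spreading the trip duration evenly over the route. A minimal Python sketch follows; it assumes the number of segments is one less than the number of points, which the text does not state explicitly, and the example values are invented.

```python
from datetime import datetime, timedelta


def time_model(start, duration_seconds, n_points):
    """Estimate a time stamp for each point by even spacing along the route."""
    segments = n_points - 1  # assumption: segments lie between consecutive points
    if segments <= 0:
        return [start] * n_points
    step = duration_seconds / segments
    # Point k is k segments from the start, so it is offset by k steps.
    return [start + timedelta(seconds=step * k) for k in range(n_points)]


# Hypothetical 20-minute ride recorded as five points.
start = datetime(2011, 10, 1, 6, 40, 16)
stamps = time_model(start, 1200, 5)
for stamp in stamps:
    print(stamp.strftime("%H:%M:%S"))
```

Even spacing ignores stops and speed changes, so these stamps are approximations suited to animation rather than speed analysis.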

The overall challenge of dealing with the large volume of data,

especially reducing the complexity of route data formatted as

long text strings, required numerous iterations performed in

such a way as to minimize file sizes at each given stage. The

volume of data also required the use of multiple files (4 .xlsx

files) to accommodate the entire New South Wales RiderLog

2010 – 2014 dataset. We estimate the RiderLog 2010 – 2014

dataset that covers the entirety of Australia would require 21

.xlsx files. At both the New South Wales and national scales

this volume of data meets the definition of big data presented

in Batty (2013): any dataset which cannot fit into an Excel

spreadsheet.

Challenges                       Solution                                               Volume
Extra Text                       Find/Replace                                           68,153 strings
Start/Finish Time Incorrect      Switch labels                                          2 columns
Ride Across two Days             Change formula
Ride Duration Incorrect          Re-calculate                                           26,243 records
Amalgamated Lat/Long             Separate                                               1 column
No Time Stamp on Lat/Long data   TimeModel                                              All records
Data Volume                      Iteration; multiple data files, 4 .xlsx files for NSW  >48,700,000 possible cells

Table 1. Big Data Challenges, Solutions and Volume for the NSW Rider data

CartoDB offers multiple ways to view cycling route data, as

points which may be displayed in time sequence and as routes

(lines) which display well in still images. The final step in

preparing our data tables for upload into CartoDB was to split

the amalgamated Lat/Long pairs into two columns. Bringing

point data into CartoDB required connecting, or uploading, the

data and georeferencing, a simple matter of specifying the

latitude and longitude columns in the data. Once correctly

georeferenced, the data could be displayed and manipulated within

CartoDB. Uploading route data as lines into CartoDB was

facilitated by bringing the point data into GIS and converting

points to lines based on RouteID.
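
The point-to-line conversion was done in GIS software. To make the operation concrete, the sketch below splits the amalgamated Lat/Long value into two numbers, groups points by RouteID in travel order, and writes each route as a Well-Known Text LINESTRING. The Sequence field and sample values are hypothetical, and WKT output is our choice for illustration rather than the format used in the study.

```python
from collections import defaultdict


def split_lat_long(pair):
    """Split an amalgamated 'lat,long' string into two floats."""
    lat, lon = pair.split(",")
    return float(lat), float(lon)


def points_to_wkt_lines(points):
    """Build one WKT LINESTRING per RouteID from ordered point records."""
    by_route = defaultdict(list)
    for p in sorted(points, key=lambda p: (p["RouteID"], p["Sequence"])):
        lat, lon = split_lat_long(p["LatLong"])
        by_route[p["RouteID"]].append(f"{lon} {lat}")  # WKT order is x (long), y (lat)
    # A line needs at least two points; single-point routes are skipped.
    return {
        route_id: "LINESTRING (" + ", ".join(coords) + ")"
        for route_id, coords in by_route.items()
        if len(coords) >= 2
    }


# Invented point records, deliberately out of order.
points = [
    {"RouteID": 7, "Sequence": 2, "LatLong": "-33.88,151.22"},
    {"RouteID": 7, "Sequence": 1, "LatLong": "-33.87,151.21"},
    {"RouteID": 8, "Sequence": 1, "LatLong": "-33.90,151.24"},
]
lines = points_to_wkt_lines(points)
print(lines)
```

Note the coordinate order swap: the source data store latitude first, whereas WKT expects longitude first.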

In CartoDB, both temporally dynamic torque maps based on

point data and polyline based cycling route maps may be

displayed by category. The CartoDB Map Layer Wizard may be

used to select among several category options, select the column

that contains the data one wishes to view, and adjust the legend

to symbolize categories by colour.

3. RESULTS AND ANALYSIS

In this section of the paper we illustrate how the processed

RiderLog data can be visualised and then undertake some

preliminary analysis of the results with a focus on the City of

Sydney. User profile data which includes gender and year of

birth can be used to create specific map visualisations of the

flow of bicyclists across the city. Also, the user can specify the

purpose of each individual trip and again this can provide an

interesting breakdown of who is bicycling where for what

purpose. Other views of the data can be made on ride duration

and origin of the trip. Using CartoDB we have created a frame

where these parameters may be toggled on or off to visually

analyse bicycle behaviour across the city.

It is important to understand the purpose of a ride when city

planners and policy makers are considering new bicycle

infrastructure. Different Smart Phone applications are used for

different purposes. The RiderLog application is predominantly

used by those commuting and travelling for transport, rather

than recreational purposes. Hence, the results in Figure 3 show

a predominance of bicycle trips made for transport. We can see

the Central Business District (CBD) of Sydney is a hot spot of

activity as one would expect for bicycle movements for travel. It

also highlights the CBD provides limited recreational

opportunities for cyclists. However, examining Figure 3 we can

see that the roadway within Centennial Park, which is located 1

kilometre out of the CBD, is a major attractor for recreational

bicyclists. Using such data we can begin to understand which

part of the city’s bicycle network is being used for what purpose

and this can in turn provide valuable information on future

infrastructure planning provisions across the city.

Figure 3. Sydney Area Rider Routes by Ride Purpose

One of the user profile questions in RiderLog is related to the

age of the bicyclist. We can see in Figure 4 that a significant

number of bicyclists using the RiderLog application are around

40 years of age. It is difficult for such an application to record

journeys made by children travelling to school independently as

most will not have their own smart phone application to record

their journey. There is also no mechanism in the application for

a bicyclist to log if there might have been a child on board a

bike being dropped to school. So the use of such applications is

difficult in trying to find out information about bicycling

behaviour of children. However, preliminary analysis of these

age-based data would suggest policy makers might like to

target bicycle promotion programs to those aged below 40 if the

goal was to increase bicyclist numbers across the City.

Figure 4. Sydney Area Routes by Rider Age

Understanding gender patterns in bicycling is very important in

order to develop and maintain bicycle infrastructure which is

used by both male and female riders. In Figure 5 we can see that

female riders (in red) follow a distinctive and more limited set of

routes when bicycling as compared to males. If we visually

ground truth this information we see that female bicyclist

activity is more restricted to those routes where there are

dedicated and indicated bicycling lanes, whereas male bicyclists

seem to be more comfortable bicycling in areas where bicycle

lanes may not be available. This preliminary analysis suggests

that providing more dedicated bicycling infrastructure

could likely result in an increase of female bicyclists across the

city.

Figure 5. Sydney Area Rider Routes by Gender

Understanding how far people are willing to bicycle and from

which origin to destination is also an important piece of

evidence when planning for travel behaviour. Figure 6

illustrates the travel time taken by bicyclists to arrive at their

destination. It indicates that most trips range from 0-15km.

Such information is important in understanding bicycle

catchment areas which is useful in the formulation of city wide

transport and planning strategies.

To further understand bicycle movements across the city, origin

data can be most useful (Figure 7). This data can highlight

suburbs and precincts where bicyclists reside. This can further

assist planners in developing municipal specific bicycle

infrastructure strategies. For example, those areas where

bicyclists live and are in close proximity to a train station might

then be candidates for a bicycle parkiteer (a secure parking

station) which supports multi-modal travel behaviour as

illustrated in Figure 8.

Figure 6. Sydney Area Rider Routes by Rider Duration

Figure 7. Sydney Area Rider Routes by Ride Origin

Figure 8. Bicycle parkiteer situated at a train station. Photo

attributed to Bicycle Network

4. CONCLUSIONS

In this paper we have discussed the processing, visualisation

and potential application of bicycling data acquired from

individuals using the RiderLog Smart Phone Application. We

have discussed methods for processing and cleaning the data,

which have initially been undertaken using Excel. However,

given the need to use multiple files and iterative processing to

accommodate one data set we conclude Excel is not sufficient

for handling such data efficiently. When we move to the next

steps of the research we will be looking at upscaling this

approach to include RiderLog data from across all of Australia.

Next steps will look at migrating the database into a platform

which handles big datasets and better supports processing and

cleaning, possibly R Project through the pbdR initiative

(Ostrouchov et al. 2012).

Through our cleaning processes we have identified a number of

challenges in aggregating this data from individual rider

journeys to a city wide analysis. In addition to efficiently

handling the large volume of data, the most notable of these

challenges was separating long strings of text containing

geographic data on bicycling routes. While solutions to some

data problems were easily implemented and others were more

challenging, it is important to be cognisant of how data

manipulations at any given stage will impact the data at future

stages of development as well as in the final analysis. When

processing big data it is also important to be cognisant of

one’s end goal and/or desired product outputs as a means of

evaluating the potential impact of data manipulations at any

given step.

Once the resultant data had been processed and cleaned, the

CartoDB online mapping platform was used to visualise the

results across the City of Sydney. In future work with real-time

data feeds, potentially at the scale of analysis of the entire

country of Australia, we will exploit CartoDB’s real-time Big

Data Connectivity and CartoDB’s Deep Insights Technologies.

Using Deep Insights one may manipulate and visualise

hundreds of millions of spatial data points. The use of such

online mapping platforms provides a powerful vehicle for city

planners and policy makers to interact with the data and make

more evidence-based decisions about the shaping of our cities

(Pettit et al. 2015). In this paper we have focused on data and a

visualisation platform which can be used to support city

planning in relation to active transport and providing

recreational opportunities. However, it is important to note the

data collected from smart phone applications such as RiderLog

are not considered statistically rigorous. Future research will

examine data fusion techniques that can be deployed to

combine RiderLog data with more systematic bicycle count data,

household travel survey data and other sources to provide a

richer picture and more robust data source to support

evidence-based decision making.

In an increasingly urbanised world we continue to plan for the

sustainable growth of our cities. This includes promoting active

transport and looking for solutions to alleviate congestion.

There is a critical need for evidence-based city planning and

policy making which uses data from a rich variety of sources to

address such concerns. Smart phone collected

data such as the RiderLog data presented in this research

provide an important source of truth which can be further

interrogated to understand the flow of people and how they

interact with each other and the built environment.

5. ACKNOWLEDGEMENTS

The authors would like to acknowledge Bicycle Network for

supplying the RiderLog data which has made this analysis

possible. Peter Patterson, from the City Futures Research Centre,

is also acknowledged for his preliminary exploratory work with

the RiderLog data.

6. REFERENCES

Batty M, 2013. Big data, smart cities and city planning,

Dialogues in Human Geography, 3 (3): 274–279.

Batty M, 2015. Data About Cities: Redefining Big, Recasting

Small, Data and the City Workshop, The Programmable

City Project at National University of Ireland, Maynooth,

Aug 31 - Sept 1, 2015.

http://www.spatialcomplexity.info/files/2015/08/Data-Cities-Maynooth-Paper-BATTY.pdf

Clarke A and Steele R, 2014. Health Participatory Sensing

Networks, Mobile Information Systems, 10 (3): 229-242.

Eisenman S B, Miluzzo E, Lane N D, Peterson R A, and

Campbell A T, 2009. BikeNet: A Mobile Sensing System

for Cyclist Experience Mapping, ACM Transactions on

Sensor Networks, 6 (1): Article 6, December 2009.

Fayyad U M, Piatetsky-Shapiro G, Smyth P, 1996. From data

mining to knowledge discovery in databases. AI Magazine,

17 (3): 37-54.

Haines A and Wilkinson P, 2014. Health in the ‘Low-Carbon’

Economy, Chapter 75 In: Freedman B (ed.), Global

Environmental Change, Springer Science+Business Media,

Dordrecht.

Kandel S, Heer J, Plaisant C, Kennedy J, van Ham F, Riche

N, Weaver C, Lee B, Brodbeck D and Buono P,

2011. Research directions in data wrangling: visualisations

and transformations for usable and credible data,

Information Visualization, 10 (4): 271-288.

Lane N D, Miluzzo E, Lu H, Peebles D, Choudhury T, and

Campbell A T, 2010. A Survey of Mobile Phone Sensing,

IEEE Communications Magazine, September 2010, pp. 140-150.

Laney D, 2001. 3D data management: controlling data volume,

velocity, and variety. META Group, Technical Report,

2001.

Li S, Dragicevic S, Castro F A, Sester M, Winter S,

Coltekin A, Pettit C J, Jiang B, Haworth J, Stein A,

2015. Geospatial big data handling theory and methods: A

review and research challenges. ISPRS Journal of

Photogrammetry and Remote Sensing. arXiv:1511.03010

[physics.soc-ph].

Ma C, Zhang H H, Wang X, 2014. Machine learning for big

data analytics in plants. Trends in Plant Science, 19 (12):

798-808.

Navarro K F, Gay V, Golliard L, Johnston B, Leijdekkers P,

Vaughan E, Wang X, and Williams M-A, 2013.

SocialCycle: What Can a Mobile App Do To Encourage

Cycling? In Proceedings of the Second IEEE International

Workshop on Global Trends in Smart Cities, pages: 24-30.

Ostrouchov G, Chen W-C, Schmidt D, Patel P, 2012.

Programming with Big Data in R [WWW Document]. URL

http://r-pbd.org/

Pettit C J, Barton J, Goldie X, Sinnott R, Stimson R, Kvan T,

2015. The Australian Urban Intelligence Network

supporting Smart Cities, in Geertman S, Stillwell J, Ferreira

J and Goodspeed J (eds) Planning Support Systems and

Smart Cities, Lecture Notes in Geoinformation and

Cartography, Springer, pp 243-259.

Pettit C, Widjaja I, Russo P, Sinnott R, Stimson R, Tomko M,

2012. Visualisation support for exploring urban space

and place, XXII ISPRS Congress, Technical Commission IV,

25 August – 01 September 2012, Melbourne, Australia.

Editor(s): M. Shortis, J. Shi, E. Guilbert, ISPRS Annals

Vol 1-2, pp 153-158.

Pratt M, Norris J, Lobelo F, Roux L, Wang G, 2014. The cost of

physical inactivity: moving into the 21st century, British

Journal of Sports Medicine, 48: 171–173.

ProfitBricks Blog, 2015. 39 data visualization tools for big data.

http://blog.profitbricks.com/39-data-visualization-tools-for-

big-data/ Accessed on 02 Dec 2015.

Reddy S, Shilton K, Denisov G, Cenizal C, Estrin D, and

Srivastava M, 2010. Biketastic: Sensing and Mapping for

Better Biking, In Proceedings of CHI 2010: Bikes and

Buses, April 10-15, 2010, Atlanta, Georgia, USA.

Shneiderman B and Plaisant C, 2015. Sharpening analytic focus

to cope with Big Data volume and variety. Visualization

Viewpoints, IEEE Computer Society, May/June 2015,

pages 10-14.

Tsai C-W, Lai C-F, Chao H-C, and Vasilakos A, 2015. Big

Data analytics: a survey. Journal of Big Data, 2 (21): 1-33.

Zhang L, Stoffel A, Behrisch M, 2012. Visual analytics for the

Big Data Era – a comparative review of state-of-the-art

commercial systems. IEEE Symposium on Visual Analytics

Science and Technology, Seattle, WA, USA, October 14-19,

pages 173-182.
