Automated Text Analysis on Open-Ended Response Surveys: Measuring Attitudes Regarding Autonomous Vehicles

Authors:
  • Nissan Digital India LLP

UNIVERSIDADE DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Automated Text Analysis on Open-Ended Response Surveys:
Measuring Attitudes Regarding Autonomous Vehicles
Vishnu Baburajan
Supervisor: Doctor João Antonio de Abreu e Silva
Co-Supervisor: Doctor Francisco Colunas Pereira da Câmara Pereira
Thesis approved in public session to obtain the PhD Degree in
Transportation Systems
Jury final classification: Pass with Distinction and Honour
Jury
Chairperson: Doctor Luís Guilherme de Picado Santos, Instituto Superior Técnico,
Universidade de Lisboa
Members of the Committee:
Doctor Francisco Colunas Pereira da Câmara Pereira, Department of Technology,
Management and Economics, Technical University of Denmark, Denmark
Doctor Luís Guilherme de Picado Santos, Instituto Superior Técnico,
Universidade de Lisboa
Doctor Filipe Manuel Mercier Vilaça e Moura, Instituto Superior Técnico,
Universidade de Lisboa
Doctor Catarina Helena Branco Simões da Silva, Faculdade de Ciências e
Tecnologia, Universidade de Coimbra
Doctor Maria Teresa Galvão Dias, Faculdade de Engenharia, Universidade de
Porto
Doctor Paulo Manuel da Fonseca Teixeira, Instituto Superior Técnico,
Universidade de Lisboa
Funding Institutions: Fundação para a Ciência e a Tecnologia, Portugal; European
Cooperation in Science and Technology, Cost Action TU-1305
2021
ABSTRACT
For practical reasons, surveys that aim for a large number of respondents tend to restrict themselves to closed-
ended responses. Despite potentially bringing richer insights, open-ended questions pose significant challenges in
extracting useful information while significantly increasing the analysis time. Nevertheless, automatic text
analysis techniques could speed up the analysis of open-ended responses. Furthermore, open-ended questions in
conjunction with closed-ended questions are likely to influence the closed-ended responses.
Considering this, we pursued the following four objectives in this thesis: (a) to analyse whether the method of
collecting qualitative data influences the survey responses; (b) to develop an approach to extract open-ended
responses from a survey and process the data; (c) to compare the relative performance of open-ended and
closed-ended responses in analysing qualitative data; and (d) to develop a framework that measures attitudes
while allowing respondents to choose their preferred type of question (closed- or open-ended).
This thesis analyses the suitability of Topic Modelling for extracting information from open-ended responses
in order to measure attitudes. As a case study throughout the thesis, we used questionnaires that collect
information on attitudes towards Autonomous Vehicles (AVs). Alternative versions of the questionnaires,
containing open- and/or closed-ended questions, were presented randomly to respondents. Two datasets were
collected: (a) 364 responses from India on the intention to use Shared AVs, and (b) 3002 responses from the USA
on the intention to use AVs for commute trips.
To quantify the relative benefits, we evaluated how well the alternative versions of the questionnaire measure
attitudes. To this end, we assessed the predictive capability of the statistical models estimated on each of
these independent datasets. In addition, we examined the responses to the attitudinal questions to determine
whether the mode of asking questions influences the measured attitudes. Finally, having estimated the models,
we developed a framework that measures attitudes while allowing respondents to choose their preferred type of
question.
Our results indicate that placing open-ended questions before a set of Likert scale questions could alter the
responses to the Likert scale questions. The consequence is a reduction in the number of neutral responses and an
increase in positive attitudes among those answering the questionnaire with open-ended questions. We also
evaluated the suitability of Topic Modelling techniques such as Latent Dirichlet Allocation (LDA) and supervised
Latent Dirichlet Allocation (sLDA) and found them effective. However, we did not find significant improvements
in performance with the supervised approach. When comparing the predictive capabilities of models estimated on
Likert scale responses collected with and without preceding open-ended questions, performance was superior for
the dataset in which open-ended questions preceded the Likert scale questions. However, we did not find it
beneficial to replace Likert scale questions entirely with open-ended questions. Using the dataset collected
from the USA, we proposed a modelling framework that lets researchers and analysts allow respondents to answer
the questionnaire using the question type (closed- or open-ended) of their choice. The proposed model
outperformed the individually estimated models, particularly on the test set.
Index Terms: Topic Modelling, Latent Dirichlet Allocation, Supervised Latent Dirichlet Allocation, Likert
Scales, Open-ended Questions, Travel Behaviour Research, Model-based Machine Learning, Bayesian
Estimation
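
To make the idea of extracting topics from open-ended responses concrete, the following is a minimal sketch of
Latent Dirichlet Allocation fitted by collapsed Gibbs sampling in plain Python. It is an illustration only and is
not the code used in this thesis: the example responses, the stop-word list, the number of topics and the
hyperparameters alpha and beta are all assumptions made for this sketch, and the names fit_lda and tokenise are
hypothetical helpers.

```python
import random
import re

# Hypothetical stop-word list and example responses, invented for illustration.
STOP_WORDS = {"i", "the", "a", "and", "to", "of", "is", "it", "my", "be", "in", "about", "am", "can"}

def tokenise(text):
    """Lower-case a response, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (theta, top_words): per-document topic proportions and the
    three most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalised).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(n_words), key=lambda v: -topic_word[t][v])[:3]]
        for t in range(n_topics)
    ]
    return theta, top_words

responses = [
    "I can relax and work in the car during my commute",
    "Relax, read and work while the car drives",
    "I am worried about accidents and software failure",
    "Safety is my concern, accidents and failure of sensors",
]
docs = [tokenise(r) for r in responses]
theta, top_words = fit_lda(docs, n_topics=2)
for t, words in enumerate(top_words):
    print(f"Topic {t}: {', '.join(words)}")
```

Each row of theta gives one respondent's topic proportions, which is the kind of quantity that could enter a
statistical model in place of, or alongside, Likert scale indicators. With so few and such short responses the
recovered topics are unstable, so the output should be read as a demonstration of the mechanics rather than as a
result.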
RESUMO
For practical reasons, surveys aimed at a large number of participants tend to restrict themselves to
closed-ended questions. Although they can potentially provide richer insights, open-ended questions pose major
challenges for extracting useful information while also increasing the analysis time. Nevertheless, certain
automatic text analysis techniques can speed up the analysis of open-ended responses. Moreover, the use of
open-ended questions in conjunction with closed-ended questions is likely to influence the responses to the
latter.
With this in mind, this thesis pursued the following four objectives: (a) to analyse whether the method of
collecting qualitative data influences survey responses; (b) to develop an approach to extract open-ended
responses from surveys and process the resulting data; (c) to compare the relative performance of open- and
closed-ended responses in the analysis of qualitative data; and (d) to develop a modelling framework that
measures attitudes while allowing participants to choose the type of question, open or closed, that suits them
best.
This thesis analyses the applicability of Topic Modelling to the extraction of information from responses to
open-ended questions that measure attitudes. As a case study throughout the thesis, we used questionnaires that
collect information on attitudes towards Autonomous Vehicles (AVs). To conduct this study, alternative versions
of the questionnaires, consisting of open- and/or closed-ended questions, were presented randomly to
participants. In total, two datasets were collected: (1) 364 responses from India on the intention to use Shared
AVs, and (2) 3002 responses from the USA on the intention to use AVs for commute trips. To quantify the relative
benefits, we evaluated the relative performance of the alternative versions of the questionnaires in measuring
attitudes. In this regard, the statistical models estimated separately on each of these independent datasets are
evaluated on their predictive capability. In addition, the responses to the attitudinal questions are evaluated
to analyse whether the form of the questions influences the attitudes being measured. After estimating the
models, this thesis additionally develops a modelling approach that measures attitudes while allowing
respondents to choose their preferred type of question.
Our results indicate that the use of open-ended questions before the set of Likert scale questions may affect
the responses to the latter. The consequence is a reduction in the number of neutral responses and an increase in
positive attitude among the participants who answered the survey with open-ended questions. We also evaluated
the applicability of Topic Modelling techniques such as Latent Dirichlet Allocation and its supervised version,
and both proved effective. However, we could not find significant improvements in performance from the
supervised approach.
When comparing the predictive capabilities of the models estimated from questions that used Likert scale
responses with and without open-ended questions, the performance of the models was superior for the dataset
that presented open-ended questions before the Likert scale responses. However, we concluded that it is not
beneficial to fully replace Likert scale questions with open-ended questions. Using the data from the USA, we
proposed a modelling approach that allows researchers/analysts to offer participants the freedom to choose
between the two types of questions (open and closed) according to their preferences. The performance of the
proposed model proved superior to that of the individually estimated models, particularly on the test data.
Index Terms: Topic Modelling, Latent Dirichlet Allocation, Supervised Latent Dirichlet Allocation, Likert
Scales, Open-ended Questions, Travel Behaviour Research, Model-based Machine Learning, Bayesian
Estimation
ACKNOWLEDGEMENTS
I express my sincere gratitude to Prof. João de Abreu e Silva, my mentor, adviser and guide. I thank him for the
insightful conversations we had during the development of this research and my thesis. His expertise in travel
behaviour research, particularly in qualitative data analysis, has been of immense help in understanding the
concepts and in developing the ideas.
I want to express my sincere gratitude to my co-supervisor, Professor Francisco Câmara Pereira of the Technical
University of Denmark (DTU), for his continuous support in my research and coursework. His expertise in Machine
Learning, specifically in Topic Modelling and the Model-based Approach to Machine Learning, has been
invaluable in developing this thesis.
This thesis is written as part of my PhD studies in the Instituto Superior Técnico (IST) and the Technical
University of Denmark (DTU), and I acknowledge IST and DTU for hosting me.
The research was funded by the MIT-Portugal Program of the Fundação para a Ciência e a Tecnologia, Portugal.
I sincerely thank the Portuguese Government for their kind support and funding. I want to express my sincere
gratitude to the Director Prof. Luís Guilherme de Picado Santos and Teresa Afonso, whose support helped shape
this research. The European Cooperation for Science and Technology Cost Action TU-1305 Social Networks and
Travel Behaviour supported a case study in my research.
I want to thank my friends and colleagues from IST, particularly Mohammad Sadegh, Mariza, Suresh, Jayanath,
Jayachandran, Joshin and Rahul. I would also like to thank all members of the Machine Learning for Smart
Mobility (MLSM) at DTU, particularly Francisco, Renming, Daniele, Inon, Sergio and Filipe Rodrigues.
I thank my parents (Baburajan and Sreekaladevi) for their constant love, support, and guidance. The values of
competence, persistence, and honesty they instilled in me have helped me immensely. The love and support from
my brothers Hariprasad, Narendraprasad, Ananthakrishnan, Nishanth and Unnikrishnan, and my sisters Indulekha,
Vandana, Sreelakshmi, Manjulakshmi and Arya have been phenomenal, as have the love and support from
Unnikrishnan, Salu and Ravindran. I also take this opportunity to extend my sincere thanks to Nimisha.
My friends Arunbabu, Meghal, Naveen, Manu, Sajjad, Dani and Chelza supported me during my PhD. My stay
in Denmark was made memorable by Ashik, Sushanth, Sherin and his family, Smobin and Maria.
There were many ups and downs during my PhD, and the lectures of Prof. Jordan B. Peterson and Sadhguru
helped me remain calm and pursue my research.
Dedicated to
My Parents, Grandparents
and
My Little Bundle of Joy
TABLE OF CONTENTS
Abstract
Resumo
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction and Background
1.2 Objectives and Scope
1.3 Overview of the Approach
1.4 Thesis Contribution and Corresponding Stakeholders
1.5 Scientific Outcomes from this Thesis
1.6 Organisation of the Thesis
2 Fundamentals: Qualitative Data and Its Measurement
2.1 Introduction
2.2 Measuring Attitudes in Transportation
2.3 Measurement of Qualitative Data
2.3.1 Closed-ended Responses
2.3.1.1 Types of Closed-ended Scales
2.3.1.2 Optimal Number of Points in a Scale
2.3.1.3 Neutrality and its implications
2.3.1.4 Satisficing
2.3.1.5 Other Issues
2.3.1.6 Analysis of Closed-ended Responses
2.3.2 Open-ended Responses
2.3.2.1 Coding of Open-ended Questions
2.3.2.2 Position of Open-ended Questions
2.3.2.3 Missing Values in Open-ended Surveys
2.3.2.4 Extraction of Information from Open-ended Responses
2.3.3 Concluding Remarks
2.4 Frameworks to Measure the Intention to use
2.5 Modelling Approaches to Predict AV Use
2.6 Factors Influencing “Intention to use/Pay” For Autonomous Vehicles
2.7 Natural Language Processing and Its Applications
2.8 Summary
3 Topic Modelling for Open-ended Responses: A Case Study on the Intention to Use Shared AVs
3.1 Introduction
3.2 Questionnaire Design
3.2.1 Experimental Design
3.2.2 Framework Design
3.3 Data Collection and Data Cleaning
3.4 Exploratory Analysis
3.4.1 Preliminary Analysis
3.4.1.1 Attitudes Towards AVs
3.4.1.2 Subjective Norms
3.4.1.3 Perceived Behavioural Control Variables
3.4.1.4 Intention to Use Shared AVs
3.4.2 Statistical Analysis
3.5 Extraction of Data
3.5.1 Treatment of Closed-ended Responses
3.5.2 Extraction of Information from Open-ended Responses
3.5.2.1 Exploratory Analysis
3.5.2.2 Results from the Topic Models
3.5.3 Comparison of Closed- and Open-ended Responses
3.6 Modelling Framework
3.7 Estimation Results and Discussion
3.8 Conclusion
4 Integrating and Comparing Open- and Closed-ended Responses: A Case Study on AVs for Commute Trips
4.1 Introduction
4.2 Questionnaire Design
4.2.1 Experimental Design
4.2.2 Framework Design
4.3 Data Collection and Data Cleaning
4.4 Exploratory Analysis
4.4.1 Preliminary Analysis on Socio-demographic Characteristics
4.4.1.1 Perceived Ease of Use
4.4.1.2 Perceived Usefulness
4.4.1.3 Perceived Safety Risk
4.4.1.4 Perceived Privacy Risk
4.4.1.5 Trust
4.4.1.6 Attitudes
4.4.1.7 Modal Share for Commute Trips
4.4.2 Statistical Analysis
4.4.3 In-Depth Analysis
4.5 Extraction of Data
4.5.1 Treatment of Closed-ended Responses
4.5.2 Extraction of Information from Open-ended Responses
4.5.2.1 Exploratory Analysis
4.5.2.2 Results from Topic Models
4.5.3 Comparison of Closed- and Open-ended Responses
4.6 Modelling Framework
4.7 Estimation Results
4.8 Proposed Framework to Model Attitudes Jointly
4.9 Conclusion
5 Conclusion
5.1 Introduction
5.2 Data Description
5.3 Salient Findings
5.3.1 Influence of Questionnaire Type on the Responses
5.3.2 Approach to Extract Information from Open-ended Responses
5.3.3 Evaluate the Relative Performance of Closed- and Open-ended Approaches
5.3.4 A framework to Measure Attitudes that Allow Respondents to Choose Their Preferred Questionnaire Type
5.4 Limitations of the Current Study and Directions for Future Research
References
Appendix A
6 Questionnaire: Intention to Use Shared AVs (India)
Appendix B
7 Questionnaire: Intention to Use AVs for Commute Trips (USA)
Appendix C
8 Experimental Design for the Intention to Use AVs for Commute Trips
Appendix D
9 Results of Topic Model (USA)
Appendix E
10 Estimation Results for intention to Use AVs for Commute Trips
Appendix F
11 Results for the proposed framework
Appendix G
12 Python code for Topic Models
Appendix H
13 Python Code for the Framework to Measure Attitudes
Appendix I
14 Python Code for the Mapping of Responses
LIST OF FIGURES
Figure 2.1 The Framework for the Theory of Planned Behaviour (Source: [27])
Figure 2.2 The Framework for the Technology Acceptance Model [28]
Figure 2.3 The Framework for the Unified Theory of Acceptance and Use of Technology (UTAUT) [29]
Figure 2.4 Probabilistic Graphical Model for sLDA
Figure 3.1 The Experimental Design for the Intention to Use Shared AVs
Figure 3.2 Modified Framework of the Theory of Planned Behaviour
Figure 3.3 Frequency Distribution for Attitudes towards Use of AVs
Figure 3.4 Frequency Distribution for Subjective Norms
Figure 3.5 Frequency Distribution for Perceived Behavioural Control Variables
Figure 3.6 Frequency Distribution for the Intention to Use Shared AVs
Figure 3.7 Word Clouds for OE1, OE2, OE3, OE4
Figure 3.8 Inter-topic Distance for a. OE1, b. OE2, c. OE3, d. OE4
Figure 4.1 Experimental Design for the Mode Choice for Commute Trips
Figure 4.2 Proposed Framework for the Mode Choice for Commute Trips
Figure 4.3 Frequency Distribution for Perceived Ease of Use
Figure 4.4 Frequency Distribution for Perceived Usefulness
Figure 4.5 Frequency Distribution for Perceived Safety Risk of AVs
Figure 4.6 Frequency Distribution for Perceived Privacy Risk of AVs
Figure 4.7 Frequency Distribution for Trust in AVs
Figure 4.8 Frequency Distribution for Attitudes towards AVs
Figure 4.9 Frequency Distribution for the Mode for Commute Trips
Figure 4.10 Inter-topic Distance for LDA (clockwise from top left) OE1, OE2, OE3, OE4, OE5, OE6
Figure 4.11 Probabilistic Graphical Model for Individual Model
Figure 4.12 Probabilistic Graphical Model for the Proposed Model
Figure 4.13 Probabilistic Graphical Model for the Modified Framework
Figure 4.14 Predictions Using the Proposed Framework
P a g e | XVI
LIST OF TABLES
Table 3.1 Socio-economic and Travel Characteristics
Table 3.2 Results of Factor Analysis
Table 3.3 Average Number of Words per Response
Table 3.4 Top 5 Words for Each Topic for Open-ended Questions
Table 3.5 Estimation Results of the Model for the “Intention to Use Shared AVs”
Table 3.6 Comparison of the Goodness-of-fit Measures for the Test Sets
Table 4.1 SP Experimental Design
Table 4.2 Socio-Demographic Characteristics
Table 4.3 Travel Characteristics of the Individuals (I)
Table 4.4 Travel Characteristics of an Individual (II)
Table 4.5 Results of the Statistical Analysis on Whether Open-ended Questions Influence Responses to Likert Scale Questions
Table 4.6 Internal Reliability - Cronbach’s Alpha
Table 4.7 Average Number of Words per Response
Table 4.8 Top 5 Words for Each Topic for Open-ended Questions
Table 4.9 Mapping between Closed- and Open-ended Responses
Table 4.10 Goodness-of-fit Measures for Training and Test Set
Table 8.1 Statements to Measure the Constructs in the Proposed Model and Their Sources
Table 8.2 Orthogonal Scenarios (Source: Haboucha, Ishaq and Shiftan [40])
Table 9.1 Top 5 Words for Each Topic for Open-ended Questions
Table 10.1 Estimated Coefficients for Socio-Demographic Characteristics
Table 10.2 Estimated Coefficients for Travel, Familiarity with AVs and SP
Table 10.3 Estimated Coefficients for Likert Scale Responses
Table 10.4 Estimated Coefficients for the Topics
Table 11.1 Mapping of the Likert Scale Responses for Ver_LK and Ver_LKOE
Table 11.2 Mapping of the Likert Scale Responses for the Extracted Topics from Open-ended Responses
1 INTRODUCTION
1.1 INTRODUCTION AND BACKGROUND
Questionnaires are used extensively by researchers and analysts to collect information on, or
the opinions of, people. Questionnaires distributed to a sample of the population are used
extensively in travel behaviour research to evaluate travellers' behaviour and devise suitable
strategies for policy implementation. However, many of these strategies are subjective;
therefore, researchers started measuring attitudes [1], [2]. In the past, questionnaires have been
used to measure the intention to use certain services before their implementation [3]–[5],
attitudes [6] and consumer/traveller preferences [3], [4], [6]–[9], to name a few.
Attitudes are defined as dispositions towards overt action or as verbal substitutes for overt
action [10]. Both researchers and practitioners measure attitudes in a multitude of fields,
including transportation engineering, medical sciences and psychology [3], [6], [10]–[16]. Since
attitudes were first measured, researchers have made considerable efforts to develop procedures
(closed- or open-ended questions) to measure them accurately. For example, in the context of
AVs, we could use a closed-ended question such as, “Your opinion on AVs on a rating scale of
1-5 is …” or an open-ended question such as “Tell us your opinion about AVs”.
There is an intense debate on the most appropriate method to measure attitudes: closed- or
open-ended. Lazarsfeld [17] brought a temporary truce to this fierce debate by concluding
that there are merits and demerits associated with each of the two approaches, and that the
administrator should use the method they believe is best suited for measuring the attitudes
under consideration. Later, Converse [18] summarised the discussions and the evolution of
the two approaches. Most studies used the closed-ended approach to measure attitudes, while
a relatively small number of studies used the open-ended approach, primarily because
closed-ended surveys are swift to operate, whereas open-ended surveys take longer to complete
and to execute [18], [19]. In the following paragraphs, we briefly discuss the two
approaches; a detailed discussion follows in Chapter 2.
To measure an individual's attitudes using the closed-ended approach, we present statements
describing the attitude and a suitable scale for the respondents to express their attitudes. For
example, one could use bipolar scales. Scales reduce the burden for the analyst [20], which,
along with other benefits (discussed later), has led to the widespread use of scales to measure
attitudes. However, researchers often criticise the closed-ended approach for increasing the
burden on the respondent (who must identify their attitudes and relate them to a scale) and for
extracting responses that might be of interest to the researcher and not to the respondent [20]–[23].
Open-ended questions encourage respondents to formulate and articulate their opinion about
the attitude of interest to the researcher. They are recommended when the researcher intends to
measure the respondent's attitude towards a relatively new and complicated problem. They can
also measure attitudes towards a problem that may otherwise garner very little attention or
issues about which people might not have thought extensively [13]. However, compared to
closed-ended questions, they increase the burden on the respondents (who must write or type
their responses) and on the enumerators [17]. Furthermore, they demand dimensionality
reduction and the development of a coding scheme to reduce bias across analysts before the
analysis, which increases the burden further [24] and significantly increases the time required
to analyse the open-ended responses [18]. For these reasons, the large-scale use
of this approach for measuring attitudes has been limited.
Based on the literature reviewed above, it can be concluded that the
difficulty in analysing open-ended responses is a significant factor that has discouraged the
widespread use of the technique. However, recent advances in machine learning, specifically
in Natural Language Processing techniques such as Topic Modelling, now offer the possibility of
extracting the information from open-ended responses. This research proposes to use Topic
Modelling to extract this information and quantify it in order to measure attitudes.
1.2 OBJECTIVES AND SCOPE
This thesis pursues the following objectives: -
1. To analyse if the method of collecting qualitative data influences the survey responses
2. To develop an approach to extract open-ended responses from a survey and process the
data
3. To compare the relative performance of the open- and closed-ended responses in
analysing qualitative data
4. To develop a framework that measures attitudes while allowing respondents to choose
their preferred type of question (closed- or open-ended)
This thesis analyses the suitability of using Topic Modelling to extract information from the
open-ended responses to measure attitudes, and we use questionnaires that collect information
on the attitudes related to Autonomous Vehicles (AVs) as a case study. The literature indicates
that there could be other benefits associated with the use of open-ended questions, such as
spontaneous and unbiased responses [18], [25]. In this research, we restrict the analysis to
evaluating the suitability of open-ended questions to measure attitudes, together with
methodologies to do so. In this regard, alternative versions of the questionnaires that consider
open- and/or closed-ended questions are presented randomly to respondents. To quantify the
relative benefits, we evaluate the relative performance of the alternative versions of the
questionnaire to measure attitudes: statistical models estimated using each of
these independent datasets are evaluated based on their predictive capability. In addition, we
evaluate the responses to the attitudinal questions to analyse if the method of asking questions
influences the measured attitudes. Using the estimated coefficients, we develop a framework
to measure attitudes while allowing respondents to choose their preferred type of question.
1.3 OVERVIEW OF THE APPROACH
As mentioned in the previous section, one of the objectives of this research is to evaluate the
suitability of open-ended questions for measuring attitudes. To gain more insights into this
problem, we implemented two sequential surveys/studies. Our first study aimed to analyse the
intention to use Shared AVs using responses from India. We designed the questionnaire using
the Theory of Planned Behaviour (TPB). We used two versions of the questionnaire: a. the first
version (Ver_LK) used only Likert scale questions, b. the second version (Ver_LKOE) used
open-ended questions, followed by the same Likert scale questions as in Ver_LK. However, in
this study, the size and representativeness of the sample were a concern. To address these
concerns, we carried out a second study using a larger and more representative sample
from the USA. We designed the questionnaire using an extended version of the Technology
Acceptance Model (TAM). Unlike the previous study, we used three versions of the
questionnaire: a third version (Ver_OE), which used only open-ended questions, along with the
two versions used in the study carried out in India.
To pursue the first objective, we compared the responses to the Likert scale questions common
to Ver_LK and Ver_LKOE: the twenty-one Likert scale questions in the first study and the
twenty Likert scale questions in the second study. We compared the frequency distributions
both qualitatively and using non-parametric tests (to test if the distributions are similar for
both versions of the questionnaire).
Furthermore, ordered models were estimated for each of the Likert scale questions to identify
the influence of the questionnaire type.
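As an illustration of such a comparison, the sketch below applies a two-sided Mann-Whitney U test to the responses to one Likert scale question under two questionnaire versions. The choice of test is our assumption for the example; the text above does not name the non-parametric test used. The function names and the response data are hypothetical. The p-value uses the tie-corrected normal approximation without a continuity correction.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold the same value
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U statistic of sample a, two-sided p-value)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Variance of U under the null hypothesis, corrected for ties.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every response identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical five-point responses to one question under two versions.
ver_lk = [3, 3, 2, 4, 3, 3, 5, 3, 2, 3]
ver_lkoe = [1, 5, 4, 5, 1, 2, 5, 4, 1, 5]
u_stat, p_value = mann_whitney_u(ver_lk, ver_lkoe)
print(f"U = {u_stat:.1f}, two-sided p = {p_value:.3f}")
```

A rank-based test of this kind is mainly sensitive to a shift in location. If responses move from the neutral point towards both extremes at once, a test on the frequency table (for example, a chi-square test of homogeneity) may be the more suitable choice.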
To extract information from open-ended responses (second objective), we used Topic
Modelling approaches such as Latent Dirichlet Allocation (LDA) and supervised Latent
Dirichlet Allocation (sLDA). Information, in the form of topics (the different themes discussed in
the open-ended responses), was extracted from each of the open-ended responses, and the extracted
topics were checked to verify that they were distinct and meaningful. In addition, the topics were
compared with the Likert scale questions to assess their coherence.
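To make the idea of extracting topics concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling on a toy corpus. It is not the implementation used in this thesis: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of topics and the tokenised responses are all assumptions chosen for illustration. The output of interest is the per-response topic proportions, which is the kind of quantity that can later enter a statistical model.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenised documents.

    Returns (theta, topic_word, vocab): theta[d][k] is the estimated
    proportion of topic k in document d, and topic_word[k][v] counts how
    often vocabulary word v is assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word, vocab


# Hypothetical tokenised open-ended responses (stop words already removed).
responses = [
    ["safe", "sensor", "accident", "safe"],
    ["accident", "risk", "sensor"],
    ["cheap", "fare", "cost", "cheap"],
    ["cost", "fare", "save"],
]
theta, topic_word, vocab = lda_gibbs(responses, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

In practice, an established library implementation with proper text pre-processing would normally be preferred; the sketch only shows how each response is reduced to a vector of topic proportions.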
The third objective involved analysing the relative performance of the closed- and open-ended
responses. In the first case study, we estimated models separately for the two versions of the
questionnaire and compared their relative performance. To predict the intention to use Shared
AVs, the model for Ver_LK used Likert scale responses, and the model for Ver_LKOE used
Likert scale responses together with the information extracted from the open-ended
responses using Topic Models. In the second study,
which analysed the intention to use AVs for commute trips, we compared the performance of
the models estimated for each of the three datasets. For Ver_LK and Ver_LKOE, we estimated
models using the responses to the Likert scale questions, and for Ver_OE, we estimated models
using the topics extracted from the open-ended responses. The models were estimated within a
more sophisticated behavioural framework using Probabilistic Graphical Models.
We used the estimated coefficients of the models from Ver_LK, Ver_LKOE and Ver_OE for
the final objective. Using the coefficients and the attitudes for Ver_LK, we generated the
corresponding Likert scale responses for Ver_LKOE and the topic proportions for Ver_OE.
We achieved this using Gibbs Sampling and repeated the procedure for Ver_LKOE and Ver_OE. This
framework allows respondents to use either of the two approaches (Likert scale or open-ended)
to answer the survey. The analysts could then use the data to estimate models using responses
from either of the questionnaires.
1.4 THESIS CONTRIBUTION AND CORRESPONDING STAKEHOLDERS
As mentioned previously, there are differences of opinion among researchers and
practitioners on the appropriate method for measuring attitudes. The open-ended
approach is considered more behaviourally correct and is thought to capture the actual attitudes
of the respondents, as it does not restrict respondents to statements that might be of importance
only to the analyst. However, the difficulties that open-ended questions pose to respondents and
analysts have deterred the extensive use of this approach. It requires respondents to articulate
an appropriate response and could thus be time-consuming. It poses even more serious
difficulties for the analysts, as analysts must spend considerable time extracting and coding the
responses to make inferences.
By relying on advances in Machine Learning, we simplify the extraction of information from
open-ended responses, as we demonstrate in this thesis. Furthermore, we propose a framework to
measure attitudes by allowing respondents to use a questionnaire type of their choice. The
thesis also explores the implications of the questionnaire type (open- or closed-ended) for the
responses to the attitudinal questions. Identifying the potential impact would help analysts
to use open-ended questions either alone or in conjunction with closed-ended questions to
measure attitudes more effectively.
To our knowledge, this is the first instance of the use of Topic Modelling approaches to extract
information from open-ended responses in travel behaviour research. Moreover, this thesis also
investigated if open-ended questions placed before a set of Likert scale questions influence
responses to these Likert scale questions. The results indicate a significant reduction in respondents
choosing neutral responses and a corresponding increase in respondents choosing extreme
points. In addition, we proposed a framework that allows respondents to choose the
questionnaire type of their choice yet allows analysts to use the models of their choice to
predict behaviour.
We describe below the benefits to the different stakeholders from this thesis: -
Analysts - Firstly, they are relieved of the difficulty of having to code the open-
ended responses and of the bias from coding. Secondly, this saves considerable time in
extracting the information from the open-ended responses. Furthermore, should
analysts prefer not to use Topic Models to extract information from open-
ended responses, they could still improve closed-ended questions by placing open-ended
questions before them, which reduces neutral responses. Finally, if
the analyst intends to give respondents the flexibility to choose the questionnaire type
of their choice, they could use our proposed framework to obtain coefficients for
prediction using the models of the analyst’s choice.
Consultancies - Consultancies can reduce the human resources required to extract and
process open-ended responses and reallocate them to other tasks. Furthermore, this
reduces the overall cost of the project.
Policymakers - Since the analysts are not coding the responses manually, the extracted
information is more systematic and less subjective, making the inferences drawn from
the studies more realistic. Policymakers also benefit from a significant reduction in the
time to gain insights.
Respondents - The proposed framework allows respondents to answer surveys using
the questionnaire type of their choice.
1.5 SCIENTIFIC OUTCOMES FROM THIS THESIS
Journal Publications
1. Open-Ended Versus Closed-Ended Responses: A Comparison Study Using Topic
Modelling and Factor Analysis, published in the IEEE Transactions on Intelligent
Transportation Systems (2021).
2. A Closer Look into How Land-Use, Social Networks, and ICT Influence Location
Choice of Social Activities, published in the AESOP Transactions (2019).
Conference Presentations
1. Opening Up the Conversation: Topic Modelling for Automated Text Analysis in
Travel Surveys, presented at the 21st International Conference on Intelligent
Transportation Systems (ITSC), 4th to 7th Nov 2018, Hawaii, USA.
2. Comparing Likert Scale and Open-ended Questions in the Application of TPB to
the Intention-to-Use of Autonomous Shuttle, presented at the 15th International
Conference on Travel Behaviour Research (IATBR), 15th to 20th Jul 2018, Santa Barbara,
California, USA.
3. Do Open-ended Questions Influence the Measurement of Attitudes? An
Investigation, accepted for presentation at the 12th International Conference on
Transport Survey Methods, 20th to 25th March 2022, Lisboa, Portugal.
4. A Closer Look into How Land-use, Social Networks, and ICT Influence Location
Choice of Social Activities, presented at the 2017 AESOP Annual Congress, 11th to
14th Jul 2017, Lisboa, Portugal.
5. An Investigation into the Influence of Land-use, Social Networks, and ICT on
Location Choice of Social Activities, presented at the redeMOV 2nd Annual
Conference Urban Mobility Innovation for a Changing Society, 3rd to 9th May 2017,
Lisboa, Portugal.
1.6 ORGANISATION OF THE THESIS
The remaining chapters of this thesis are structured as follows: -
Chapter 2 - reviews the literature that discusses the use of closed- and open-ended
questions to measure attitudes. In addition, we discuss the frameworks used to
measure attitudes, Natural Language Processing and its applications, and the modelling
techniques. In relation to the design of the questionnaire, we review the literature on
Autonomous Vehicles.
Chapter 3 - presents the methodology and estimation results from the case study
carried out in India, which modelled the intention to use Shared AVs. Furthermore, we
outline the limitations of that study.
Chapter 4 - to address the Indian study’s limitations, we launched a second study in
the USA. This Chapter presents the methodology and estimation results from the case
study carried out in the USA, which modelled the intention to use AVs as a commute
mode. In addition, we present a framework that allows respondents to
choose their preferred question type to answer questions related to the measured attitudes.
Chapter 5 - presents the salient findings of this thesis, the limitations of the study and
directions for future research.
2 FUNDAMENTALS - QUALITATIVE DATA AND ITS MEASUREMENT
2.1 INTRODUCTION
This Chapter discusses the advances in the research related to measuring qualitative data,
specifically measuring attitudes. Neuman [26] emphasises the importance of three factors for
measuring qualitative data: a construct, a measure, and the ability to identify what the
researcher intends to measure. This research focuses on improvements related to the
second factor, “measure”.
The following section (Section 2.2) discusses the need to measure attitudes in travel behaviour
research and presents literature on the evolution of travel behaviour research. Recent
developments emphasise the importance of attitudes in travel behaviour research and the need
for their measurement; we follow this with a short discussion of the research that analysed
attitudes in travel behaviour. The literature review, although not exhaustive, highlights the
importance of measuring attitudes.
Section 2.3.1 presents a discussion on the methods used to measure qualitative data, specifically
the closed-ended approach. We discuss the history of the closed-ended approach and its use in
travel behaviour. Closed-ended scales can be of varying length, from dichotomous to very
long scales. The following sub-section presents the rationale for the scales, their benefits and
their use in measuring attitudes, which gives a better perspective of the closed-ended approach.
Central to the discussion of the length of the scales is whether the researcher/analyst intends to
capture neutral responses. Analysts interested in measuring neutral responses use scales with an
odd number of points, while others might rely on scales with an even number of points, and this is
indeed a source of intense debate among analysts and policymakers. Including a neutral point
in the scale allows individuals who are genuinely neutral to respond without having to
compromise. We present various approaches to extract information from closed-ended
responses. The final discussion on closed-ended responses concerns the common issues
associated with the closed-ended approach.
Section 2.3.2 discusses the use of open-ended responses for measuring attitudes. The objective
is to summarise the aspects of interest pertinent to the open-ended approach.
We begin with a discussion on the potential benefits and the scenarios that warrant the use
of open-ended responses. The open-ended approach requires the coding of responses before
the analysis. Researchers argue that the position of the open-ended questions in the
questionnaire might influence the measured attitudes. Also, respondents may skip open-ended
questions without answering them, which is a significant concern. Consequently, the number
of missing responses is generally high for open-ended questions, which is also discussed in this
Chapter. Later, we briefly discuss the different approaches to extract information from open-ended
responses.
After making decisions on the appropriate approach to measure attitudes, researchers/analysts
should identify a framework that could explain the psychological processes involved in
forming these attitudes. In the subsequent section (Section 2.4), we present an overview of the
different socio-psychological theories used to measure attitudes in travel behaviour. Among
these, researchers have widely used the Theory of Planned Behaviour proposed by Ajzen [27]
and its extended versions. Other commonly used theories include the Technology Acceptance
Model (TAM) by Davis, Jr [28] and its extensions. Finally, researchers have used the Unified
Theory of Acceptance and Use of Technology (UTAUT) proposed by Venkatesh et al. [29]
and its enhanced versions to model the willingness/intention to use AVs. This section covers
the underlying principles, their applications and the salient findings related to their use.
After collecting the data, the next challenge involves estimating models to analyse travel
behaviour. Researchers have used models such as multiple linear regression, discrete choice
models and structural equation models. The next aspect of this chapter's discussion (Section
2.5) is on the different statistical approaches adopted for modelling the intention/willingness to
use AVs. Furthermore, in Section 2.6, we discuss the factors influencing the intention to use
AVs. The final aspect of the discussion is related to the extraction of information from the
open-ended responses. In this thesis, we use Topic Modelling to extract this information.
Section 2.7 describes Topic Modelling and its application in research, particularly in travel
behaviour research.
2.2 MEASURING ATTITUDES IN TRANSPORTATION
Here, we provide a brief overview of the evolution of travel behaviour research that eventually
led to the measurement of attitudes, defined as dispositions towards overt action or as verbal
substitutes for overt action [10]. Initial attempts at transport modelling focussed on predicting
vehicular trips to make decisions regarding the necessary infrastructure. With the emphasis
shifting towards more efficient systems, researchers started analysing the potential of
other modes of transport, which led to the inclusion of mode choice in travel demand models
and eventually to the development of a more behavioural framework, viz. the Activity-based
approach. In parallel, the increased importance of sustainability encouraged
researchers/planners to assess the potential of non-motorised modes of transport such as
walking and biking. Shifting travellers from one mode to another, particularly to sustainable
modes, often involves physical, regulatory and pricing measures and requires analysing
subjective attitudes, which are qualitative. For example, the analysis of public transportation
often involves investigating attitudes related to comfort, convenience, etc. [1], [2]. This paved
the way for research measuring attitudes in travel behaviour, in areas such as mode choice, the
intention to use sustainable modes of transport, AVs and accident analysis, to name a few. These
developments highlight the importance of attitudes in travel behaviour and the need to improve
their measurement.
We elaborate on this further by referring to some of the research works in each of the fields
mentioned above. The constructs related to the Theory of Planned Behaviour, desires, past
behaviour, habitual use of the car and social stigma influence the shift towards public
transportation [30]–[33]. Research related to biking indicates that it is essential to understand
the attitudes towards biking to create a conducive environment for biking. Habits, familiarity
with the systems, pro-bicycle attitudes, and the perception of ease-of-use and convenience
influence the intention to use bikes [3], [34]–[36]. The pro-bicycle attitudes of an individual
are often predicated on the individual's perception of the environment, health and the use
of bikes [35], [37]. Another domain of interest to researchers is understanding the attitudes
towards AVs, as this is essential to improving perceptions and, thereby, their potential use.
Attitudes, contextual acceptability and the enthusiasm for driving influence the
intention to use AVs [38]. Individuals who are optimistic about driving, familiar with any form
of automation and conscious about the environment are more likely to use AVs [39]–[41].
On the other hand, resistance to their use, anxiety and concerns about the use of AVs affect
the intention negatively [39]–[43]. Furthermore, in accident analysis, personality traits strongly
linked to attitudes play a vital role. Analysing driver behaviours that cause accidents by understanding
the attitudes towards safety and risky driving behaviour could improve the prediction of traffic
accidents [44]–[46].
To summarise, the short review of the literature presented above highlights the importance of
attitudes in travel behaviour research. Making this information available to decision-makers
allows them to make better policy-related decisions.
Furthermore, by understanding the underlying factors influencing the intention to use a future
technology, policymakers, manufacturers and decision-makers could address the concerns about
the proposed technology. However, the qualitative nature of attitudes presents challenges in
their measurement. Researchers and analysts use closed- and open-ended questions to gain
insights into these attitudes, and we present a discussion on each of the two approaches in the
following section.
2.3 MEASUREMENT OF QUALITATIVE DATA
Having realised the importance of measuring attitudes in travel behaviour research, we now
focus on how they can be measured. Should researchers and policymakers use closed- or open-
ended questions? Closed-ended questions present statements such as “I am comfortable sharing
a ride with strangers” or “My friends and family will be supportive of me using a bicycle” to
respondents, who then choose the point on a scale that best matches their attitudes. The
alternative, viz. the open-ended approach, requires respondents to articulate their
response to a question in their own words. Interestingly, Rensis Likert, the proponent of the
five-point Likert scale, eventually recommended using open-ended questions to measure
attitudes [18]. The debate on which approach is better for measuring attitudes intensified
with time, and the preference shifted in favour of the closed-ended approach; the
advent of internet-based surveys has only intensified this debate further. The evolution of the
two approaches and their applications, particularly in travel behaviour research, is described in
detail in this section.
2.3.1 Closed-ended Responses
As discussed earlier, researchers using the closed-ended approach present statements
describing the attitude, and respondents then choose the point on the scale that best describes their
attitude. Scales can be unipolar or bipolar. Bipolar scales are based on the notion that attitudes
are bipolar constructs ranging between two extremes (negative to positive) with a neutral
midpoint, whereas unipolar scales measure the importance of an attitude to an individual, often
without a precise midpoint [47]. To measure attitudes, presenting some statements positively
and others negatively is desirable. Moreover, it is advisable to use a balanced scale with
an equal number of points corresponding to the two extremes. Closed-ended questions are
simple and easy to administer and analyse [48].
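When a questionnaire mixes positively and negatively worded statements, the negatively worded items are commonly reverse-coded before analysis so that a higher score always indicates a more favourable attitude. The sketch below illustrates this for a balanced five-point scale. It is an assumption about a typical pre-processing step rather than a description of the procedure followed in this thesis, and the item names and responses are hypothetical.

```python
def reverse_code(response, scale_min=1, scale_max=5):
    """Mirror a response around the midpoint of the scale (e.g. 1 <-> 5)."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} is outside the scale")
    return scale_min + scale_max - response


def align_items(responses, negative_items, scale_min=1, scale_max=5):
    """Reverse-code the negatively worded items and leave the others as-is."""
    return {
        item: reverse_code(value, scale_min, scale_max)
        if item in negative_items else value
        for item, value in responses.items()
    }


# Hypothetical respondent: two positively worded items, one negatively worded.
answers = {"av_is_safe": 4, "av_is_useful": 5, "av_makes_me_anxious": 2}
print(align_items(answers, negative_items={"av_makes_me_anxious"}))
```

The midpoint of an odd-length scale is unchanged by this transformation, which is one reason a balanced scale keeps the neutral response comparable across positively and negatively worded statements.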
Plant (1922) [11] proposed one of the first rating scales to measure attitudes, which simplified
the data collection, reduced the completion time and made analysis easy. Considering the
different aspects, in 1924, Symonds [50] recommended using the seven-point scale to measure
attitudes. Later, Likert [10] proposed the five-point scale ranging between the extremes of
attitudes in 1932, which he argued is less painstaking and eliminated the need for judges while
ensuring reliable results. It is worth noting that seven- and five-point scales were not the only
scales in use. Over time, numerous other researchers proposed scales of varying lengths.
Interestingly, to date, the optimal number of points on the scale remains a matter of intense
debate and one that requires careful consideration.
Scales should be designed carefully by the analyst/researcher, and before using a scale, they
should make decisions regarding the length, the inclusion of midpoint, labelling (verbal or
numeric), etc. The choice of words in the verbal labels or the numbers in the numeric labels
should be carefully decided [47]. Failing to provide an exhaustive list of the categories for a
closed-ended question might constrain respondents who might then give undue importance to
categories that might otherwise be insignificant [49]. It is essential to cover all aspects of
the attitude, since appropriate coverage of all the levels increases the accuracy of predictions [50].
The widespread use of rating scales can be attributed to their ease of use by both respondents
and analysts since answers can be directly quantified [18], [51]–[53]. Also, the use of scales
facilitates rapid operation, as the time between implementation, possession of data, analysis
and the evaluation of outputs is minimal [18]. In the following sub-section, we present literature
on scales of varying lengths and the optimal number of points on a rating scale.
2.3.1.1 Types of Closed-ended Scales
Classified by the number of points on the rating scale, scales range from
simple dichotomous scales to the most complicated scales. Symonds [54] recommended the
seven-point scale, whereas Likert [10] recommended the five-point scale. Compared to the
dichotomous scales, longer scales allow respondents to express their attitudes more effectively
by capturing the direction of agreement/disagreement and the intensity to which the individual
agrees/disagrees [47], [55], [56]. However, including too many points on the scale might reduce
the clarity, consistency and discriminatory power while increasing the respondents’ burden and
errors [47], [57], [58]. Shorter scales, in turn, inhibit the respondents’ expressive power, can be restrictive,
and diminish the validity and discriminatory power of the responses [57], [59]. Researchers should
decide the length of scales after accounting for the number of cognitive categories of interest
to them [47]. Preston and Colman [59] recommended five-, seven- and ten-point
scales based on ease of use; however, researchers do not yet have a consensus on the
optimal number of points on a scale. Consequently, researchers use scales of varying sizes,
discussed in the following paragraphs.
Two-Point Scales- Two-point scales are simple and effective in capturing the direction of the
decision and are convenient to administer and score [24], [57]. They are the preferred choice for
surveys in which the alternatives are homogeneous [13], [57]. It is also preferable
to use them to measure the attitudes of the general public or of a culturally diverse sample with no
formal education [57], [60]. However, two-point scales expect precise responses from the participants,
might hence be restrictive and do not capture the extremity of agreement/disagreement [24],
[48], [57]. Researchers have used two-point scales to measure the social desirability score,
willingness-to-pay for bus fare reform and the willingness-to-use automated cars [43], [61]–[64].
In the field of transportation, they have been applied to analyse whether individuals liked
using AVs, their interest in AVs and their approval or disapproval of AVs [65], [66].
Three-Point Scales- Three-point scales extend the dichotomous scale by allowing truly neutral
individuals to express their opinion without being forced to agree or disagree with the
statement. Researchers have used three-point scales to measure involvement in accidents
and attitudes towards risky driving behaviours [45], [67]. They have also been used to
understand the concerns and benefits of AVs, assess the ability to multi-task while riding AVs
and the need for connectivity features in AVs [16], [66], [68].
Four-Point Scales- The four-point scales offer more flexibility, as they allow individuals to
express the degree to which they agree/disagree. However, they do not allow individuals to
voice their neutral opinions and are preferred when there is no interest in measuring neutral
opinion. Applications of four-point scales include measurement of the attitude towards public
transport, user satisfaction with the bus service and the propensity of driving a car [41], [69],
[70]. They are also used to measure the perceived benefits of AVs and to assess the concerns,
intention to use AVs, and the expectations and anxieties towards AVs [65], [71]–[75].
Five-Point Scales- Proposed by Likert [10], the five-point scale, an extension of the four-point
scale with the inclusion of the neutral option, is widely used by researchers and analysts. Some
of the applications in the field of transportation involve assessing driving behaviour, driving
safety, driver skill inventory and risk-taking attitude [45], [76]–[79]. Researchers have also
used the five-point scales to measure perceptions and service quality, attitudes towards safe
driving, driver behaviour and the attitudes towards cars and public transit [80]–[83]. Other
applications include assessing the intention to use various services (public transportation over
private cars, bicycles, bike-sharing systems and the dependence on the car) and the frequency
of use of different modes [30], [32], [84]. Researchers have also used five-point scales to measure attitudes
towards AVs, the environment, public transport, safety and technology [4], [40], [85].
Six-Point Scales- The six-point scales provide respondents with more flexibility to express
their attitude; they, however, cannot capture neutral opinions. Compared to the four-point scale,
six-point scales offer two additional levels for respondents to express their attitudes.
Researchers often argue that the six-point scale provides higher discrimination and reliability
[58], [86]. For example, Komorita [87] used a six-point scale to compute the neutral region on
a Likert scale without a neutral point. Other instances of the use of a six-point scale were to
analyse driver behaviour, the intention to use public transport and bikes and the frequency of
use of different modes [32], [35], [77], [83]. Other applications include the analysis of
attitudes towards AVs, anxiety, benefits and concerns, and mode choice frequency [41]–[43], [88].
Seven-Point Scales- The seven-point scale, a popular and widely used scale, is an extension
of the six-point scale with the inclusion of the neutral option, and many researchers consider
seven to be the optimal number of points on a scale [54], [89]–[94]. Claims from researchers in
favour of the seven-point scales include better reliability, equal utilisation of all categories,
higher accuracy and ease of use [54], [90], [93]. Miller [89] argued that using a seven-point
(± 2) scale enables respondents to process and make better judgements. Furthermore, having
more than seven points increases the cognitive burden of the respondents with almost no
improvement in reliability [94]. Researchers have used the seven-point scales to measure the
social desirability score, risky behaviour, perceived risk, transport priorities, attitudes towards
transport mode and the intention to use public transport/bicycle/AVs [31], [32], [34], [37], [38],
[61], [64], [95]. They have also been used to measure the attitudes towards AVs and constructs related
to TPB and TAM [43], [75], [96].
Longer Scales- Longer scales are predicated on the idea that they capture richer information
than scales with fewer points. The nine-point scale was recommended by Bendig [97],
considering the additional information it captures. Researchers used the ten-point scale to
measure driver stress inventory, attitudes towards behaviour and customer satisfaction with
public transport services [96], [98], [99], and an eleven-point scale was used to measure driver
stress and willingness to pay for autonomous vehicles [39], [76]. Preston and Colman [59]
analysed the reliability and validity of a 101-point scale to measure the service quality in
restaurants and stores.
2.3.1.2 Optimal Number of Points in a Scale
“Too many cooks spoil the broth” would be an appropriate phrase to describe the current
scenario, as there is still no consensus among researchers on the appropriate scale
length to be used. Champney and Marshall [56] advocated using longer scales (18 to 24 points)
unless it is established that five- and seven-point scales are appropriate for measuring
the attitude under consideration. They concluded that the number of points on a scale is a
function of measurement conditions and should be decided, taking reliability and validity into
consideration.
Reliability, an indicator of the effectiveness of the scale, can be assessed by evaluating the
consistency of the results. Longitudinal reliability is the consistency in response when a
question is repeated to the same individual over multiple instances. A low value for longitudinal
reliability could indicate a change in an individual’s attitude or the unreliability of the
measure. On the other hand, cross-sectional reliability is the consistency in response to a series
of attitudinal questions asked to the same individual [47].
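The two notions of reliability can be made concrete with a short sketch. The text does not name specific statistics, so the choices below are assumptions: longitudinal reliability is illustrated with the Pearson correlation between two waves of the same question, and cross-sectional reliability with Cronbach's alpha over a set of items. All data are hypothetical.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (at least two items)."""
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Longitudinal reliability: the same question asked twice to five respondents.
wave_1 = [1, 2, 3, 4, 5]
wave_2 = [2, 2, 3, 5, 5]
print(round(pearson(wave_1, wave_2), 3))

# Cross-sectional reliability: three items answered by five respondents.
items = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [1, 2, 1], [3, 3, 4]]
print(round(cronbach_alpha(items), 3))
```

A low correlation between waves cannot, on its own, distinguish a genuine change in attitude from an unreliable measure, which is the ambiguity noted above.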
According to Bendig [97], [100], reliability was not affected by an increase in the number of
scale points (three-, five-, seven- and nine-points) but declined as the number of points
increased to eleven [97]. Respondents’ reliability was constant for larger scales (five-, seven-
and nine-point), low for two-point and high for three-point scales [100]. Peabody [55]
concluded that long scales primarily reflect the direction of the response but are limited in
capturing the degree of extremeness and reliability. Furthermore, the dichotomous scales are
equally reliable compared to longer scales when the attitudes are homogeneous; however, for
heterogeneous items, the reliability increases with the length of the scale [57].
Another factor that positively influences reliability is verbal anchoring, particularly that of the
central category [97]. The conveyed information increases with the number of points on a scale
(three-, five-, seven-, nine- and eleven-point) and verbal anchoring [101]. Lissitz and Green
[102] illustrated this relationship using the Monte Carlo approach. However, for scales with
more than five points, the increase in reliability was nominal, thereby challenging the claim
that seven is the optimal number for a scale [102]. In contrast, Preston and Colman [60]
concluded that reliability was highest for the seven- and ten-point scales. Komorita
and Graham [57] emphasised the need to consider both validity and reliability when choosing the number
of points on a scale.
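The link between the number of scale points and reliability can be explored with a small Monte Carlo experiment. The sketch below is not the design used by Lissitz and Green [102]; it is a toy simulation under stated assumptions: each respondent has a normally distributed latent attitude, each item adds independent normal noise, responses are cut into equal-width categories, and reliability is summarised with Cronbach's alpha. All parameter values are hypothetical.

```python
import random
from statistics import pvariance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def discretise(value, n_points, low=-3.0, high=3.0):
    """Map a continuous value onto 1..n_points using equal-width bins."""
    clipped = min(max(value, low), high)
    width = (high - low) / n_points
    return min(int((clipped - low) / width) + 1, n_points)


def simulate_alpha(n_points, n_respondents=2000, n_items=5, seed=0):
    """Simulate responses on an n_points scale and return Cronbach's alpha."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_respondents):
        attitude = rng.gauss(0, 1)  # latent attitude of one respondent
        rows.append([
            discretise(attitude + rng.gauss(0, 1), n_points)
            for _ in range(n_items)
        ])
    return cronbach_alpha(rows)


for n_points in (2, 3, 5, 7, 9, 11):
    print(n_points, round(simulate_alpha(n_points), 3))
```

Because the same seed is reused, every scale length is applied to the same simulated respondents, so differences in alpha reflect only the coarseness of the scale. The simulation ignores respondent burden and category labelling, which the empirical studies cited above do take into account.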
Validity is the accuracy with which the measure taps a construct of interest. Correlational
validity is the degree to which a given measure can predict other variables related to this given
variable, and discriminant validity measures the distinction between constructs expected to be
distinct from each other [47]. In their study, Preston and Colman [59] observed a positive
correlation between the number of points on the scale and validity, although validity remained unchanged for
scales with more than five points.
However, Matell and Jacoby [103] concluded that the reliability and validity are independent
of the number of points on the scale. Also, reliability and validity are not the only factors that
decide the optimal number of points on a scale [103]. Matell and Jacoby [104] concluded that
the length of the scale did not influence the utilisation of the scale points. Should
researchers/analysts intend to reduce response time, fatigue, warm-up, boredom, etc., shorter
scales should be used. Longer or even-numbered scales might help address the issues
associated with neutral responses [104]. The length of the scale should be decided after taking
reliability, validity, discriminating power and respondent preferences into consideration
[59]. So, there is still no consensus on the appropriate length of a scale, and it is often the
researcher’s prerogative.
2.3.1.3 Neutrality and its implications
Respondents sometimes choose the middle option as they do not want to take a position [48].
Researchers and analysts circumvent this by using scales without a neutral option [48], [87],
[105]. Komorita [87] demonstrated the potential to compute a neutral point/region for a Likert
scale with an even number of scale points, which aids the identification of genuinely neutral
responses. Sometimes, respondents choose points on the scale that may please the interviewer
or are socially acceptable, a tendency called acquiescence. Scales without a neutral point reduce the
social desirability bias and acquiescence [105]. However, critics still argue that using an even
number of points on the scale fails to capture the responses of truly neutral individuals, who are instead
forced to agree/disagree with the statement [48], [104]. There is still no
consensus on this issue, and the decision mostly rests with the researcher/analyst.
2.3.1.4 Satisficing
Individuals tend to provide typecast responses, choosing one of the extremes, positive or
negative. Individuals are more likely to endorse statements (satisficing) than disagree, and the
lack of suitable alternatives in the rating scales may force individuals to choose inappropriate
levels. If the researcher is not careful enough to present a comprehensive questionnaire
covering all the different aspects related to the attitude to the respondent, this may cause gaps
in the attitudes captured [48].
2.3.1.5 Other Issues
In this sub-section, we discuss other concerns related to using closed-ended responses reported
in the literature. The closed-ended approach uses statements about attitudes observed to be
important in a study conducted on another sample or a different context. Therefore, closed-
ended responses may measure aspects of the problem relevant to the analyst or the researcher
and not the issues that may be of actual relevance to the individual [25], [50], [106].
Consequently, individuals may respond to statements without perceiving them correctly [107].
Another issue related to using the closed-ended approach, particularly in online surveys, is that
the order in which researchers present the response options of closed-ended questions may influence
responses. For instance, by tracking eye movements, Galesic et al. [108] concluded
that respondents spend more time on the options at the top and are more likely to choose them.
However, Mavletova [110] observed that this primacy effect was stronger on PCs, whereas Buskirk and
Andrus [110] found similar patterns for respondents answering on smartphones and on PCs.
2.3.1.6 Analysis of Closed-ended Responses
When researchers aim to measure the overarching relationships between measured variables
(closed-ended responses), approaches such as Principal Component Analysis (PCA) and
Exploratory Factor Analysis (EFA) are used. Researchers use PCA
for exploratory data analysis and dimensionality reduction of closed-ended
responses. The data are transformed into a new coordinate system in which the projection of the
data onto the first coordinate has the greatest variance, the projection onto the second coordinate has the
second greatest variance, and so on. PCA has been widely applied in research, for example to study the
intention to use bike-sharing for holiday cycling [3], perceptions of autonomous vehicles [95]
and the importance of environmental considerations [109].
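To illustrate the idea of projecting responses onto the direction of greatest variance, the following is a minimal sketch that extracts only the first principal component by power iteration on the covariance matrix. It is not the procedure of any of the cited studies; the data are hypothetical five-point responses to three statements, and a statistical package would normally be used in practice.

```python
from statistics import mean


def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centred data."""
    n, k = len(rows), len(rows[0])
    means = [mean(row[j] for row in rows) for j in range(k)]
    centred = [[row[j] - means[j] for j in range(k)] for row in rows]
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(k)]
        for a in range(k)
    ]
    return cov, centred


def first_component(cov, iterations=500):
    """Power iteration: converges to the direction of greatest variance,
    provided the starting vector is not orthogonal to that direction."""
    k = len(cov)
    vec = [1.0] * k
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(k)) for a in range(k)]
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    variance = sum(
        vec[a] * cov[a][b] * vec[b] for a in range(k) for b in range(k)
    )
    return vec, variance


# Hypothetical responses: items 1 and 2 move together, item 3 is reversed.
responses = [[5, 4, 1], [4, 5, 2], [2, 1, 4], [1, 2, 5], [3, 3, 3]]
cov, centred = covariance_matrix(responses)
loadings, variance = first_component(cov)
total_variance = sum(cov[j][j] for j in range(len(cov)))

# Scalar projection of each respondent onto the first component.
scores = [sum(l * c for l, c in zip(loadings, row)) for row in centred]
print([round(l, 3) for l in loadings])
print(round(variance / total_variance, 3))  # share of variance explained
```

The sign pattern of the loadings shows which statements move together, and the share of variance indicates how much of the response pattern a single dimension captures.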
To perform exploratory analysis on the closed-ended responses in which they aim to uncover
the underlying structure of a large set of variables, researchers often use the Exploratory Factor
Analysis (EFA). To assess the school-wide cultural competence and analyse if policies,
programs and practices align, Nelson et al. [110] used Exploratory Factor Analysis to identify
factors that influence the construct. EFA has also found applications in the development of the
Simulation Experience Scale for nurses [111], the study of Risk Management Practices [112] and the assessment of
students’ competency in current skills and their perception of the importance of future skills [113].
Other applications include the identification of underlying constructs of existence, relatedness
and growth needs and the travel difficulties [114], drivers that influence travel decisions while
using travel apps [5] and the customers’ attitude and intention to use AVs [115].
When the researcher intends to test if a construct is consistent with the researcher’s perception
of the construct, Confirmatory Factor Analysis (CFA) is used. Unlike EFA, which is
exploratory, in CFA, researchers test if the data fit the hypothesised measurement model, often
derived from previous analytical research or theories. For example, Dubey et al. [116] used
CFA to explore the enablers of the Six Sigma implementation and their contextual
relationships. Other applications include understanding the competency levels among dentists
[113], adoption of a computer-based model to monitor parking revenue inflow [117], drivers
that influence travel decisions while using travel apps [5] and the development and validation
of the Ego Identity Process questionnaire [118].
2.3.2 Open-ended Responses
This section discusses the theory, benefits, issues, and uses associated with the open-ended
approach. Open-ended questions allow respondents to express attitudes or opinions in their
own words, and notably, Rensis Likert [18] argued in favour of the open-ended approach to
measure public attitudes. Likert [18] streamlined the data collection by focusing on collecting,
recording, and processing information. It is advantageous to use open-ended questions to
measure operative attitudinal properties such as ambivalence, inconsistency and embeddedness
[107]. Geer [15] argued that the closed-ended approach is preferred over open-ended questions
primarily due to convenience and not based on the inability to measure public attitudes.
Open-ended questions give researchers and policy-makers insights into responses that
individuals give spontaneously, which mitigates the bias associated with the use of closed-
ended responses [18], [119], [120]. Also, open-ended questions are not restrictive and help
identify the problem, thereby reducing the overall time for the analysis [106]. Compared to the
closed-ended approach, the open-ended approach provides more detailed information [53].
Schuman, Ludwig and Krosnick [49] demonstrated that the results from the open-ended
responses could match those from closed-ended responses if designed carefully.
Open-ended questions should be the preferred choice to measure the respondents’ knowledge
about a topic or the frequency of their undesirable behaviour [48]. The use of open-ended
questions eliminates the potential for bias arising from the enumerator making suggestions to
the respondents [25]. Open-ended questions allow individuals to name and communicate
the issues relevant to them, which facilitates identifying their priorities more accurately
[107], [121]. Furthermore, open-ended questions could even be used in large samples to
identify aspects and wordings used in closed-ended questionnaires [52].
The proponents of the open-ended approach consider the closed-ended approach to be
incomplete, unnatural and rigid and likely to distort the respondents’ attitudes [18]. Compared
to the closed-ended approach, it might be expensive to use an open-ended approach, and so to
cope with the financial constraints, Likert [122] proposed using an open-ended approach with
reduced sample sizes. In his opinion, this is preferable to the use of methods that are less
accurate and biased. Esses and Maio [107] consider the open-ended approach convenient, as it
relieves researchers from the burden of having to test the statements extensively before using
them. In addition, open-ended questionnaires can easily be adapted across different cultures
and samples, while the statements used in the closed-ended approach might be sample- or
culture-specific [107], [123].
The use of open-ended questions to measure attitudes does have its challenges. For example,
Griffith et al. [124] argue that the use of open-ended questions might slow the individuals down
and force them to use the additional time to answer the question. Other issues include a.
attracting responses outside the frame of interest to the researcher/analyst [50], [52],
[120], b. posing difficulties for the interviewers to collect data and for the analysts to analyse
and code the data [18], c. increasing the burden on respondents, enumerators and analysts [17],
d. lowering the reliability of the collected data [18].
Furthermore, the responses to open-ended questions are likely to be influenced by external
events and developments in the media [49]. Considering the difficulties and economic
feasibility, the advocates of the closed-ended responses consider open-ended questions to be
appropriate for pre-test and inappropriate for practical use [18]. Some researchers argue that
open-ended questions measure the ability of the respondent to articulate a response and not the
attitude [125]. Critics of the open-ended approach argue that their responses capture superficial
concerns; however, Geer [15] argued that the scholars overstated the claim that the open-ended
questions tap superficial concerns. In online surveys, the size of the text boxes has been shown to
influence the responses since longer text boxes produce more words and ideas per respondent
[126]–[128]. Moreover, poorly stated instructions and the respondents’ interest could affect
response length and quality of information [128]–[130]. Ambiguity in responses, increased item
non-response and survey break-offs are some of the other issues associated with the use of
open-ended questions [19], [129], [131].
With the advent of the internet, researchers started conducting surveys over the web. The
size of the text box and the presence of motivating text influence the responses to open-ended
questions [132], and non-response to open-ended questions was also a significant issue
in web-based surveys [129]. The higher break-off among respondents answering web-based
open-ended questionnaires was attributed to the absence of interviewers to motivate them.
Furthermore, lack of flexibility in completing web surveys at the convenience of the
respondents is also a deterrent to participation [133]. Given the increasing penetration of
smartphones, it is essential to analyse their implications for the responses to open-ended
questions. Open-ended responses were relatively shorter among respondents
answering the survey using mobile phones [134], [135]. Responses from smartphones had
longer response time per character, were often less precise, and had more abbreviations [136].
Reacting to this, Revilla and Ochoa [136] expressed their concerns over the reliability of
open-ended responses collected using surveys not optimised for smartphones. It would be interesting to
revisit this now that surveys are increasingly optimised for smartphones and individuals are
more used to typing on smartphones.
Researchers use open-ended questions to collect suggestions on methods to improve the bus
service, opportunities and barriers for energy use and conservation [70], [137]. They have also
been used to understand the willingness-to-pay for AVs and feedback on AVs [38], [41], [85].
Hulse, Xie and Galea [95] coded the attitudes (captured using open-ended responses) towards
autonomous vehicles into five categories.
2.3.2.1 Coding of Open-ended Questions
Coding is a prerequisite for making statistical predictions with open-ended responses and allows
analysts to compute frequencies and percentages, perform non-parametric tests and statistical
analysis [95], [119], [123], [137]. The data should be coded carefully into categories before
transforming them to a nominal scale [138], and the diversity in responses might pose
difficulty in coding [119]. Coding also poses challenges related to the hiring of coders and the
coding of data [53]. Depending on the categories to which the responses fall, researchers apply
weights and ensure the reliability, agreement and validity of the coded data [106], [139]. Data
is reliable if different observers or analysts agree to a specific coding of categories, and a coding
scheme is valid when it captures the truth about the measured attitude. Interestingly, reliability
in coding does not necessarily ensure the validity of coding because there could be a consensus
on the coding scheme among the different coders, but it might be far from the truth [140].
Niedomysl and Malmberg [141] found the inter-coder variability in coding open-ended
responses to be negligible; differences, if any, should be explored further to develop better
coder instructions.
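Once responses are coded into nominal categories, both the category frequencies and the agreement between coders can be computed directly. The text does not name an agreement statistic; the sketch below assumes Cohen's kappa for two coders, which corrects the observed agreement for agreement expected by chance. The category labels and codes are hypothetical.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders on the same responses.

    Undefined (division by zero) when chance agreement equals 1, for
    example when both coders use a single category throughout.
    """
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to six open-ended responses.
coder_a = ["safety", "cost", "safety", "trust", "cost", "safety"]
coder_b = ["safety", "cost", "trust", "trust", "cost", "safety"]

print(Counter(coder_a).most_common())          # category frequencies
print(round(cohen_kappa(coder_a, coder_b), 2)) # 0.75
```

As noted above, a high value only indicates that the coders agree with each other; it does not show that the coding scheme is valid.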
2.3.2.2 Position of Open-ended Questions
The position of the open-ended questions can influence the responses. Placing them at the end
can alter the original ideas of the individual and cause higher attenuation [51]. Having open-
ended questions at the beginning of a questionnaire could facilitate identifying the suitable
categories for the closed-ended questions. Positioning them at the end of the survey helps
capture more information (impressions, experiences and other aspects) than merely their
correlations [17]. A more suitable approach is to place them only at the beginning or at both the
beginning and the end [51]. In another study, the sequence of data collection did not influence the responses to
closed- and open-ended questions [142]. The opinions seem to be quite divided, and
researchers should explore this further.
2.3.2.3 Missing Values in Open-ended Surveys
Compared to the closed-ended questionnaires, response rate (the rate of return) is lower for
open-ended questionnaires [52], [127], [143]. Administering the open-ended question to
individuals who cannot articulate their responses might invite non-response [120], [125]. Also,
administering open-ended surveys to individuals with little or no formal
education might increase non-response [123]. Consequently, the proponents of the
closed-ended approach often argue that the open-ended questions measure the ability to articulate
responses and not the attitudes [120], [125]. However, Geer [14] attributed non-response to the
lack of interest in the topic, independent of the ability to articulate. In web-based surveys, the proportion
of missing values is often high due to the absence of interviewers, who might otherwise encourage
respondents. The length of the text boxes might influence non-response; longer text boxes might
reduce non-response but may trigger invalid entries [119]. This could be because individuals
might respond to closed-ended questionnaires without applying much thought [124].
2.3.2.4 Extraction of Information from Open-ended Responses
For extracting the information from open-ended responses, word-based analysis has been used
extensively in qualitative data analyses that expect the data structure to emerge or validate a
thematic content analysis [144], [145]. The analysis involves counting the co-occurrence of
word units, which eventually helps identify the cluster of concepts and the strength and
direction of their relationships. To understand the factors contributing to conflicts, Jehn [144]
collected data using open-ended interviews and indexed each term in the responses in
alphabetical order along with its frequency of occurrence. This index helped identify similarities but
was less helpful in drawing conclusions about the context of the response [144]. To gain quantitative
insights from qualitative data, Schmidt [146] used the text analysis software “CATPAC” to
obtain frequencies of words and the association between words, used later to perform
multivariate analysis. Analysing the open-ended responses requires several coders, which
increases the cost and time for analysis. To address this, crowdsourcing, which utilises the internet
to divide work between several participants, could be used. Crowdsourcing has massive
potential for extracting information from large volumes of text data and adds the “human touch”
to the analysis, which is often absent in computerised analysis. Jacobson, Whyte and Azzam [147] and
Benoit et al. [148] utilised the potential of crowdsourcing to code, evaluate and quantify open-
ended responses, although the cost might still be a concern.
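The word-based analysis described at the start of this sub-section, an alphabetical index of terms with their frequencies plus counts of co-occurring word pairs, can be sketched in a few lines. This is only an illustration and not the procedure of Jehn [144] or the CATPAC software; the tokenisation rule and the example responses are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenise(response):
    """Lower-case a response and split it into alphabetic word units."""
    return re.findall(r"[a-z']+", response.lower())


def word_statistics(responses):
    """Return word frequencies and per-response word-pair co-occurrences."""
    frequencies = Counter()
    cooccurrences = Counter()
    for response in responses:
        tokens = tokenise(response)
        frequencies.update(tokens)
        # Count each unordered pair of distinct words once per response.
        for pair in combinations(sorted(set(tokens)), 2):
            cooccurrences[pair] += 1
    return frequencies, cooccurrences


responses = [
    "Safety is my main concern",
    "I worry about safety and cost",
    "Cost is a concern",
]
frequencies, cooccurrences = word_statistics(responses)

# Alphabetical index of terms with their frequency of occurrence.
for word, count in sorted(frequencies.items()):
    print(word, count)

print(cooccurrences[("concern", "is")])   # 2
print(cooccurrences[("cost", "safety")])  # 1
```

As the text notes, such counts reveal which terms appear together but say little about the context in which they are used; the pair ("concern", "is") dominates here only because function words have not been removed.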
Content analysis takes the analysis a step further, from exploring frequencies of words to
identifying themes or patterns [149]. Words, themes or concepts in an open-ended response
should be coded into manageable categories using the frequency of themes or patterns [150].
Jehn [144] performed content analysis, where each word in the response was associated with a
keyword (or its synonyms) - often derived from a theory. Kawashima and Kawano [151]
performed a six-step content analysis to extract information from open-ended responses. The steps are
identification of overall trends, determination of meaning units, identification of the dominant
theme, grouping of themes, naming the meanings of content categories and benchmarking
the results [151], [152]. Using this, Kawashima and Kawano [151]
looked into the reasons for attempting suicide by interviewing the survivors, whereas Mamali
et al. [152] investigated the experience of living and coping with spouses who experienced
sensory loss. In addition to this, Mamali et al. [152] also performed sentiment analysis on some
open-ended responses. Kelly, McKnight and Schubotz [153] used thematic content analysis to
analyse the perceptions among Irish youth on community relations in Northern Ireland. To
analyse the physical and psychological needs of cancer survivors, Burg et al. [154] used
directed content analysis to code themes from the open-ended responses. To ensure consistency
in the independently coded responses, the researchers evaluated the coding strategies over
biweekly meetings [156], which indicates the intensive nature of the analysis. In more recent
work, Savic et al. [155] used thematic analysis to assess how nurses cope with shift work, in
which they extracted themes derived based on previously established theories and those that
emerge from the data, thereby providing a more balanced approach [155].
Mossholder et al. [156] used the Dictionary of Affect in Language (DAL) to process the
open-ended responses from a survey among managers. Data processing involved spell-checking,
removal of apostrophes and trailing “s”, and removal of articles and conjunctions. DAL does
not take into consideration the contextual meaning or the influence of negative modifiers and
other words. The analysts used the processed words to generate response scores to evaluate the
responses [156]. While using automated text analysis to analyse the main sensory
characteristics of mayonnaises, one of the main challenges ten Kleij and Musters [157]
encountered was the pre-processing of the text. They highlighted the need to consider
combinations of words to capture the respondents’ intention. The use of automated text analysis
eliminates the need for analysts to identify topics before analysing open-ended responses;
the topics are instead generated during the analysis.
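The pre-processing steps mentioned above can be sketched as follows. This is a simplified illustration and not the pipeline of either cited study: spell-checking is omitted, the stop-word list is a small hypothetical sample of articles and conjunctions, and word combinations are approximated by joining adjacent tokens into bigrams.

```python
import re

# Hypothetical, deliberately short list of articles and conjunctions.
STOP_WORDS = {"a", "an", "the", "and", "or", "but"}


def preprocess(response):
    """Lower-case, drop apostrophes, stop words and a trailing 's'."""
    text = response.lower().replace("'", "").replace("’", "")
    tokens = re.findall(r"[a-z]+", text)
    cleaned = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        # Naive trailing-"s" removal; short words and "ss" endings are kept.
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        cleaned.append(token)
    return cleaned


def with_bigrams(tokens):
    """Append adjacent word pairs so that combinations such as
    'not safe' are kept distinct from 'safe'."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


print(preprocess("The cars aren't safe and the sensors fail"))
print(with_bigrams(preprocess("Not safe")))
```

Because bigrams are formed after stop-word removal, some pairs join words that were not adjacent in the original response; a fuller treatment would need to handle this, along with negation and spelling errors.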
Interestingly, the extracted topics are of actual relevance to the respondents rather than merely meeting
the researchers’ expectations. Researchers can then re-evaluate the theoretical formulations and
restructure future studies accordingly [158]. For AVs, particularly for assessing trust in
self-driving vehicles, Lee and Kolodge [159] used Structural Topic Models. To analyse the
sentiment in the open-ended feedback from students, Hynninen, Knutas and Hujala [160] used
sentiment analysis and highlighted the potential for automated text analysis.
2.3.3 Concluding Remarks
In the above sub-sections, we introduced the two approaches and discussed some of
their advantages and disadvantages. The closed-ended approach is used
widely for measuring attitudes, driven primarily by convenience, higher completion rate and
better execution time. Researchers can easily convert closed-ended responses
to numbers, but answering may place considerable stress on the participants as they go through a
three-stage process. First, individuals interpret the statement, then relate it to the measured attitude
and finally convert their agreement or disagreement to an appropriate point on the scale. Also, there are
some concerns associated with the use of the closed-ended approach to measure attitudes.
These include a. what the optimal number of points on a scale is, b. should we include neutral
points on a scale, c. how do we ensure that the list of statements is comprehensive, d. concerns
regarding the use of statements in a study without adapting and validating them.
On the other hand, open-ended questions do not constrain individuals to voice their opinion in
a particular manner. In other words, researchers do not force responses; they could be closer to
reality and could be used in the exploratory studies (for relatively new topics) to identify
aspects that may be of fundamental importance to the respondents. By identifying these aspects,
we could design the statements for the closed-ended studies. Open-ended
responses provide in-depth information, but this does not necessarily translate into better
predictions [53]. Other challenges associated with the use of open-ended questions to measure
attitudes include a. the effort to articulate and write a response might deter some respondents
from answering open-ended questions, b. they might generate responses that may be outside
of the frame of the study, c. coding the open-ended responses is burdensome, d. the analysis is
time-consuming and prone to subjective bias, e. for a policymaker, there is a considerable time
delay between the implementation and the final output.
2.4 FRAMEWORKS TO MEASURE THE INTENTION TO USE
Having introduced closed- and open-ended approaches to measure attitudes in the previous
section, we focus on analysing the different psychological frameworks used to measure
attitudes. These frameworks capture the relationships between attitudes and behaviours within
human action. Numerous researchers have used the Theory of Planned Behaviour (TPB) to
measure intentions in travel behaviour analysis. TPB is an extension of the “Theory of
Reasoned Action (TRA)” and uses Attitude Towards the Behaviour (ATB), Subjective Norms
(SN) and Perceived Behavioural Control (PBC) to predict the intention to use [27], which is
used to predict human behaviour and is used extensively by researchers in different fields.
Figure 2.1 illustrates TPB’s framework proposed by Ajzen [27].
We now discuss the applications of TPB in travel behaviour research. Researchers have used
TPB to measure the intention to use public transport, psychological factors influencing the use
of personal vehicles or public transport, willingness to pay and the intention to use bike-sharing
for holiday cycling [3], [6], [31], [33], [37]. TPB has found applications in measuring social
identity, frequency of using the car, bicycle, public transport and walking [34], [84]. Buckley,
Kaye and Pradhan [97] used TPB for a simulation driving experiment of Level 3 AV to predict
the intention to use AVs and emphasised the need to include trust. Other studies
that used TPB to predict the intention to use AVs include the works of Moták et al. [161], Koul
and Eydgahi [162], Chen and Yan [163] and Jing et al. [164]. Except for the work by Moták et
al. [161], all others used extensions of TPB with Koul and Eydgahi [162] using technophobia
and perceived safety, Chen and Yan [163] using technological-savviness and Jing et al. [164]
using technophobia and perceived safety. It is worth noting that all three works reported above
are independent and separately extend the constructs of TPB.
Figure 2.1 The Framework for the Theory of Planned Behaviour (Source: [27])
The Technology Acceptance Model (TAM) also evolved from TRA and postulates that
Perceived Usefulness (PU) and Perceived Ease of Use (PEoU) influence the behavioural
intention to use [28] (see Figure 2.2 for the TAM framework). TAM has found many
applications in the context of predicting the intention to use AVs [96], [165], [166]. As was
observed for the TPB, researchers have also used many extensions of TAM. The predictive
capability improved with the introduction of constructs such as trust [96], [166]–[168] and an
external locus of control [167]. Choi and Ji [169] hypothesised that system transparency,
technical competence, and situation management capabilities influence trust and that trust
influences the behavioural intention in their modified framework. In the research by Zhang et
al. [168], PU and Perceived Safety Risk influenced trust, which later influenced the acceptance
of AVs.
Panagiotopoulos and Dimitrakopoulos [169] used an extension of TAM that used perceived
trust and social influence to predict the behavioural intention to use AVs. Wu et al. [170]
used TAM to analyse the effect of environmental concern on the public acceptance of
Autonomous Electric Vehicles and concluded that the Green Perceived Usefulness (GPU) and
PEoU are essential determinants. The constructs of TAM explained 40% of the variance in the
intention to use Autonomous Shuttles, as reported by other researchers [161]. Another variation
of the TAM, the Car Technology Acceptance Model (CTAM), which used effort expectancy,
performance expectancy, social influence, perceived safety, anxiety, attitudes about the
technology, desire for control, technology use and technology acceptance, has found
applications in predicting the intention to use AVs using respondents [73], [74], [171].
Figure 2.2 The Framework for the Technology Acceptance Model [28]
The third theory, the Unified Theory of Acceptance and Use of Technology (UTAUT), is a
social-psychological theory proposed by Venkatesh et al. [29] to predict the acceptance of the
technology. UTAUT assumes that performance expectancy, effort expectancy and social
influence impact behavioural intention. Behavioural intention and the facilitating conditions
influence user behaviour, while age, gender, and experience are moderating factors [29]. We
present the framework in Figure 2.3, and to gain insight into the underlying principle, readers
may refer to Venkatesh et al. [29].
Various researchers used UTAUT to evaluate the intention to use AVs [165], [172]. In the
study by Leicht, Chtourou and Ben Youssef [172], individuals’ technological savviness
moderated the adoption and purchase of AVs. Madigan et al. [173] also relied on UTAUT to
analyse the decisions that influence automated public transport use. Their analysis showed a
strong positive influence of performance expectancy, social influence, facilitating, and hedonic
conditions on behavioural intention. Neither effort expectancy nor moderating variables such as
age, gender or experience influenced the behavioural intention [173]. When comparing
TAM, TPB and UTAUT to predict the intention to use AVs, the performance was best for
TAM, followed by TPB [165]. However, Moták et al. [161] obtained comparable results for
TAM and TPB to predict the intention to use Autonomous Shuttles.
Figure 2.3 The Framework for the Unified Theory of Acceptance and Use of Technology
(UTAUT) [29]
2.5 MODELLING APPROACHES TO PREDICT AV USE
To set a context for the analysis, we aim at measuring the intention to use Autonomous
Vehicles. So, in this section of the chapter, we summarise the different statistical modelling
techniques used to predict the intention to use/willingness to pay for AVs.
Researchers have used Multiple Linear Regression to model the relationships between a
response variable and two or more explanatory variables by fitting a linear equation to the
observed data. The estimation involves computing the best-fitting linear equation by
minimising the sum of squared errors over all observations, where each error is the vertical
distance of a data point from the fitted line. Researchers have used this to model the intention to use AVs [162], [169],
[171]. Another approach, Hierarchical Linear Regression, involves statistically controlling the
effects of specific variables by adding them as blocks to analyse whether adding these variables improves the
predictive capability of the models and to assess the moderating effects of a variable. In principle, they
are a complex form of the ordinary least squares models [174]. To predict the intention to use,
while accounting for moderating effects, many researchers have used Hierarchical Linear
Regression [38], [96], [165], [173].
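As a minimal sketch of the two regression approaches described above, the following example fits an ordinary least squares model by solving the normal equations and then compares the R² of a first block of predictors with the R² after a second block is added. The data, the variable names (attitude, trust, intention) and the helper functions are hypothetical and serve only as an illustration; they are not taken from any of the cited studies.

```python
def ols_fit(X, y):
    """Return OLS coefficients (intercept first) by solving the normal equations."""
    rows = [[1.0] + list(r) for r in X]                 # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted linear equation."""
    pred = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: intention = 1 + 2 * attitude + 0.5 * trust (no noise)
attitude = [1, 2, 3, 4, 5, 6]
trust = [2, 1, 4, 3, 6, 5]
intention = [1 + 2 * a + 0.5 * t for a, t in zip(attitude, trust)]

# Block 1: attitude only; Block 2: attitude plus trust
X_block1 = [[a] for a in attitude]
X_full = [[a, t] for a, t in zip(attitude, trust)]

beta_block1 = ols_fit(X_block1, intention)
beta_full = ols_fit(X_full, intention)
r2_block1 = r_squared(X_block1, intention, beta_block1)
r2_full = r_squared(X_full, intention, beta_full)
delta_r2 = r2_full - r2_block1    # gain in explained variance from the added block
```

The change in R² between the nested models is the quantity that a hierarchical analysis inspects when deciding whether the added block improves the predictive capability.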
To assess the complex relationships simultaneously, researchers often use Structural Equation
Modelling (SEM). SEM is a collection of statistical techniques that captures the relationship
between independent variables and one or more dependent variables. The independent and
dependent variables can be continuous or discrete or even factors or measured variables in this
approach. Commonly used estimation techniques include Maximum Likelihood and
Generalised Least Squares. Readers may refer to Ullman and Bentler [175] for details on SEM and
its estimation techniques. Researchers have used techniques such as Partial Least
Squares (PLS) [167], PLS-SEM [163], [166] and SEM [164], [168], [170]. Chen and Yan [165]
used a PLS-Multi-Group Analysis to account for the heterogeneity in the choices.
Researchers have used discrete choice models in the context of AVs. For modelling discrete
choice variables that are nominal, researchers have used Multinomial Logit models. The model
assumes that each alternative has a utility and that individuals choose the alternative that maximises
their utility; assuming a Gumbel distribution for the error components yields the logit choice probabilities. The
Mixed Logit model addresses some of the issues associated with the Multinomial Logit model
by allowing for random taste variation, unrestricted substitution patterns and correlation
between the alternatives and is estimated through simulation [176]. For example, Daziano,
Sarrias and Leard [176] used a mixed logit model with a normal distribution for the key
parameters and log-normal distribution for other parameters to estimate the willingness to pay
for AVs. A random utility model (RUM) with a logit kernel was used to model the choice
between conventional vehicles, shared and private AVs by Haboucha, Ishaq and Shiftan [40].
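To illustrate the Multinomial Logit model mentioned above, the sketch below computes the choice probabilities that follow from Gumbel-distributed error components, P_i = exp(V_i) / Σ_j exp(V_j). The three alternatives and their systematic utilities are made-up values for illustration only and do not come from any of the cited studies.

```python
import math


def mnl_probabilities(utilities):
    """Logit choice probabilities: P_i = exp(V_i) / sum_j exp(V_j)."""
    v_max = max(utilities)                       # subtract the maximum for numerical stability
    exp_v = [math.exp(v - v_max) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


# Hypothetical systematic utilities for three alternatives
utilities = {"conventional": 0.0, "private_av": 0.5, "shared_av": -0.2}
probs = dict(zip(utilities, mnl_probabilities(list(utilities.values()))))
```

Only differences in utility matter: adding the same constant to every utility leaves the probabilities unchanged, which is why one alternative is usually normalised to zero.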
2.6 FACTORS INFLUENCING INTENTION TO USE/PAY FOR AUTONOMOUS
VEHICLES
In this study, we performed case studies on the intention to use Autonomous Vehicles (AVs).
The literature review indicates that the socio-demographic characteristics of the individual,
attitudes towards AVs, land-use characteristics and current travel characteristics of the
individuals are likely to influence the intention to use/pay for AVs. In the paragraphs to follow,
we discuss each of these factors in greater detail. We believe that the variable effects reported
by various researchers discussed in the subsequent sections are reasonable.
Socio-demographic Characteristics - Studies indicate gender influences the intention to
use/share/buy AVs [177], [178]. Higher levels of concerns about the use of AVs and scepticism
among women might have contributed to fewer women willing to use AVs [71]. Research
indicates that younger individuals have fewer concerns regarding AVs and are willing to use
AVs [68], [74], [178], and older individuals are more likely to use conventional cars [66], [68].
The educational qualification of the individual plays an important role in the choice of AVs. Be
it personal or shared AVs, educated individuals are more likely to use AVs, which could be
related to their increased awareness [40], [68], [85]. As disposable income determines the
spending potential of an individual, it is unsurprising to observe the positive correlation
between income and the willingness to pay for AVs [39], [68].
Vehicle ownership and the type of vehicle are other influencers of the choice of AVs.
Individuals not owning a car or using cars with advanced automated features are equally likely
to use AVs [41], [85]. Furthermore, the intention to use shared AVs was high for individuals
from households with fewer vehicles [179]. Individuals with a passion for driving are more
likely to prefer conventional vehicles and less likely to use AVs [42], [73], [172]. Since AVs
are likely to allow mobility for individuals who are aged or suffer from physical disabilities, it is not
surprising to observe a higher intention among such individuals [74], [180]. Finally, individuals
familiar with the car-sharing companies are willing to pay less for AVs and instead use Shared
AVs [66], [179].
Attitudes towards AVs and Their Influence - The attitudes towards the use of AVs
significantly influence the intention to use and the willingness to pay for them. A positive attitude
towards AVs is often predicated on an increased awareness of AVs [75]. Higher scores for
perceived usefulness, perceived ease of use, trust and perceived safety of AVs are associated
with an increased intention to use AVs [166], [167]. Many believe that AVs improve traffic
safety significantly by reducing accidents, and this is likely to motivate more individuals to use
AVs [68], [73], [169]. The improvements to transportation efficiency, reduction in fuel
consumption and emissions, elimination of parking woes and the ability to multi-task might
condition the intention to use AVs [4], [42], [72], [180]. The ability to multi-task [42], [72] and
the potential to converse with fellow passengers and enjoy the views while travelling excite
respondents about AVs [16]. Concerns regarding the use of AVs among respondents include the lack
of manual control [72], [85], failure of the system [72], [73], [180] and the legal liabilities for the
drivers or owners, particularly in the event of an accident [39], [72], [180]. Anxieties about
sharing and the monitoring of data [39], [169], performance expectancy and the trust in AVs
[167], [169], [181], perceived usefulness, perceived safety risk [170], perceived ease of use
[73] are other factors affecting the intention to use AVs. Apart from these, the environmental
consciousness and the technological-savviness of the individuals could also impact decisions
[40], [179]. To model the attitudes measured using closed-ended questions, researchers have
used SEM [163], [164], [170], PLS [166], [167], [178], dummy indicator variable [68], EFA
[4], CFA [38], [40], [168], [170], [172], [181] and PCA [65], [88], [172], to name a few. Hulse,
Xie and Galea [95] coded the attitudes (captured using open-ended responses) towards
autonomous vehicles into five categories.
Travel Characteristics - The current travel patterns of individuals could play a decisive
role in their choice of AVs. For instance, the vehicle miles travelled [39], [182], the frequency
of driving [39], [41], past experiences with crashes [66], [68], joint/solo travel [66], need to
perform errands during the day [40] and the distance to workplace [66] might alter the decision.
Land-use Characteristics - People living in built-up areas or congested streets do not favour
using fully autonomous vehicles [16], [38]. Compared to respondents from urban areas, those
from rural areas have more concerns regarding the use of AVs [41]. Furthermore, the
willingness to pay for AVs is higher among higher-income neighbourhoods and lower among
individuals in more job-dense neighbourhoods [66]. Finally, those in downtown and suburban
areas are less likely to prefer conventional gasoline vehicles [68].
2.7 NATURAL LANGUAGE PROCESSING AND ITS APPLICATIONS
Researchers use closed- and open-ended questions to measure attitudes, and the open-ended
questions are preferred to gain insights on the respondents’ attitudes towards a relatively new
and complicated problem [13]. Considering this, we decided to undertake research related to
Autonomous Vehicles (AVs). To extract information from the open-ended responses in our
survey, we used techniques in Natural Language Processing (NLP). We now briefly discuss
some methods (pertinent to the topic) in NLP before describing LDA in detail.
Term Frequency*Inverse Document Frequency (TF*IDF) indicates the importance of
a word in a document relative to the corpus. Term Frequency of a word is the frequency of occurrence
of the given word in a document. Inverse Document Frequency, in turn,
diminishes the weight of words that occur very frequently across the documents of the corpus and increases
the weight of words that occur rarely, which accounts for words that occur very commonly in
most documents [183]. However, researchers often criticise these models for not having a solid
mathematical framework while being computationally expensive [183], [184].
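The weighting described above can be sketched as follows. This example assumes one common variant, in which the term frequency is the relative frequency within the document and the inverse document frequency is log(N / df); other variants exist, and the toy documents and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of non-empty token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical pre-processed responses
docs = [
    ["av", "safe", "av"],
    ["av", "expensive"],
    ["av", "convenient", "safe"],
]
weights = tf_idf(docs)
```

In this toy corpus the word "av" appears in every document, so its weight is zero, whereas "expensive", which appears in a single document, receives the largest weight.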
Another approach that could be adopted is Sentiment Analysis, which aims to identify, extract,
and classify responses based on polarity. Sentiments can be binary (positive and negative) or
n-point scales (strongly disagree, disagree, neutral, agree, strongly agree). Sentiment Analysis
is often helpful in understanding the extent of acceptance of a service and eventually improving
these services. However, one of the issues associated with the approach is that it merely
captures the sentiments without providing deeper insights [185].
Researchers also use Word Embeddings (also known as Word Representations) to measure similarities
between words by capturing the semantic and syntactic information of words. They are used to
build continuous word vectors after accounting for their contexts. Consequently, words with
similar meanings lie closer together in the vector space [186]. Researchers approach Word
Embeddings in two ways, a. words expressed as vectors of co-occurring words, b. words
expressed as vectors of the linguistic contexts in which they occur [187].
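A small sketch may help to show what "closer in the vector space" means in practice. Similarity between word vectors is commonly measured with the cosine of the angle between them. The three-dimensional vectors below are invented for illustration and do not come from a trained embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up word vectors (a trained model would typically use hundreds of dimensions)
embeddings = {
    "car": [0.9, 0.1, 0.3],
    "vehicle": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.0],
}

sim_close = cosine_similarity(embeddings["car"], embeddings["vehicle"])
sim_far = cosine_similarity(embeddings["car"], embeddings["banana"])
```

With these illustrative vectors, "car" is far more similar to "vehicle" than to "banana", which is the behaviour one would expect from embeddings that capture meaning.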
The fourth approach, Topic Models, comprises statistical models that discover latent “topics”
in a collection of documents (corpus) by assuming that words that are specific to a given topic
are more likely to occur in documents that discuss this topic. Commonly used approaches
include LDA [184] and supervised LDA [188]. Researchers often argue that these approaches are
better suited for analysing large corpora than short text [189]. In practice, this assigns to
each text a series of numerical indicators (e.g., topic proportions in the Latent Dirichlet
Allocation algorithm) later used in modelling. LDA follows a conceptually similar idea to PCA
or Factor Analysis, where we extract fundamental (latent) variables or vectors from the data.
Before the use of Topic Models, the data from the open-ended responses should be cleaned or
pre-processed. We begin by splitting the responses into words/tokens and removing numbers,
punctuation and other symbols. Having done this, we remove some of the most common words in the
language, such as “a”, “able” and “about” (to name a few), that do not convey any specific
meaning/relevance and are often termed “Stop Words”. In addition to this, we
combined words whose meaning changes when they are assessed independently.
For example, “public” and “transportation” each carry their own meaning,
but the combination “public transportation” means something different; it is identified
using “Regular Expressions” and replaced.
Furthermore, we reduce different words in the open-ended responses from the exact words to
their root form. For example, the words “game”, “gaming”, “gamed” and “games” are all reduced to
the root form “game” [190], [191]. Data pre-processing is critical in text analysis and improves
the accuracy of the analysis [192], [193].
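The pre-processing steps described above can be sketched as a short pipeline. This is an illustration under stated assumptions rather than the procedure used in the thesis: the stop-word list is deliberately tiny, the phrase pattern and its replacement token `public_transportation` are hypothetical, and a small lookup table stands in for a real stemmer or lemmatiser.

```python
import re

# Tiny illustrative stop-word list; real lists contain a few hundred words
STOP_WORDS = {"a", "able", "about", "i", "on", "the", "is", "and"}

# Hypothetical multi-word terms to merge into a single token
PHRASES = {r"\bpublic\s+transportation\b": "public_transportation"}

# Stand-in for a stemmer or lemmatiser: explicit word -> root form mapping
ROOT_FORMS = {"gaming": "game", "gamed": "game", "games": "game"}


def preprocess(response):
    """Lowercase, merge phrases, tokenise, drop stop words, reduce to root forms."""
    text = response.lower()
    for pattern, replacement in PHRASES.items():
        text = re.sub(pattern, replacement, text)     # merge multi-word terms
    tokens = re.findall(r"[a-z_]+", text)             # drops numbers, punctuation, symbols
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [ROOT_FORMS.get(t, t) for t in tokens]


tokens = preprocess("I like gaming on public transportation, about 2 games a day!")
# tokens == ["like", "game", "public_transportation", "game", "day"]
```

In this sketch the phrases are merged before tokenising so that the regular expression can see the words side by side; this ordering is a design choice of the example, not a requirement of the approach.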
We used this pre-processed data for the analysis. We now outline the NLP technique used, viz., Topic
Modelling, and its applications. NLP empowers computers to understand text and speech as
humans do. In this thesis, we used Topic Modelling, a generative statistical approach that allows a
set of observations to be explained by latent groups based on the similarities in some parts of
the data. We find Latent Dirichlet Allocation (LDA) and Supervised Latent Dirichlet
Allocation (sLDA) more appropriate for extracting information than the other text analysis
methods.
Before explaining sLDA, we explain LDA, a popular method in Topic Modelling that aims to
identify latent constructs in text data. Having such latent variables, we transform each of the
open-ended responses into numerical values to be used for prediction. LDA is similar to a
multinomial principal component analysis (PCA) as LDA converts a text document
(represented by word frequencies) into a linear combination of topics (represented by word
frequencies). The linear combination of topics is conceptually comparable to the eigenvectors
in PCA. In LDA, the given set of documents is represented in the Bag of Words (BoW) format,
a vector of word frequencies. The Bag of Words is a multiset of its words that disregards
grammar and word order while keeping the frequency. Using BoW and K topics (number of
topics extracted), LDA extracts a set of K topics that minimise the reconstruction error of the
original documents, and each of the extracted topics is a BoW. When each of the future
unlabelled documents is projected on the topic space, they are re-represented as a combination
of K topics. In addition to the number of topics, LDA requires two additional parameters, α
and η, that determine the sparsity of document-topic and response variable distribution priors,
respectively [184], [194]. In sLDA, the documents and the responses are modelled jointly to
obtain the latent topics that best predict the response variable [188]. In other words, it is a
supervised method, where the resulting topics are the ones that maximise the accuracy of a
particular model, which is accomplished by jointly modelling (via maximum-likelihood estimation)
the documents and the response variable to find latent topics that best predict the response
variable for future unlabelled documents. However, using the dependent variable for the final
model (when Topic Model results are used to predict some other choice) as the response
variable for sLDA might cause endogeneity issues. For the dataset from India (Chapter 3), the
response variable was the intention to use Shared AVs and for the dataset from the USA
(Chapter 4), the response variable was whether the respondent agreed with the open-ended
question.
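To make the idea of representing each response as a combination of K topics more concrete, the sketch below fits an unsupervised LDA model to a toy corpus. It uses collapsed Gibbs sampling, which is one standard inference method for LDA and is chosen here only for brevity; it is not necessarily the estimation method used in this thesis, and it does not include the response variable that sLDA adds. The corpus, the function name `lda_gibbs` and the prior values are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, word_prior=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]        # words in document d assigned to topic k
    topic_word = [[0] * V for _ in range(n_topics)]   # times word v is assigned to topic k
    topic_total = [0] * n_topics                      # words assigned to topic k overall
    assignments = []

    # Random initial topic for every word occurrence
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = word_id[w], assignments[d][n]
                # Remove the current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic for this word
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + word_prior) / (topic_total[j] + V * word_prior)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Document-topic proportions and topic-word distributions
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(topic_word[k][v] + word_prior) / (topic_total[k] + V * word_prior)
            for v in range(V)] for k in range(n_topics)]
    return theta, phi, vocab


# Hypothetical pre-processed responses
docs = [
    ["safety", "crash", "safety", "risk"],
    ["cost", "fare", "cheap", "cost"],
    ["safety", "risk", "crash"],
    ["fare", "cheap", "cost"],
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` holds the K topic proportions of one response; these are the numerical indicators that can later enter a statistical model as explanatory variables.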
We present the Probabilistic Graphical Model for sLDA in Figure 2.4. Referring to the
figure, the dataset comprises D documents of N words each, W_{d,n} (n = 1, …,
N and d = 1, …, D). We assign a topic among the K available topics to every word, and each topic
k (k = 1, …, K) consists of a vector β_k that contains the words and associated frequencies in
this topic.
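For reference, the generative process that such a graphical model usually encodes can be written as follows. This is the commonly stated formulation of sLDA with a Gaussian response, given here as a sketch; the notation for the mean topic assignment, \bar{z}_d, is introduced for this illustration, and the exact specification used in the thesis may differ (for example, for a binary response variable).

```latex
\begin{align*}
\theta_d &\sim \operatorname{Dirichlet}(\alpha)
  && \text{topic proportions of document } d \\
z_{d,n} \mid \theta_d &\sim \operatorname{Multinomial}(\theta_d)
  && \text{topic assigned to word } n \text{ of document } d \\
w_{d,n} \mid z_{d,n}, \beta_{1:K} &\sim \operatorname{Multinomial}(\beta_{z_{d,n}})
  && \text{observed word} \\
y_d \mid z_{d,1:N}, \eta, \sigma^2 &\sim \mathcal{N}\!\left(\eta^{\top} \bar{z}_d,\; \sigma^2\right),
  \qquad \bar{z}_d = \frac{1}{N} \sum_{n=1}^{N} z_{d,n}
  && \text{response of document } d
\end{align*}
```

Here z_{d,n} is treated as a K-dimensional indicator vector, so \bar{z}_d contains the empirical topic frequencies of document d.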
Figure 2.4 Probabilistic Graphical Model for sLDA
The result is that for a given corpus of documents and a response variable, a set of topics that
span across every document can be obtained, which acts as their “common building block”.
Furthermore, for every document, a set of “K” numbers indicating “how much” each document
belongs to the building blocks can be obtained. We use them for the estimation of the models.
We assess the topics for their meaningfulness, along with the inter-topic distance between the
extracted topics. An investigation into the overlap using the visualisation tool pyLDAvis [195]
provides insights on whether the topics are distinct.
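One way to quantify how distinct two topics are is to compute a divergence between their word distributions. The sketch below uses the Jensen-Shannon divergence, which is a common choice for inter-topic distances; whether it matches the exact setting used in the visualisation for this thesis is an assumption, and the two topic distributions are invented for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical word distributions of two topics over a four-word vocabulary
topic_a = [0.6, 0.3, 0.1, 0.0]
topic_b = [0.1, 0.1, 0.3, 0.5]

distance_ab = js_divergence(topic_a, topic_b)
```

A value close to zero indicates nearly identical topics, while a value close to log(2) indicates topics that share almost no probability mass.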
Topic Modelling opens the possibilities for the extraction of information from text. This
technique has been used previously in the prediction of nonhabitual overcrowding of public
transport and of taxi demand based on information about special events on the internet, in travel
route recommendations using geotagged photos and in the discovery of trip patterns such as destination,
time of arrival, day of the week and stay duration using data from transit smart card records
[196]–[200]. Hasan and Ukkusuri [201] used Topic Models to extract information from social
media platforms to obtain multi-day activity patterns of individuals.
Since one of the objectives of our study was to identify if researchers could use Topic
Modelling to extract information from the open-ended responses, we investigated if it has been
used previously in survey analysis. To extract information from open-ended responses from
the American National Election Study (ANES), Roberts et al. [158] used Structural Topic
Models and observed that they could recover relationships similar to those recovered by hand coders.
Researchers have used Topic Modelling to extract information from the open-ended responses
in market research, which reduced the analysis time and human bias. However, the accuracy
of predictions was affected by the frequency of topics and the number of topics that could
adversely affect the topics' quality [202]. Tvinnereim and Fløttum [203] and Mitsui, Kubo, and
Shoji [204] used Topic Models to extract information from the open-ended survey questions
on climate change and protected area assessment. We could not find the application of Topic
Models to extract information from open-ended responses in the context of travel behaviour
research. If found beneficial, policymakers and analysts in travel behaviour research could use
these techniques to extract information from the open-ended responses.
2.8 SUMMARY
This chapter presents a literature review on different aspects related to the measurement of
qualitative data. The idea is not to present an exhaustive review but to identify the research
questions that need further investigation. We first discussed the need for measuring attitudes
in transportation analysis. As is the case with the measurement of attitudes in other fields, most
researchers have used the closed-ended approach in transportation analysis. Furthermore, a
discussion on the alternative approaches for measuring attitudes, closed- and open-ended, is
presented. In this regard, the advantages and disadvantages associated with using each of these
approaches are discussed. We then carry out a discussion on concerns with the use of each of
these approaches. Later, the literature review focused on the frameworks used to measure
attitudes in travel behaviour studies and the use of Topic Models. We finally discuss the
different modelling techniques used by other researchers.
3 TOPIC MODELLING FOR OPEN-ENDED RESPONSES - A
CASE STUDY ON THE INTENTION TO USE SHARED AVS
3.1 INTRODUCTION
In this chapter, we address the first three research objectives of this thesis, viz., a. to analyse if
the method for collecting qualitative data influences the survey responses, b. to develop an
approach to extract and process open-ended responses from a survey, c. to compare the
relative performance of the open- and closed-ended responses in analysing qualitative data.
To investigate these in travel behaviour research, we design and deploy questionnaires that
measure the intention to use Shared AVs in India. We use Shared AVs because we believe
presenting a questionnaire about a relatively new topic might generate interest among
respondents to answer the open-ended surveys, as recommended by Rugg and Cantril [13]. We
designed the questionnaire using the Theory of Planned Behaviour and distributed it in India between
November 2017 and March 2018. We published this work in the IEEE Transactions on
Intelligent Transportation Systems, and we adapted the text in different sections from the paper
[205].
We organise the rest of this chapter as follows; Section 3.2 discusses the questionnaire design,
presenting the experimental design and the theoretical framework used in the study. Section
3.3 discusses the data collection and the data cleaning procedures, and Section 3.4 discusses
results from the exploratory analysis of the closed-ended responses. We further explore the
differences in the frequency distributions of the responses to the closed-ended questions and
their statistical significance. We present the different aspects of the extraction of information
from closed- and open-ended responses in Section 3.5, including a discussion of the results
from exploratory analysis and Topic Modelling and a comparison of the extracted topics with the
statements used for the closed-ended responses. Section 3.6 briefly discusses the adopted modelling
framework, Section 3.7 presents the results from the estimation of the model for the
intention to use Shared AVs, and the final section presents the salient findings
and the limitations of the current study.
3.2 QUESTIONNAIRE DESIGN
3.2.1 Experimental Design
This study investigates the intention to use Shared AVs. We collected information on the
various aspects related to attitudes. These include technological-savviness and environmental
consciousness of the individual, perceptions towards AVs, attitudes towards the use of AVs,
subjective norms, and perceived behavioural control. Since this study aimed at identifying the
potential to use open-ended questions to measure attitudes in travel behaviour research, we
used two versions of the questionnaire. The alternative versions were presented randomly to
the respondents using SurveyMonkey’s “Page Randomisation” feature.
Ver_Lk presents respondents with statements depicting the attitude; respondents must then choose the
point on a five-point Likert scale that best describes their attitude. Ver_OE uses a
combination of open- and closed-ended questions. We used open-ended questions to collect
information on technological savviness, environmental consciousness, and AVs’ benefits and
negative impacts on society. All other attitudes are measured using the same set of Likert scale
questions used in Ver_Lk. Comparing the distributions of the responses to the Likert scale
questions between the two questionnaires facilitates the analysis of the influence of the
questionnaire type. Performing Topic Modelling to extract
information from the open-ended responses is related to the second objective. Furthermore, having
these two datasets allowed us to estimate models that facilitated the evaluation of open-ended
questions to measure attitudes. We present the experimental design in Figure 3.1.
3.2.2 Framework Design
To design the questionnaire, we used the Theory of Planned Behaviour (TPB) [27], adopted
widely in travel behaviour research [3], [6]. TPB posits that attitudes, subjective norms and
perceived behavioural control shape an individual’s behavioural intention and, eventually,
behaviour. Figure 3.2 illustrates the TPB framework used in this research.
To measure technological savviness and environmental consciousness, we used Likert scale
questions in Ver_Lk and a combination of Likert scales and open-ended questions in Ver_OE.
To measure the impacts of AVs on society, we used Likert scales in Ver_Lk and open-ended
questions in Ver_OE. Finally, to measure the attitudes towards the behaviour, subjective norms
and perceived behavioural control variables, we used Likert scale questions for both versions
of the questionnaire, a draft of which is presented in Appendix A.
Figure 3.1 The Experimental Design for the Intention to Use Shared AVs
Figure 3.2 Modified Framework of the Theory of Planned Behaviour
3.3 DATA COLLECTION AND DATA CLEANING
We distributed the survey in India between November 2017 and March 2018 using Facebook
and WhatsApp with the help of bloggers. We checked the collected data for inconsistencies
and incomplete records. We also removed records of individuals who did not answer the question
on the intention to use Shared AVs. We then excluded individuals who took more than an hour to
complete the survey, as we suspect such responses lack coherence. To deal with respondents answering
the questionnaire too quickly (commonly called speeders), we used the information from Qualtrics [206]
to compute the minimum time a respondent needs to answer the different questions in our survey.
We classified individuals answering faster than this minimum time as speeders.
Furthermore, we analysed the responses to identify whether respondents answered in patterns
(straight-line and diagonal responses). Speeders who also answered in straight lines in
two or more sections were excluded from further analysis.
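The exclusion rule described above can be sketched as follows. This is a minimal illustration, not the procedure actually coded for the thesis: the function names, the representation of a section as a list of Likert ratings, and the example durations are all assumptions, and diagonal patterns are not handled.

```python
from typing import List


def is_speeder(duration_s: float, min_duration_s: float) -> bool:
    """True when the respondent finished faster than the minimum plausible time."""
    return duration_s < min_duration_s


def is_straight_line(section: List[int]) -> bool:
    """True when every item in a Likert section received the same rating."""
    return len(section) > 1 and len(set(section)) == 1


def should_exclude(duration_s: float, min_duration_s: float,
                   sections: List[List[int]], min_flat_sections: int = 2) -> bool:
    """Exclude speeders who also straight-lined in at least `min_flat_sections` sections."""
    flat_sections = sum(is_straight_line(s) for s in sections)
    return is_speeder(duration_s, min_duration_s) and flat_sections >= min_flat_sections


# Hypothetical respondent: 120 s against an assumed 300 s minimum, two flat sections.
print(should_exclude(120, 300, [[3, 3, 3], [4, 4, 4], [1, 2, 3]]))
```

Requiring both conditions keeps fast but attentive respondents, and slow respondents with uniform opinions, in the dataset.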
3.4 EXPLORATORY ANALYSIS
In this section, we focus on the first objective of this research: to analyse if the method of
collecting qualitative data influences the survey responses. Because this was an online survey in
which the questionnaire versions were presented randomly to respondents and participation was
voluntary, we considered it essential to investigate whether the samples were comparable. To
facilitate this comparison, we first analyse whether the collected samples are similar in terms of
the socio-demographic characteristics of the individuals. Then, having established that the samples
are similar, we pursue the analysis related to the first objective. In this regard, we first analyse
the frequency distributions before carrying out an in-depth analysis of the statistical
significance of the differences and the impact of the questionnaire type.
3.4.1 Preliminary Analysis
Four hundred and thirty-five respondents (Ver_Lk: 239, Ver_OE: 196) completed the survey.
After removing speeders and those answering in patterns (straight lines or other patterns), the
final dataset comprised 364 responses (Ver_Lk: 201, Ver_OE: 163). On average, respondents
answering Ver_Lk took 11 minutes and 43 seconds (std. dev.: 7 minutes and 42 seconds) to complete
the survey, while those answering Ver_OE took 15 minutes and 49 seconds (std. dev.: 8 minutes and
39 seconds). Table 3.1 provides information on the socio-demographic and travel characteristics
of the respondents.
The dataset had higher participation from male respondents (~75%). Additionally, students (~35%)
and professionals made up a large share of the respondents. In terms of monthly income, nearly
50% of the respondents earned between INR 25,000 and 89,999. However, respondents remain
hesitant to answer questions related to income: almost a quarter declined to report their income,
while merely 1% had any issues revealing their occupation. A high percentage used public
transportation and cars for commuting to the workplace/university. Furthermore, when asked about
sharing rides, ~35% of respondents shared rides more than once a week.
Table 3.1 Socio-economic and Travel Characteristics
| Variables | Ver_Lk | Ver_OE |
|---|---|---|
| Gender (%) | | |
| Female (%) | 24.50 | 27.16 |
| Male (%) | 75.50 | 72.84 |
| Average age (in years) | 30.39 | 29.29 |
| Occupation (%) | | |
| Student (%) | 21.89 | 27.61 |
| Postgraduate student (%) | 12.44 | 9.20 |
| Academic Faculty (%) | 9.95 | 7.98 |
| Manager (%) | 9.45 | 11.04 |
| Professional (%) | 35.32 | 35.58 |
| Technician and associate professional (%) | 3.98 | 1.23 |
| Others (%) | 2.99 | 4.91 |
| No occupation (e.g. retired, unemployed) (%) | 1.00 | 1.23 |
| Prefer not to answer (%) | 2.99 | 1.23 |
| Monthly Income (in INR) | | |
| 0-9,999 (%) | 12.94 | 15.95 |
| 10,000-24,999 (%) | 4.98 | 8.59 |
| 25,000-49,999 (%) | 10.95 | 11.04 |
| 50,000-74,999 (%) | 26.87 | 16.56 |
| 75,000-89,999 (%) | 12.44 | 18.40 |
| More than 90,000 (%) | 7.96 | 2.45 |
| Prefer not to answer (%) | 23.88 | 26.99 |
| The mode used for Commuting to Workplace / University (%) | | |
| Walk only (%) | 39.80 | 46.43 |
| Bike (%) | 33.83 | 31.63 |
| Public transportation (%) | 64.18 | 74.49 |
| Car (%) | 69.65 | 64.80 |
| Motorbike (%) | 33.33 | 35.71 |
| Intermediate public transportation (%) | 45.27 | 46.94 |
| Travel time between home and university/school/workplace (minutes) | 32.00 | 51.01 |
| Frequency of ride-sharing (%) | | |
| Daily (%) | 21.39 | 16.56 |
| 2-3 times a week (%) | 17.41 | 19.63 |
| 2-3 times a month (%) | 17.41 | 17.79 |
| Rarely (%) | 11.44 | 3.07 |
| Never (%) | 32.34 | 42.94 |
| In the last 2 years, have you been involved in a road accident? (%) | 13.93 | 15.34 |
Participation in our survey was mainly voluntary, and hence, we could not ensure the
representativeness of the dataset. However, referring to Table 3.1, there is almost no difference
in the distribution of these characteristics between the two questionnaire versions. Further
analysis using the Mann-Whitney U test [207] indicated that the differences were not statistically
significant. Therefore, since there is almost no difference in the socio-demographic
characteristics of the respondents, the responses from the two samples are comparable.
3.4.1.1 Attitudes Towards AVs
We captured the attitudes towards the use of AVs using seven statements, and respondents, in
general, had positive attitudes towards the use of AVs. Almost two-thirds (65%) of participants
perceived it “cool to use AVs”, probably due to the perception that AVs are technologically
advanced. Furthermore, an overwhelming proportion of respondents appreciated the possibility of
engaging in other activities during travel. The share choosing “Strongly Agree” was higher still
among those answering Ver_OE of the questionnaire. We observe a similar trend when respondents
were asked if they believe AVs might alleviate the stress of driving.
Figure 3.3 Frequency Distribution for Attitudes towards Use of AVs
Almost three-quarters of respondents expected AVs to eliminate parking-related issues. However,
having to plan travel is a concern among many respondents, and this question also attracted a
larger share of neutral responses (~40%) for both versions of the questionnaire. Most respondents
were neutral to the idea of having to share a vehicle with others, and the responses followed a
roughly bell-shaped distribution. Besides, nearly half of the respondents opined that the use of
AVs might “kill the pleasure of driving”. Figure 3.3 presents the frequency distribution for the
attitudes towards AVs.
3.4.1.2 Subjective Norms
This section investigates respondents' beliefs about how their friends and family perceive AVs
and their use.
Figure 3.4 Frequency Distribution for Subjective Norms
More than half (55%) of those participating in the survey believed that their friends/family
would use AVs. It is worth noting that, for most respondents, their peers encourage using public
transport, which might eventually constitute a motivation to use shared systems. Moreover,
their peers also believe that AVs might reduce congestion and pollution while also making
travel safer. Above all, for nearly 70% of respondents, their peers are positive about them using
AVs. We can observe a higher proportion of respondents agreeing with the statements among
those answering Ver_OE of the questionnaire (refer to Figure 3.4).
3.4.1.3 Perceived Behavioural Control Variables
The third element of TPB indicates how supportive the system is for the individual to exhibit
this behaviour, and in Figure 3.5, we present the frequency distribution of the responses.
Figure 3.5 Frequency Distribution for Perceived Behavioural Control Variables
We explored whether respondents perceive that the challenges individuals face in using AVs
remain unaddressed. About 60% of respondents believed that they might take the time to learn
how to use AVs. Most respondents were neutral on whether the system would be
protected against hacking and failures. There was almost no difference in the distribution
between respondents answering the two versions of the questionnaire. However, when
asked if they have concerns regarding the security of payment systems or the liabilities
following an accident, respondents answering Ver_OE seemed more worried. More
respondents answering Ver_OE were confident that the interactions of AVs with other vehicles
would be safe. About 70% of respondents (for both versions of the questionnaire) believed that
AVs might make travel more efficient. There was a drop in neutral responses among those
answering Ver_OE for questions related to the environmental friendliness of travel by AVs and
their affordability.
3.4.1.4 Intention to Use Shared AVs
Figure 3.6 Frequency Distribution for the Intention to Use Shared AVs
Referring to Figure 3.6, we can observe that an overwhelming majority of respondents (~70%)
favoured using shared AVs. Even though we focus on shared AVs, the results resonate with
the conclusion of Schoettle and Sivak [72] that respondents from India and
China are more positive towards the use of AVs in general. Stoiber et al. [208] made a similar
observation regarding the use of pooled AVs. There were fewer neutral responses in Ver_OE, and
more respondents chose the extremes (“Strongly Disagree”/“Strongly Agree”) in Ver_OE of the
questionnaire. There is also a drop in the number of respondents who “Agree” with the statement.
In other words, the use of open-ended questions might have made respondents slightly more
decisive.
[Figure 3.6: bar chart of the Intention to Use Shared AVs. Horizontal axis: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree. Vertical axis: 0.00 to 60.00. Series: Ver_Lk, Ver_OE.]
3.4.2 Statistical Analysis
We observed slightly more positive attitudes among respondents answering Ver_OE of the
questionnaire for most of the statements, and it was necessary to analyse if these differences
were statistically significant. In this regard, we performed the non-parametric Mann-Whitney U
test [207]. Contrary to the observations made by Baburajan et al. [209], the
differences were statistically significant only for the statement on the perceptions of friends
and family about reductions in accidents.
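For readers unfamiliar with the test, a minimal sketch of the Mann-Whitney U statistic for two groups of Likert ratings is given below. It assumes a normal approximation with tie correction and no continuity correction, which may differ from the software settings used for the thesis. The function name and the example ratings are hypothetical.

```python
import math
from typing import List, Tuple


def mann_whitney_u(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (U statistic of sample x, two-sided p-value from a normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mid_rank
        t = j - i
        tie_term += t ** 3 - t  # tie correction term for this group of equal values
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_value


# Hypothetical five-point ratings for one statement in the two questionnaire versions.
ver_lk = [3, 4, 4, 2, 5, 3, 4]
ver_oe = [4, 5, 4, 3, 5, 5, 4]
u, p = mann_whitney_u(ver_lk, ver_oe)
print(u, p)
```

The test compares ranks rather than means, which suits ordinal Likert data where the distance between adjacent categories is not defined.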
3.5 EXTRACTION OF DATA
We begin this section with a discussion on the treatment of closed-ended responses. In the
subsequent subsections, we address the second objective, “to develop an approach to extract
open-ended responses from a survey and process the data”.
3.5.1 Treatment of Closed-ended Responses
To begin with, we tested the internal reliability of the Likert scale responses using Cronbach’s
Alpha. The estimated values for “Ver_Lk” and “Ver_OE” of the questionnaire were
0.831 and 0.830, respectively, indicating that the questionnaire is reliable. Next, to test the
validity of the questionnaire, we compared the average scores for the Likert scale responses
between two groups (respondents who do not follow news about AVs versus those who do). Respondents
following news about AVs seem to have responded correctly to the statements. The differences
between the two groups were statistically significant for both versions of the questionnaire
(t-statistics: “Ver_Lk” 14.65 and “Ver_OE” 9.49), indicating the validity of the questionnaire.
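As an illustration of the reliability check, Cronbach's Alpha can be computed from a respondents-by-items matrix as sketched below. The function name and the toy ratings are assumptions for illustration only and are not the thesis data.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's Alpha; rows are respondents, columns are Likert items."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to item columns
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical ratings from five respondents on three items.
toy = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
]
print(round(cronbach_alpha(toy), 3))
```

Values close to 1 indicate that the items move together; values around 0.8, as reported for both questionnaire versions, are conventionally regarded as good internal consistency.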
Factor analysis was performed on the attitudinal questions presented on a five-point Likert
scale. For consistency, we estimated the same number of factors for both datasets. Referring to
the results presented in Table 3.2, Kaiser-Meyer-Olkin (KMO) statistics for the factors are
reasonable. Therefore, these factors are used to estimate the model for predicting the intention
to use shared AVs.
For the question related to the potential benefits of AVs to society, two factors, “Positive
benefits of AVs on society 1” and “Positive benefits of AVs on society 2”, were extracted.
“Positive benefits of AVs on society 1” encompasses the benefits in terms of making travel
more environmentally friendly, less polluting, and safer by reducing accidents. “Positive
benefits of AVs on society 2” represents benefits such as reducing gender equity issues in travel
and making travel easier for people who cannot otherwise drive. Reductions in congestion and
accidents are other benefits accounted for by both factors. The influence of AVs on
employment is captured by “AVs impact on employment”.
Table 3.2 Results of Factor Analysis
| Factors | Ver_Lk | Ver_OE |
|---|---|---|
| Positive Benefits of AVs on Society 1 | | |
| KMO | 0.793 | |
| Autonomous Vehicles will impact society by | | |
| Making travel more environmentally friendly | 0.856 | |
| Reducing traffic congestion in cities | 0.609 | |
| Reducing transportation induced pollution | 0.868 | |
| Making travel safer by reducing accidents | 0.527 | |
| Reducing the need for parking spaces | 0.566 | |
| Positive Benefits of AVs on Society 2 | | |
| KMO | 0.793 | |
| Autonomous Vehicles will impact society by | | |
| Reducing traffic congestion in cities | 0.485 | |
| Making travel safer by reducing accidents | 0.588 | |
| Making travel easier for people who cannot otherwise drive | 0.769 | |
| Reducing gender equity issues in travel | 0.729 | |
| AVs Impact on employment | | |
| KMO | 0.500 | |
| Autonomous Vehicles will impact society by | | |
| Causing unemployment of existing drivers | 0.732 | |
| Creating new jobs for skilled workers | -0.732 | |
| Positive attitudes of individuals on AVs- PAT | | |
| KMO | 0.727 | 0.717 |
| I think it will be cool to use AVs | 0.753 | 0.799 |
| I can involve in other activities during travel | 0.811 | 0.787 |
| I will be relieved from the stress of driving | 0.820 | 0.779 |
| I can eliminate the parking-related issues | 0.722 | 0.661 |
| Subjective Norms- SN | | |
| KMO | 0.727 | 0.717 |
| I think my friends and family | | |
| Will use AVs | 0.672 | 0.735 |
| Believe, AVs will reduce congestion | 0.695 | 0.795 |
| Believe, AVs will reduce pollution | 0.681 | 0.791 |
| Believe, AVs will make travel safer, by reducing accidents | 0.801 | 0.740 |
| Will be positive about me using AVs | 0.803 | 0.846 |
| Perceived behavioural control variables 1- PB1 | | |
| KMO | 0.686 | 0.653 |
| I am confident the system will be protected against hacking and failures | 0.776 | 0.737 |
| I am confident that the interaction with other vehicles will be safe | 0.802 | 0.818 |
| I believe this will make my travel more efficient | 0.799 | 0.711 |
| I believe AVs will make travel more environmentally friendly | 0.706 | 0.704 |
| Perceived behavioural control variables 2- PB2 | | |
| KMO | 0.686 | 0.653 |
| I am worried about the liabilities after an accident | 0.684 | 0.466 |
| I have concerns regarding payment for the service | 0.790 | 0.858 |
| I think AVs will not be affordable to me | 0.706 | 0.598 |
The factor “positive attitude of the individual on AVs” includes the notion of using AVs as
being “cool”. In addition, this factor captures the benefits of eliminating parking-related
issues and driving stress, and the ability to perform other activities during travel. The perception
that friends and family will use AVs and are optimistic about the individual using AVs is captured
by the factor “Subjective norms”, which also captures the belief that AVs reduce
congestion, pollution, and accidents. The confidence that the system is conducive for use is
captured by “Perceived behavioural control variables 1”, whereas the concerns that may remain
unaddressed are represented by “Perceived behavioural control variables 2”. “Perceived
behavioural control variables 1” includes the belief that the system is protected against failures
and hacking and that the interaction with other vehicles is safe. It also covers the confidence
that the use of AVs makes travel more environmentally friendly and efficient. The
concerns about the payment system, liabilities in an accident and affordability are captured by
“Perceived behavioural control variables 2”.
3.5.2 Extraction of Information from Open-ended Responses
3.5.2.1 Exploratory Analysis
Four open-ended questions were used, and the responses were cleaned by removing all
punctuation and stop words (frequently used words that do not convey
any specific meaning in this context). Additionally, each word was reduced to its root
form using stemming. The words so obtained are used for analysis. Table 3.3 presents the
average number of words used by each respondent to answer each open-ended question, in the
original response and in the cleaned data.
Table 3.3 Average Number of Words per Response
| Open-ended Questions | Original | Cleaned |
|---|---|---|
| For my travel needs, I use my smartphone for (OE1) | 7.99 | 5.34 |
| I think transport is a major cause of the environmental problem, because (OE2) | 9.26 | 6.12 |
| The society will benefit from the use of Autonomous Vehicles, as it will (OE3) | 9.18 | 6.24 |
| Autonomous Vehicles are likely to impact society negatively, as (OE4) | 11.07 | 6.85 |
We explored the distribution of these cleaned words. Hereafter, we refer to the words after
stemming, so spellings might differ from the original words. For OE1, “book” (12.10%), “map”
(7.66%), “ticket” (6.95%), “googl” (3.95%) and “find” (3.71%) were the five most frequently
used words. To answer OE2, respondents most often used “pollut” (10.93%), “emiss” (4.45%),
“vehicl” (3.64%), “air” (2.53%) and “fuel” (2.13%). When asked about the benefits of AVs to
society, respondents used “reduc” (9.76%), “accid” (4.73%), “pollut” (3.12%), “drive” (2.52%) and
“less” (2.41%). Respondents used “job” (3.74%), “driver” (2.71%), “vehicl” (2.43%), “may”
(2.25%), and “loss” (2.15%) when asked about the negative impacts of AVs on society. We
present the word clouds based on the cleaned responses in Figure 3.7.
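A minimal sketch of how cleaned responses can be turned into word shares such as those quoted above is given below. The stop-word list is a tiny illustrative assumption rather than the list used in this research, and stemming is left as an optional `stem` callable (identity by default), so the output words are unstemmed unless a stemmer is supplied.

```python
import string
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Tiny illustrative stop-word list (an assumption; not the list used in the thesis).
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "for", "is", "it", "my", "i", "in"}


def clean(response: str, stem: Callable[[str], str] = lambda w: w) -> List[str]:
    """Lower-case, strip punctuation, drop stop words, then apply the optional stemmer."""
    no_punct = response.lower().translate(str.maketrans("", "", string.punctuation))
    return [stem(w) for w in no_punct.split() if w not in STOP_WORDS]


def word_shares(responses: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (word, percentage of all cleaned words), most frequent first."""
    counts = Counter(w for r in responses for w in clean(r))
    total = sum(counts.values())
    return [(w, 100.0 * c / total) for w, c in counts.most_common()]


# Hypothetical answers to OE1.
answers = ["Booking tickets, and maps!", "I book a cab", "maps for navigation"]
print(word_shares(answers)[:5])
```

Passing a Porter-type stemmer as `stem` would merge variants such as "booking" and "book" into a single root before counting.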
Figure 3.7 Words Clouds for OE1, OE2, OE3, OE4
3.5.2.2 Results from the Topic Models
We extracted topics from the open-ended responses using LDA and sLDA. We do not see an
improvement with sLDA over LDA, which could be because of the significant overlap between
the topics extracted using LDA and sLDA for our dataset. In the discussions presented below,
we use the prefix “To_L” for topics extracted using LDA and “To_S” for topics extracted using
sLDA. For example, the first topic extracted for OE1 is labelled To_L11 and To_S11 for LDA
and sLDA, respectively. Table 3.4 presents the top 5 words of each topic for the open-ended
questions.
Table 3.4 Top 5 Words for Each Topic for Open-ended Questions
| Topic | Word_1 | Word_2 | Word_3 | Word_4 | Word_5 |
|---|---|---|---|---|---|
| OE1- For my travel needs, I use my smartphone for | | | | | |
| To_L11 | map* | googl* | find* | rout | locat |
| To_S11 | map* | googl* | find* | place | restaur |
| To_L12 | travel* | train† | check† | time | statu |
| To_S12 | travel* | locat | rout | train† | check† |
| To_L13 | book* | ticket* | map† | hotel† | navig† |
| To_S13 | book* | ticket* | navig† | map† | hotel† |
| OE2- I think transport is a major cause of the environmental problem, because | | | | | |
| To_L21 | emiss* | gase* | carbon* | vehicl* | road* |
| To_S21 | emiss* | gase* | carbon* | vehicl* | road* |
| To_L22 | pollut* | air* | sound† | traffic† | nois† |
| To_S22 | pollut* | air* | traffic† | nois† | sound† |
| To_L23 | vehicl* | fuel* | transport* | exhaust | public† |
| To_S23 | vehicl* | fuel* | transport* | public† | increas |
| OE3- The society will benefit from the use of Autonomous Vehicles, as it will | | | | | |
| To_L31 | time* | save† | traffic† | product* | resourc |
| To_S31 | time* | traffic† | save† | product* | safe |
| To_L32 | drive* | transport* | travel* | help* | peopl* |
| To_S32 | drive* | transport* | travel* | help* | peopl* |
| To_L33 | reduc* | accid* | pollut* | less* | human* |
| To_S33 | reduc* | accid* | pollut* | less* | human* |
| To_L34 | effici* | road* | increas* | driver* | safeti* |
| To_S34 | effici* | road* | increas* | driver* | safeti* |
| OE4- Autonomous Vehicles are likely to impact society negatively, as | | | | | |
| To_L41 | vehicl* | may* | accid* | lead* | hack |
| To_S41 | vehicl* | may* | accid* | lead* | increas |
| To_L42 | job* | driver* | loss* | peopl* | drive* |
| To_S42 | job* | driver* | loss* | peopl* | drive* |
| To_L43 | human† | car† | control† | traffic | reduc |
| To_S43 | car† | control† | system | technolog | human† |
Note. * Words in both LDA and sLDA for the same topic in the same order; † words in both LDA and sLDA for the same topic in a different order.
Using the responses to OE1, we extracted three topics describing how respondents use smartphones
for travel-related needs. The first was related to finding places of interest and navigating
to the identified place (To_L11/To_S11). Another smartphone use was for travel planning and
checking the status (location and traffic updates) of transport modes (To_L12/To_S12). Lastly,
the third use (To_L13/To_S13) was related to finding hotels and booking reservations, flight
tickets and taxi services.
The different perspectives of individuals related to the environmental impact of transportation
can broadly be classified into three topics. The first topic (To_L21/To_S21) was related to
pollution (air and noise) and the contribution of transportation-induced pollution to global
warming, driven primarily by the increasing reliance on personal vehicles due to the lack of
public transportation. Second, To_L22/To_S22 also discussed the role of air and noise pollution but
emphasised the wastage of natural resources. Finally, To_L23/To_S23 was related to
increasing air pollution and dependence on fossil fuels.
We categorise the potential benefits of AVs to society into four topics. First, respondents
considered AVs futuristic and discussed savings on travel time and resources (To_L31/To_S31).
Second, individuals opined that the use of AVs would probably improve public transport and
eliminate the stress of driving (To_L32/To_S32). Third, AVs may reduce accidents due to human
errors while minimising pollution (To_L33/To_S33). Finally, increased road efficiency, improved
safety and reduced fuel usage formed the fourth benefit (To_L34/To_S34).
The respondents shared three significant concerns about the use of AVs. First,
individuals believe that AVs may increase accidents as they are prone to system errors and software
hacking (To_L41/To_S41). Furthermore, individuals share concerns over employment loss,
particularly among drivers (To_L42/To_S42). Finally, the third topic (To_L43/To_S43) discussed the
technological needs for such a control system and its associated safety.
To analyse if the extracted topics are distinct, we computed the inter-topic distance. The results
are presented visually using pyLDAvis in Figure 3.8. There is no overlap between the extracted
topics, which has the positive side-effect of reducing multicollinearity.
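To give a sense of what an inter-topic distance measures, the sketch below computes the Jensen-Shannon divergence between two topic-word distributions. To our understanding, this is the distance pyLDAvis uses by default before projecting topics onto two dimensions, but that detail and the toy distributions below are assumptions for illustration rather than the thesis output.

```python
import math
from typing import List


def kl_divergence(p: List[float], q: List[float]) -> float:
    """Kullback-Leibler divergence KL(p || q), skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence: symmetric, 0 for identical topics, ln(2) at most."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical word probabilities over a shared five-word vocabulary.
topic_a = [0.50, 0.30, 0.10, 0.05, 0.05]
topic_b = [0.05, 0.05, 0.10, 0.30, 0.50]
print(js_divergence(topic_a, topic_b))
```

Larger values indicate that two topics place their probability mass on different words, which is what non-overlapping circles in an inter-topic distance map represent.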
Figure 3.8 Inter-topic Distance for a. OE1, b. OE2, c. OE3, d. OE4
3.5.3 Comparison of Closed- and Open-ended Responses
This section analyses whether we could find a correspondence between the open- and closed-ended
responses. In three out of the four open-ended questions used in this research, we found
similarities between the extracted topics and the statements used for the Likert scales. For the
question related to transportation-induced environmental problems, the topics extracted from
the open-ended responses did not identify the need for colossal infrastructure but identified the
other aspects used in the Likert scale questions. Infrastructure improvements may be considered
necessary for mobility, which could be why respondents did not identify them as an
environmental issue. However, open-ended responses also discussed the link between
transportation and global warming. For the potential benefits of AVs, the responses mostly
covered increased traffic efficiency, fuel efficiency, travel time savings, and reductions in
pollution and accidents. The respondents did not discuss the potential of AVs to reduce demand
for parking spaces or address gender equity issues related to mobility. Respondents answering the
open-ended question related to the potential impacts of AVs on society did cover some of the genuine
concerns related to their implementation, such as loss of employment opportunities, the potential
for system failures and hacking and the need for a costly and sophisticated control system.
The open-ended question on the use of smartphones for travel needs (OE1) illustrates the need
for a careful design of open-ended questions. The closed-ended question focussed on the
frequency of use of smartphones for various travel needs; however, our open-ended question
was targeted at identifying the uses. As a result, there is no correspondence between the two,
and hence the responses to OE1 cannot be used to estimate the models. However, we would like to
emphasise that we could extract meaningful topics from the responses to this question.
3.6 MODELLING FRAMEWORK
In the current research, we asked respondents to express their intention to use shared AVs on a
five-point Likert scale. Considering the ordinal nature of the dependent variable, we used an
Ordered Probit model for the estimation, which assumes that there is a continuous underlying
distribution that determines the ratings made by the respondent. In the estimation, we model
the latent unobserved dependent variable ($y_i^*$) using the independent variables ($x_i$).
$$y_i^* = \beta' x_i + \varepsilon_i$$
By regressing $y_i^*$ on the independent variables, thresholds for the unobserved dependent
variable for the various levels of the observed dependent variable ($y_i$) are estimated. The latent
variable ($y_i^*$) is related to the observed dependent variable ($y_i$) through:
$$
y_i =
\begin{cases}
0 & \text{if } y_i^* \le 0 \\
1 & \text{if } 0 < y_i^* \le \mu_1 \\
2 & \text{if } \mu_1 < y_i^* \le \mu_2 \\
3 & \text{if } \mu_2 < y_i^* \le \mu_3 \\
4 & \text{if } y_i^* > \mu_3
\end{cases}
$$
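A minimal sketch of how an Ordered Probit model maps the latent index to the probabilities of the five response categories is given below. It assumes standard normal errors, a first threshold normalised to zero, and categories coded 0 to 4; the function names are hypothetical. The example uses the constant and thresholds reported for MI (Ver_Lk) in Table 3.5 with all factor scores set to zero, which is an illustrative choice and not a result from this research.

```python
import math
from typing import List


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb: float, mus: List[float]) -> List[float]:
    """Category probabilities for latent index xb (constant + beta'x).

    Cut points are 0 followed by the estimated thresholds in `mus`,
    so three thresholds yield five categories.
    """
    cuts = [0.0] + list(mus)
    cdf = [norm_cdf(c - xb) for c in cuts] + [1.0]
    return [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


# Constant and thresholds taken from Table 3.5 (MI, Ver_Lk); factor scores assumed zero.
probs = ordered_probit_probs(2.568, [0.873, 1.906, 3.771])
print([round(p, 3) for p in probs])
```

The predicted rating used for measures such as count R2 is typically the category with the highest probability.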
For a detailed description of the underlying principles and the estimation procedure, we
encourage readers to refer to Greene and Hensher [210]. We used 90% of the data for the
estimation (training data) and 10% for testing (test data). Having estimated the models, we
undertake a careful analysis of the direction, magnitude, and statistical significance of the
estimated coefficients. In addition, to evaluate the performance of the estimated models,
we relied on the values of log-likelihood, McFadden pseudo R2 (ρ2), adjusted McFadden
pseudo R2 (adj. ρ2), count R2 and adjusted count R2.
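The goodness-of-fit measures listed above can be sketched as follows, using their commonly cited definitions. The function names are hypothetical, and the count of eight parameters in the example (constant, four factors, three thresholds) is an assumption about how the adjustment was applied.

```python
from collections import Counter
from typing import Sequence


def mcfadden_rho2(ll_initial: float, ll_final: float) -> float:
    """McFadden pseudo R2: 1 - LLF / LLI."""
    return 1.0 - ll_final / ll_initial


def adjusted_rho2(ll_initial: float, ll_final: float, n_params: int) -> float:
    """Adjusted McFadden pseudo R2: penalises LLF by the number of parameters."""
    return 1.0 - (ll_final - n_params) / ll_initial


def count_r2(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Share of observations whose predicted category matches the observed one."""
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def adjusted_count_r2(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Count R2 net of the accuracy obtained by always predicting the modal category."""
    correct = sum(o == p for o, p in zip(observed, predicted))
    n_mode = max(Counter(observed).values())
    return (correct - n_mode) / (len(observed) - n_mode)


# Log-likelihoods as reported for MI (Ver_Lk, training set); eight parameters assumed.
print(round(mcfadden_rho2(-222.955, -187.196), 3))
print(round(adjusted_rho2(-222.955, -187.196, 8), 3))
```

Because the adjustment subtracts the parameter count from the final log-likelihood, the adjusted measure can turn negative on small samples where the gain in log-likelihood is smaller than the number of estimated parameters.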
3.7 ESTIMATION RESULTS AND DISCUSSION
The discussion in this section pertains to the third objective, to compare the relative
performance of the open-ended and closed-ended responses in analysing qualitative data. For
both versions of the questionnaire, we estimate a model “MI” using variables related to the
Theory of Planned Behaviour only. Later, for each version of the questionnaire, we included
variables related to the environmental effects of transportation. For Ver_Lk, we introduce the
variable “Role_Pol” and name the corresponding model “MF”. For Ver_OE, we estimated two
models by including the topics related to the environmental effects of transportation-induced
pollution: “MLDA” (using LDA) and “MSLDA” (using sLDA). Notice that we only have one topic
variable, as only this variable was statistically significant, and we discuss this in detail in the
subsequent paragraph. Finally, we compare the relative performance of each of these models
using a training set (90%) and a test set (10%).
Table 3.5 presents the estimation results for the training set for the intention to use shared AVs.
We present the threshold parameters for the Ordered Probit model along with the initial
log-likelihood (LLI), final log-likelihood (LLF) and McFadden pseudo-R-squared value (ρ2). For
Ver_Lk, MI had a ρ2 value of 0.160, and there was no improvement with the addition of the
environmental pollution-related variable (MF). However, the performance of the model was
superior for Ver_OE of the questionnaire. MI for Ver_OE had a ρ2 value of 0.227, which
improved to 0.242 and 0.245 for MLDA and MSLDA, respectively. A similar improvement can be
observed for the other goodness-of-fit measures. We should note that while computing the
adjusted values for ρ2 and Count R2, we did not account for the LDA and sLDA model
parameters. The question related to pollution asked in Ver_Lk of the questionnaire did not
improve the model's performance, which could be because of the minimal variability in the
responses to the Likert scale question. However, in Ver_OE, there was an improvement with the
addition of the topic variable related to pollution, and this could be because this variable
(To_L23/To_S23) extracted more relevant information than the Likert scale question on pollution.
It also captured the underlying factors: the increase in vehicles and the lack of public transport.
Table 3.5 Estimation Results of the Model for the “Intention to Use Shared AVs”
| Variable | Ver_Lk MI | Ver_Lk MF | Ver_OE MI | Ver_OE MLDA | Ver_OE MSLDA |
|---|---|---|---|---|---|
| Constant | 2.568*** | 2.615*** | 2.683*** | 2.612*** | 2.608*** |
| Positive attitude towards AVs | 0.167 | 0.170 | 0.446*** | 0.472*** | 0.477*** |
| Subjective norms | 0.369*** | 0.374*** | 0.273*** | 0.217 | 0.209 |
| Perceived behavioural control variables 1 | 0.388*** | 0.388*** | 0.381*** | 0.432*** | 0.440*** |
| Perceived behavioural control variables 2 | 0.087 | 0.087 | -0.182* | -0.212** | -0.217** |
| Role of pollution | | -0.065 | | | |
| Role of pollution To_L23/ To_S23 | | | | 1.799** | 1.979*** |
| Threshold parameters for the index | | | | | |
| Mu(01) | 0.873*** | 0.872*** | 0.833*** | 0.865*** | 0.872*** |
| Mu(02) | 1.906*** | 1.903*** | 1.839*** | 1.900*** | 1.915*** |
| Mu(03) | 3.771*** | 3.769*** | 3.773*** | 3.872*** | 3.896*** |
| Goodness-of-fit measures | | | | | |
| Sample size | 183 | 183 | 145 | 145 | 145 |
| LLI | -222.955 | -222.955 | -181.119 | -181.119 | -181.119 |
| LLF | -187.196 | -187.143 | -140.063 | -137.331 | -136.823 |
| ρ2 | 0.160 | 0.161 | 0.227 | 0.242 | 0.245 |
| Adj. ρ2 | 0.125 | 0.120 | 0.183 | 0.192 | 0.195 |
| Count R2 | 0.552 | 0.552 | 0.579 | 0.600 | 0.607 |
| Adj. Count R2 | 0.172 | 0.172 | 0.218 | 0.247 | 0.250 |
| F1 Score | 0.389 | 0.389 | 0.539 | 0.497 | 0.534 |
Note. ***, **, * indicate significance at the 1%, 5% and 10% levels, respectively.
For both versions of the questionnaire, the intention to use shared AVs is associated positively
with the individual's positive attitudes on AVs. The coefficients are, however, statistically
significant only for Ver_OE of the questionnaire. The positive perception among friends and
family and their supportive nature also contribute positively to the intention to use shared
AVs. The coefficient for subjective norms loses its statistical significance for MLDA and MSLDA
in the 90% training set. However, the variable is statistically significant for the entire dataset
(100% of the observations), and hence, we suspect this loss of statistical significance in the
training set (for MLDA and MSLDA) to be due to the small sample size (145 observations). The
factor related to the confidence in the system has a positive and statistically significant
coefficient for both versions of the questionnaire. It emphasises the need to improve the
confidence that the system is protected against hacking and failures and that the interaction
with other vehicles is safe. The factor capturing the concerns associated with the use of AVs,
such as affordability, liabilities in the event of an accident and payment for the service,
negatively influences the intention to use shared AVs in Ver_OE. The factor has, however, a
positive and statistically insignificant coefficient for Ver_Lk of the questionnaire. When
comparing the coefficients, the factors associated with the positive benefits of AVs and the
concerns with AVs are not statistically significant for Ver_Lk, but they are significant for Ver_OE.
The only difference between the two versions is the introduction of open-ended questions
related to AVs' potential benefits and issues. Answering these questions would have demanded
additional thinking from the respondents, which could explain the difference in the coefficients.
Moreover, this might have eliminated careless responses to the Likert scale questions.
As reported earlier, we see an improvement in the perceptions towards AVs and their use
among respondents answering Ver_OE of the questionnaire. This could be because answering the
open-ended questions led them to provide answers based on more deliberative
reasoning. Respondents who believe that transportation is a significant source of environmental
pollution are more likely to use shared AVs, probably because of the notion that shared systems
are sustainable. However, we do not observe this effect for Ver_Lk of the questionnaire, in
which we used the Likert scale question. We also tested the effect of the benefits of AVs to
society (using a factor for Ver_Lk and topics for Ver_OE) on the intention to use shared AVs. The
variables were not statistically significant, probably because respondents are more focused on the
potential benefits to the individual and not to society.
To test the predictive capability of the estimated models, we applied the models to predict the
intention to use shared AVs using a test set with 10% randomly selected observations. We
tabulate the goodness-of-fit measures for the test sets in Table 3.6.
Table 3.6 Comparison of the Goodness-of-fit Measures for the Test Sets
Measure | Ver_Lk MI | Ver_Lk MF | Ver_OE MI | Ver_OE MLDA | Ver_OE MsLDA
LLI | -28.970 | -28.970 | -28.97 | -28.97 | -28.97
LLF | -21.001 | -20.883 | -19.151 | -19.245 | -19.271
ρ2 | 0.275 | 0.279 | 0.339 | 0.336 | 0.335
Adj ρ2 | -0.001 | -0.032 | 0.063 | 0.025 | 0.024
Count R2 | 0.500 | 0.500 | 0.786 | 0.786 | 0.786
Adj Count R2 | 0.100 | 0.100 | 0.250 | 0.250 | 0.250
F1 Score | 0.244 | 0.244 | 0.369 | 0.273 | 0.369
The performance of the models is better for Ver_OE in comparison to Ver_Lk. A similar trend
can be observed for the test set also. When computing the adjusted ρ2, we obtained negative
values for the test set for Ver_Lk. This is because, once the number of estimated parameters is
accounted for, the final log-likelihood was lower than the initial log-likelihood. To account for
precision and recall, we computed the F1 scores. These scores are also superior for Ver_OE of
the questionnaire for both the training set and test set. The supervised approach (sLDA) did not
improve the model's performance compared to the unsupervised approach (LDA).
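As a worked check of the ρ2 values in Table 3.6, the following minimal sketch computes ρ2 and the adjusted ρ2 from the initial and final log-likelihoods. It assumes the usual definitions, ρ2 = 1 - LLF/LLI and adjusted ρ2 = 1 - (LLF - K)/LLI, where K is the number of estimated parameters. The function names are illustrative, and K = 8 is not reported in this section; it is an assumption back-calculated from the tabulated values for MI of Ver_Lk.

```python
def mcfadden_rho2(ll_final, ll_initial):
    """rho-squared: 1 - LLF / LLI."""
    return 1.0 - ll_final / ll_initial


def adjusted_rho2(ll_final, ll_initial, n_params):
    """Adjusted rho-squared: penalises LLF by the number of estimated parameters."""
    return 1.0 - (ll_final - n_params) / ll_initial


# Test-set values for model MI of Ver_Lk (Table 3.6).
# n_params = 8 is an assumption inferred from the table, not a reported figure.
rho2_lk = mcfadden_rho2(-21.001, -28.970)
adj_rho2_lk = adjusted_rho2(-21.001, -28.970, 8)

print(round(rho2_lk, 3), round(adj_rho2_lk, 3))
```

The adjusted value turns negative whenever LLF - K falls below LLI, which is the situation described above for the small test set.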
3.8 CONCLUSION
To extract information from the open-ended responses, we used LDA and sLDA. The inter-topic
distance between the extracted topics indicated that the extracted topics were distinct. We
could find a correspondence, to a certain extent, with the Likert scale questions for most
open-ended questions. The inclusion of the open-ended responses related to transport-induced pollution
had a positive influence on the intention to use shared AVs, which also had a corresponding
influence on the goodness-of-fit of the estimated models. The attitudinal variables in Ver_Lk
corresponding to this open-ended question did not turn out to be statistically significant. The
models estimated using Ver_OE of the questionnaire outperformed the models estimated using
Ver_Lk of the questionnaire for both the training set and the test set. We could not see an
improvement in the models' performance with the use of sLDA over LDA. These results, and
the potential of open-ended questions to alter the reasoning process, support the use of open-ended
responses to measure attitudes and of Topic Models to extract information from them.
An individual's positive attitude towards using AVs and subjective norms positively influence
the intention to use shared AVs for Ver_OE of the questionnaire. The coefficient was
statistically significant for the entire dataset; however, it lost its statistical
significance for the training set comprising 90% of the responses. This, we believe, could be
related to the small sample size. The perceived behavioural control variable related to having a
conducive environment positively influenced the intention to use shared AVs, whereas the
perceived behavioural control variable related to concerns negatively influenced the intention
to use shared AVs. This effect is observed only for Ver_OE of the questionnaire. The results
indicate the ability of TPB to measure the intention to use shared AVs.
Having demonstrated the potential to use open-ended questions to measure attitudes, we
consider it advisable to investigate this further using a larger dataset. We believe the use of a
larger dataset might improve the quality of the extracted topics. Because this was an online survey in which
participation was primarily voluntary, we could not ensure the sample's representativeness.
Our data had a higher proportion of males, who were primarily young or middle-aged.
Considering this, further analysis using a sample that is representative of the population is
imperative. Furthermore, it would be appropriate to use statements in the Likert scale questions
that have been tested extensively to compare the performance with open-ended questions. Regarding the
use of Topic Models, a possibility could be to avoid the splitting of noun-noun compounds
(public transportation, global warming) in the data.
4 INTEGRATING AND COMPARING OPEN- AND CLOSED-ENDED RESPONSES: A CASE STUDY ON AVS FOR COMMUTE TRIPS
4.1 INTRODUCTION
To address some of the limitations of the study carried out in India, we developed a second
study that analysed the mode choice for commute trips in the USA, in a scenario offering
“Regular Cars”, “Personal Autonomous Vehicles”, and “Shared Autonomous Vehicles”. We
aim to validate the three objectives explored previously in Chapter 3 using a larger and
representative dataset. In addition to this, we also pursue the fourth objective, which involves
developing the framework that measures attitudes by allowing respondents to choose their
preferred question type.
We organised this chapter similarly to Chapter 3. Section 4.2 discusses the questionnaire design,
presenting the experimental design and the theoretical framework used in the study. Section
4.3 discusses the data collection and the data cleaning procedures, and Section 4.4 discusses
results from the exploratory analysis of the closed-ended responses, where we also explore the
differences in the frequency distributions of the closed-ended responses and
their statistical significance. Section 4.5 presents different aspects of extracting information
from closed- and open-ended responses, discussing the results from exploratory analysis and
Topic Modelling, and closes with a comparison of the extracted topics with the statements
used for the closed-ended responses. Section 4.6 briefly discusses the adopted
modelling framework, and Section 4.7 presents the results from the
estimation of the model for the intention to use AVs for commute trips. Section 4.8 presents the
framework that lets researchers and analysts allow respondents to choose their preferred type of
question when answering the questions related to attitudes. The final
section (4.9) presents the salient findings and the current study's limitations.
4.2 QUESTIONNAIRE DESIGN
The responses to our survey are likely to be influenced by the representativeness of the sample,
the sample size, the behavioural framework used, the statements used for the Likert scale and the
choice experiment, to name a few. This study aims to disentangle these impacts and narrow
down the differences in attitudes to the questionnaire type (closed- or open-ended questions).
In this regard, we collect responses from a sample that is representative of the population.
Furthermore, to overcome the potential influence of the framework and statements, we used
the framework (Zhang et al. [168]) and stated-preference experiments (Haboucha et al. [40])
tested previously by other researchers.
4.2.1 Experimental Design
The questionnaire for this study comprises two parts. The first part measures the attitudes, and
the second part captures the choices by the individuals. We adopted two different experimental
designs (the first to test the adequacy of open-ended questions and the second for the choice
experiment) and discuss them in the following paragraphs.
In the first part, which measures attitudes, we ask respondents about the perceived ease of use,
perceived usefulness, perceived safety risk, perceived privacy risk, trust, and attitudes towards
the use of AVs. To address the first objective of this research (to analyse the
influence of the questionnaire type on the responses), we use two versions of the questionnaire,
Ver_LK and Ver_LKOE. To pursue the remaining objectives, we introduce a third version
of the questionnaire, Ver_OE. The three versions of the questionnaire are as follows:
Ver_LK: attitudes are measured using statements depicting the attitude, and the
responses are collected using a five-point Likert scale.
Ver_LKOE: uses a combination of open- and closed-ended questions; we, however,
exclude the responses to the open-ended questions from the analysis, as the objective is
to analyse if the open-ended questions alter the responses to the closed-ended questions.
The closed-ended questions are the same as in Ver_LK, and we use them for
predicting the behaviour.
Ver_OE: only open-ended questions are used to measure attitudes.
We present the experimental design in Figure 4.1 below. Respondents were randomly assigned
to one of the versions using the Randomizer feature in Qualtrics, thereby reducing the
potential for bias in the data collection.
The second part of the experimental design is related to the mode choice experiment, later used
as a dependent variable for the estimation. We hypothesise that the measured attitudes
influence the dependent variable captured using a set of Stated Preference (SP) questions
presented to the respondents, and individuals choose between the three alternatives, “Regular
Cars”, “Personal Autonomous Vehicles”, and “Shared Autonomous Vehicles”. The choice set
for the SP experiment is designed following the study by Haboucha, Ishaq and Shiftan [40].
The purchase cost of the vehicle, the yearly subscription cost, the average travel cost per trip and
the parking cost define the scenarios presented to the respondents. Table 4.1 presents the
experimental design for the “Stated Preference” experiment adapted from Haboucha, Ishaq and
Shiftan [40].
Figure 4.1 Experimental Design for the Mode Choice for Commute Trips
Table 4.1 SP Experimental Design
Regular
Personal Autonomous Vehicle
Shared Autonomous Vehicle
Vehicle cost
100%
80%
0$ *
100%
150$ *
115%
300$ *
130%
2000$ *
Trip cost
100%
70%
0%
85%
150%
100%
210%
120%
300%
Parking cost
100%
0%
0%
130% + 5$
30%
140% + 5$
60%
150% + 5$
100%
* Yearly costs for membership
Haboucha, Ishaq and Shiftan [41] generated sixteen orthogonal choice scenarios and presented
six choice scenarios to each respondent using these values. Since our priority was to identify
and quantify improvements using open-ended responses to measure attitudes, we used this
experimental design (the different scenarios are presented in Appendix C). In addition, we
carried out a pilot test to check the questionnaire for inconsistencies, which gave us insights
into the average time for completion. Furthermore, the pilot test also provided information on
questions that lacked clarity, and we used this to revise the questionnaire before launching the
final survey.
[Figure 4.1 diagram: three questionnaire versions (Ver_LK, closed-ended questions; Ver_LKOE, open-ended and closed-ended questions; Ver_OE, open-ended questions), each covering Perceived Ease of Use, Perceived Usefulness, Perceived Safety Risk, Perceived Privacy Risk, Trust and Attitudes.]
4.2.2 Framework Design
The original framework of TAM assumes that the perceived usefulness and the ease of use
influence the use of technology. Perceived usefulness (PU) captures the user's belief that using
this technology improves their performance in the organisational context, while perceived ease
of use (PEoU) captures the degree to which the user expects the technology to be free of effort.
In this study, we used an extension of the Technology Acceptance Model (TAM), proposed by
Zhang et al. [168]; however, in their study, researchers predict the behavioural intention to use
AVs using a set of Likert scale questions. Our research extends this framework further using a
choice experiment adopted from Haboucha, Ishaq and Shiftan [40]. We present the modified
framework below in Figure 4.2 and the questionnaire in Appendix C.
Figure 4.2 Proposed Framework for the Mode Choice for Commute Trips
4.3 DATA COLLECTION AND DATA CLEANING
We launched this survey in the USA between January and March 2020, and to ensure a speedy
data collection and the representativeness of the collected sample, we used survey panels
provided by Cint. To check for inconsistencies and identify problematic records, we used the
approach outlined in Section 3.3. For the open-ended responses, we omitted respondents who
used fewer than seven words to answer. Based on the recommendations from Cint, we removed
such responses after data collection, as we feared that enforcing conditions while respondents answered the
questionnaire might alter their actual responses. After each wave of data collection, we analysed
the data to identify problematic records, which Cint replaced. The final dataset
comprises 3002 complete responses (Ver_LK- 1012, Ver_LKOE- 1021, Ver_OE- 969). The
sample used was representative of the US population based on gender, ethnicity, and regional
diversity.
[Figure 4.2 diagram elements: Perceived Ease of Use, Perceived Safety Risk, Perceived Privacy Risk, Perceived Usefulness, Initial Trust, Attitudes, Socio-economic Characteristics, Travel Characteristics, Utility, Choice in SP Experiment.]
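The word-count screening of open-ended responses described in this section can be sketched as follows. This is a minimal illustration, assuming that words are separated by whitespace; the function names and the example responses are hypothetical and are not taken from the survey data.

```python
MIN_WORDS = 7  # responses with fewer than seven words are omitted


def word_count(text):
    """Count whitespace-separated words in a response."""
    return len(text.split())


def keep_response(text, min_words=MIN_WORDS):
    """Return True if the response has at least `min_words` words."""
    return word_count(text) >= min_words


# Hypothetical example responses (not from the dataset).
responses = [
    "I think they are safe",
    "I would worry that the software could fail in heavy traffic",
]
kept = [r for r in responses if keep_response(r)]
print(len(kept), "of", len(responses), "responses kept")
```

Applying the filter after data collection, rather than enforcing a minimum length in the survey form, mirrors the choice described above.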
4.4 EXPLORATORY ANALYSIS
In this section, we focus on the first objective of this research, “to analyse if the method of
collecting qualitative data influences the survey responses.” To facilitate this comparison, we
first analyse whether the collected samples are comparable in terms of the socio-demographic
characteristics of the individuals. Then, having established that the samples are similar,
we pursue the analysis related to the first objective. In this regard, we first analyse the frequency
distributions before carrying out an in-depth analysis of the statistical significance of the
differences and the impact of the questionnaire type.
4.4.1 Preliminary Analysis on Socio-demographic Characteristics
Table 4.2 presents a comparison of the socio-demographic characteristics of the participants.
Nearly 53% of respondents were female, and the average age was ~40 years. Ver_LKOE
and Ver_OE of the questionnaire had a slightly higher representation of respondents earning
between $50,000 and $74,999 and possessing a bachelor's/graduate degree. Nearly 70% of the
respondents were White, and 14-18% were African American. Moreover,
approximately 55% of the participants were employed full-time, 30% employed part-time, and
Ver_OE had a slightly higher representation of students. The average number of adults per
household was around 2.2 (standard deviation- 1.00). Unlike the previous study from India,
only about 5% of the respondents did not disclose their household income.
Table 4.2 Socio-Demographic Characteristics
Variable | Levels | Ver_LK | Ver_LKOE | Ver_OE
Gender (%) | Female | 52.87 | 53.18 | 53.66
 | Male | 46.74 | 46.43 | 46.13
 | Prefer not to answer | 0.4 | 0.39 | 0.21
Average age (in years) | | 40.9 (13.31) | 38.61 (14.23) | 36.55 (14.19)
Household Income (in $) | Less than 10,000 | 9.39 | 7.74 | 5.68
 | 10,000 - 14,999 | 5.14 | 3.82 | 2.89
 | 15,000 - 24,999 | 10.87 | 8.81 | 7.84
 | 25,000 - 34,999 | 12.85 | 12.54 | 12.69
 | 35,000 - 49,999 | 15.51 | 15.38 | 15.48
 | 50,000 - 74,999 | 17.19 | 19.78 | 20.64
 | 75,000 - 99,999 | 12.06 | 12.14 | 13.93
 | 100,000 - 124,999 | 5.24 | 6.56 | 7.33
 | 125,000 - 149,999 | 2.47 | 3.53 | 3.10
 | 150,000 - 199,999 | 2.47 | 2.84 | 2.27
 | More than 200,000 | 1.88 | 1.96 | 2.68
 | Prefer not to answer | 4.94 | 4.9 | 5.47
Educational Qualification (%) | Less than high school graduate | 6.23 | 5.48 | 6.91
 | High school graduate or GED | 28.85 | 23.41 | 20.33
 | Some college or associate degree | 38.24 | 38.69 | 36.95
 | Bachelor's degree | 17.29 | 21.94 | 23.74
 | Graduate degree or professional degree (Master or PhD) | 8.2 | 9.79 | 11.76
 | I prefer not to answer | 1.19 | 0.69 | 0.31
Ethnicity (%) | White | 69.37 | 70.71 | 69.45
 | Black or African American | 17.39 | 14.99 | 13.83
 | American Indian or Alaska Native | 1.19 | 1.47 | 1.86
 | Asian | 4.05 | 3.33 | 5.06
 | Native Hawaiian or other Pacific Islander | 0.59 | 0.49 | 0.72
 | Some other race | 6.62 | 8.23 | 7.53
 | I prefer not to answer | 0.79 | 0.78 | 1.55
Employment Status (%) | Full-time | 55.93 | 52.01 | 53.46
 | Part-time | 30.24 | 31.73 | 27.55
 | Student | 13.83 | 16.26 | 18.99
Number of adults | | 2.23 (0.99) | 2.24 (1.00) | 2.27 (1.00)
Number of children aged between 8 and 17 | | 0.62 (0.97) | 0.57 (0.97) | 0.50 (0.92)
Number of children aged under 8 | | 0.38 (0.78) | 0.31 (0.71) | 0.29 (0.64)
Table 4.3 summarises the travel characteristics of the respondents.
In the week preceding the survey, approximately 30% of respondents
travelled less than 35 miles, ~20% travelled between 35 and 70 miles, while more than 30%
travelled more than 105 miles. The average commute distance was about 13 miles (Ver_LK-
14.55, Ver_LKOE-12.99 and Ver_OE- 13.81), and the average commute time was
approximately 24 minutes (Ver_LK- 24.42, Ver_LKOE-24.09 and Ver_OE- 24.54). An
overwhelming proportion (~90%) of respondents performed errands during their commute to work
or mid-day, and about 75% did so at least once a week. On average, car-users paid less than $2 towards parking
charges; however, non-car-users estimated the average parking charges to be ~$8.
This difference between the actual and the estimated parking charges is probably
because most car-users had access to free parking, which the non-car-users might have
overlooked. The average occupancy in a car was 1.47. Among the car-users, nearly 50%
considered it essential to leave items in the car. For all three versions of the questionnaire, the
disposable amount for a new car averaged at ~$25,000. Interestingly, nearly 30% of
respondents did not drive a car with Adaptive Cruise Control (ACC), while ~40% reported
using it in the last 12 months. About 78% of participants answering Ver_LK of the questionnaire were
aware of AVs, while the percentage was more than 85 among those answering Ver_LKOE and
Ver_OE. When asked if they had ever ridden an AV, less than 10% answered affirmatively.
Table 4.3 Travel Characteristics of the Individuals (I)
Variable | Levels | Ver_LK | Ver_LKOE | Ver_OE
Total vehicle miles travelled by all modes in the past 7 days (%) | Less than 35.0 miles | 33.10 | 29.97 | 28.78
 | 35.0 - 70.0 miles | 19.47 | 18.22 | 17.96
 | 70.0 - 105.0 miles | 11.56 | 11.56 | 14.96
 | More than 105.0 miles | 30.63 | 36.73 | 36.02
 | I don't know, or I prefer not to answer | 5.24 | 3.53 | 2.27
Commute distance (in miles) | | 14.55 (14.9) | 12.99 (13.18) | 13.81 (13.31)
Commute time (in minutes) | | 24.42 (19.55) | 24.09 (20.25) | 24.54 (18.70)
Frequency of errand during the day (%) | Daily | 17.98 | 16.06 | 13.42
 | 3 - 4 times a week | 24.60 | 25.76 | 25.70
 | 1 - 2 times a week | 31.82 | 32.71 | 36.33
 | Less than once a week | 13.93 | 16.65 | 16.92
 | Never | 11.66 | 8.81 | 7.64
Average parking charges (in $) | | 1.29 (7.17) | 1.98 (9.70) | 1.41 (5.69)
Average parking charges (non car-users) (in $) | | 5.42 (6.61) | 10.48 (14.05) | 8.36 (7.04)
Number of individuals in car | | 1.47 (0.81) | 1.47 (0.80) | 1.45 (0.76)
Importance of the ability to leave items in your car (%) | Very important | 12.94 | 11.26 | 11.04
 | Somewhat important | 30.93 | 32.62 | 32.51
 | Not important | 41.01 | 43.68 | 45.51
 | Non car users | 15.12 | 12.44 | 10.94
Average Cost of Car (in $) | | 25129.68 (16260.59) | 25886.82 (14748.11) | 25919.13 (12872.37)
Frequency of use of Adaptive Cruise Control when driving in the last 12 months (%) | Very frequently | 6.53 | 6.56 | 4.54
 | Frequently | 6.43 | 7.44 | 8.67
 | Occasionally | 12.27 | 12.63 | 13.31
 | Rarely | 9.00 | 7.44 | 7.64
 | Very rarely | 7.02 | 6.07 | 6.09
 | Never | 24.23 | 22.04 | 18.78
 | I don't know about ACC | 8.51 | 7.93 | 8.46
 | I don't drive a car with ACC | 26.01 | 29.87 | 32.51
Heard of AVs (%) | No | 22.23 | 14.20 | 11.15
 | Yes | 77.77 | 85.80 | 88.85
Ever ridden a full AV (%) | No | 91.8 | 92.26 | 92.05
 | Yes | 8.20 | 7.74 | 7.95
Table 4.4 presents the travel characteristics, particularly related to the mode choice of
respondents for commute trips. While about 50% reported having never walked, ~75% never
used a bike. The corresponding percentages for motorbikes and car-sharing were ~91% and
~80%, respectively. Unsurprisingly, the most used mode for commute was the car (~95%).
When asked about taxi and public transport use frequency, ~60% of participants reported
having never used them for the commute.
Table 4.4 Travel Characteristics of an Individual (II)
The mode used for Commute trips is | Frequency of use | Ver_LK | Ver_LKOE | Ver_OE
Walk (%) | Every day | 21.84 | 17.34 | 16.00
 | Several times a week | 14.03 | 15.87 | 14.86
 | Several times a month | 9.49 | 8.42 | 10.53
 | Several times a year | 10.28 | 10.38 | 12.18
 | Never | 44.37 | 47.99 | 46.44
Bike (%) | Every day | 1.09 | 1.27 | 0.83
 | Several times a week | 4.64 | 3.43 | 3.83
 | Several times a month | 6.82 | 6.17 | 5.60
 | Several times a year | 13.14 | 12.44 | 13.37
 | Never | 74.31 | 76.69 | 76.37
Motorbike (%) | Every day | 0.49 | 0.39 | 0.21
 | Several times a week | 2.08 | 1.76 | 1.14
 | Several times a month | 2.08 | 2.25 | 1.14
 | Several times a year | 4.45 | 3.82 | 4.33
 | Never | 90.91 | 91.77 | 93.19
Car (%) | Every day | 67.00 | 67.19 | 70.07
 | Several times a week | 17.89 | 20.37 | 18.99
 | Several times a month | 5.24 | 5.09 | 5.37
 | Several times a year | 2.96 | 3.13 | 2.48
 | Never | 6.92 | 4.21 | 3.10
Car sharing (%) | Every day | 0.79 | 1.67 | 0.93
 | Several times a week | 3.66 | 2.64 | 3.72
 | Several times a month | 5.34 | 4.80 | 4.54
 | Several times a year | 8.10 | 11.46 | 9.29
 | Never | 82.11 | 79.43 | 81.53
Taxi (%) | Every day | 0.69 | 0.88 | 0.41
 | Several times a week | 5.24 | 3.43 | 4.23
 | Several times a month | 10.28 | 11.26 | 12.07
 | Several times a year | 21.34 | 22.82 | 26.42
 | Never | 62.45 | 61.61 | 56.86
Public Transport (%) | Every day | 4.45 | 4.80 | 4.75
 | Several times a week | 8.99 | 6.37 | 7.74
 | Several times a month | 7.41 | 7.93 | 7.22
 | Several times a year | 16.80 | 17.04 | 21.05
 | Never | 62.35 | 63.86 | 59.24
4.4.1.1 Perceived Ease of Use
Figure 4.3 Frequency Distribution for Perceived Ease of Use
This section of the questionnaire measured the perceptions of the ease of using AVs. Referring
to the frequency distribution presented in Figure 4.3, almost half of the participants believed it
would be easy for them to learn how to use AVs. Of the remaining, about 30% were neutral to
the statement. However, when asked if they would find it easy to get AVs to do what they want
them to do, the largest share (~40%) were unsure about it, with another 30% of respondents
agreeing with the statement and a further 10% strongly agreeing with it. Thus, in general, people
are more confident that they could become skilful at using AVs, or that AVs might be easy to
use, than that they could get AVs to do what they want.
4.4.1.2 Perceived Usefulness
This extension of the Technology Acceptance Model (TAM) emphasises the importance of the
perceived usefulness in the formation of attitudes. We used five statements: (a) “Using
autonomous vehicles will be useful in meeting my travel needs”, (b) “Autonomous vehicles will let
me do other tasks, such as eat, watch a movie, be on a cell phone on my trip”, (c) “Using
autonomous vehicles will decrease my accident risk”, (d) “Using autonomous vehicles will relieve
my stress of driving” and (e) “I find autonomous vehicles to be useful when I’m impaired (e.g.,
drowsy, drunk, drugs)”.
Figure 4.4 Frequency Distribution for Perceived Usefulness
Referring to Figure 4.4, we can observe that most respondents assumed that AVs might be
useful in meeting travel needs, performing other activities during travel, and travelling when
unfit to drive (when under the influence of alcohol or drowsy). A considerably higher
proportion of participants responded neutrally to whether AVs might reduce the number of
accidents, and respondents were slightly more pessimistic about this proposition. However, among those
answering Ver_LKOE, there were fewer neutral respondents, and the respondents were
primarily optimistic. On the statement on whether AVs might reduce the stress of driving,
participants were mostly positive; however, several remained sceptical.
4.4.1.3 Perceived Safety Risk
Interestingly, as Figure 4.5 illustrates, an overwhelming proportion of respondents agreed with
the statements for this set of questions. Almost three in every four respondents are worried
about the general safety of such technology. They were worried that the failure or malfunction
of AVs might cause accidents. It is worth noting that the proportion of respondents agreeing
strongly to the statements has increased significantly for the Ver_LKOE of the questionnaire.
Figure 4.5 Frequency Distribution for Perceived Safety Risk of AVs
4.4.1.4 Perceived Privacy Risk
Figure 4.6 Frequency Distribution for Perceived Privacy Risk of AVs
When analysing the frequency distribution for the responses on the concerns regarding the
privacy risks posed by such technology, the pattern was bell-shaped (Figure 4.6). There was a
very high proportion of neutral respondents for all three questions, with almost an equal
proportion of respondents agreeing/disagreeing with the statements. Interestingly, most
respondents were okay with AVs collecting personal information. However, using and sharing
the collected personal information without their consent was a big concern.
4.4.1.5 Trust
In terms of trust in the technology, people are, in general,
mostly neutral (refer to Figure 4.7). Only about 25% consider AVs
dependable or reliable. From the perspective of a transportation planner, this requires
immediate intervention. Overall, the proportion of respondents trusting AVs was nearly 30%
among those answering Ver_LK and approximately 40% for respondents answering
Ver_LKOE.
Figure 4.7 Frequency Distribution for Trust in AVs
4.4.1.6 Attitudes
Figure 4.8 presents the variation of the responses to the questions related to overall attitudes.
In contrast to the respondents answering Ver_LK of the questionnaire, an additional 10% of
respondents answering Ver_LKOE considered AVs a good and wise idea. When asked if the
participants considered AVs to be pleasant, most of the responses were neutral. Thus, the
overall attitudes towards the use of AVs are very positive among the public.
Figure 4.8 Frequency Distribution for Attitudes towards AVs
4.4.1.7 Modal Share for Commute Trips
Figure 4.9 Frequency Distribution for the Mode for Commute Trips
The questionnaire also investigated the likelihood of respondents choosing a “Regular Car”,
“Private AV”, or “Shared AV” for the commute trips. We accomplished this with the help of a
personalised stated-preference survey that used the travel characteristics of the respondents
reported in the previous sections of the questionnaire. Consistent with the observations in
the preceding sections, we observe a difference in the frequency
distributions for the alternative questionnaire types. Among those answering Ver_LK of the
questionnaire, nearly 50% of participants preferred “Regular Car”. About 30% of respondents
preferred “Private AV”, and the rest opted for “Shared AV”. The distribution of shares was
different for Ver_LKOE, and it was similar to that observed for Ver_OE of the
questionnaire. Approximately 40% of the respondents chose “Regular Car”, ~34% chose
“Private AV”, while the remaining (~25%) chose to use “Shared AV” for their commute trips.
Figure 4.9 compares the breakdown of mode shares based on the different questionnaire types.
[Figure 4.9 chart: Modal Share for Commute Trips; categories: Regular Car, Private AV, Shared AV; vertical axis ticks from 0 to 50; series: Ver_LK, Ver_LKOE, Ver_OE.]
4.4.2 Statistical Analysis
After identifying differences in the frequency distributions when open-ended questions
precede the Likert scale questions, it is essential to assess whether these differences are
statistically significant. For this, we performed the non-parametric Mann-Whitney U test
[207], and the second column of Table 4.5 tabulates the results (p-values) of the analysis.
The differences in the distributions were statistically significant at the 5% level
for all but three of the statements measuring attitudes. For the statements “I will
find it easy to get Autonomous Vehicles to do what I want them to do”, “Using Autonomous
Vehicles will be useful in meeting my travel needs”, and “Autonomous Vehicles are reliable”,
the previously observed differences were not statistically significant.
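To illustrate how such a comparison of two sets of Likert responses can be carried out, the following is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. It assumes the normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those of a statistical package; it is not the software used in this study. The function name and the example responses are hypothetical.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))

    # Assign midranks so that tied values share the same rank.
    rank_of = {}
    start = 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = start + (c - 1) / 2.0
        start += c

    rank_sum_x = sum(rank_of[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0

    mean_u = n1 * n2 / 2.0
    tie_term = sum(c ** 3 - c for c in counts.values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all observations identical

    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return u_x, p_value


# Hypothetical five-point Likert responses for two questionnaire versions.
group_a = [3, 3, 4, 2, 5, 3, 4, 2, 3, 4]
group_b = [4, 5, 4, 3, 5, 4, 5, 3, 4, 5]
u_stat, p = mann_whitney_u(group_a, group_b)
print(u_stat, round(p, 4))
```

The tie correction matters for Likert data, because a five-point scale produces many tied observations.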
4.4.3 In-Depth Analysis
To gain further insights on the influence of questionnaire type on the responses, we estimated
models for each Likert scale question. Following de Abreu e Silva, Papaix and Chen
[211], we included as explanatory variables in the specification the
socio-demographic characteristics of the individual (age, gender, ethnicity, household income)
and the questionnaire type. Considering the ordinal nature of the dependent variable, we used ordered
probit models for each of the dependent variables. Greene and Hensher [210] present a detailed
description of the underlying principle and the estimation of the model. Referring to the third
column of Table 4.5, the coefficient for the questionnaire type was statistically significant for
12 of the 20 statements (60%). The results highlight the potential of open-ended questions
to influence the responses to Likert scale questions.
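To make the role of the questionnaire-type coefficient concrete, the sketch below evaluates the category probabilities of an ordered probit model, P(y = j) = Φ(τ_j - xβ) - Φ(τ_(j-1) - xβ). The coefficient 0.104 is taken from Table 4.5; the thresholds are hypothetical values chosen for illustration, the coding of the questionnaire-type dummy (1 for Ver_LKOE) is an assumption, and all other explanatory variables are held at zero. The function names are illustrative.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb, thresholds):
    """Probabilities of each ordinal category given the linear predictor xb.

    `thresholds` must be increasing; J thresholds give J + 1 categories.
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [
        norm_cdf(cuts[j + 1] - xb) - norm_cdf(cuts[j] - xb)
        for j in range(len(cuts) - 1)
    ]


THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]  # hypothetical cut-points, five-point scale
BETA_VERSION = 0.104                 # questionnaire-type coefficient (Table 4.5)

probs_lk = ordered_probit_probs(0.0 * BETA_VERSION, THRESHOLDS)    # dummy = 0
probs_lkoe = ordered_probit_probs(1.0 * BETA_VERSION, THRESHOLDS)  # dummy = 1

for label, probs in (("dummy=0", probs_lk), ("dummy=1", probs_lkoe)):
    print(label, [round(p, 3) for p in probs])
```

A positive coefficient shifts probability mass towards the higher response categories, which is how the sign of the coefficients in Table 4.5 can be read.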
Table 4.5 Results of the Statistical Analysis on Whether Open-ended Questions Influence Responses to Likert Scale Questions
Statements Depicting Attitudes | Mann-Whitney U Test | Coeff
Perceived Ease of Use | |
Learning to use Autonomous Vehicles will be easy for me | 0.013** | 0.076
I will find it easy to get Autonomous Vehicles to do what I want them to do | 0.598 | -0.020
It will be easy for me to become skillful at using Autonomous Vehicles | 0.003*** | 0.104**
I will find Autonomous Vehicles easy to use | 0.042** | 0.058
Perceived Usefulness | |
Using Autonomous Vehicles will be useful in meeting my travel needs | 0.085* | 0.035
Autonomous Vehicles will let us do other tasks such as eating, watching a movie, be on a cell phone during my trip | 0.000*** | 0.162***
Using Autonomous Vehicles will decrease my accident risk | 0.000*** | 0.153***
Using Autonomous Vehicles will relieve my stress of driving | 0.006*** | 0.106**
I find Autonomous Vehicles to be useful when I'm impaired | 0.001*** | 0.143***
Perceived Safety Risk | |
I'm worried about the general safety of such technology | 0.014** | 0.106**
I'm worried that the failure or malfunction of Autonomous Vehicles may cause accidents | 0.001*** | 0.148***
Perceived Privacy Risk | |
I'm concerned that Autonomous Vehicles will collect too much personal information from me | 0.000** | -0.180***
I'm concerned that Autonomous Vehicles will use my personal information for other purposes without my authorization | 0.002*** | -0.136***
I'm concerned that Autonomous Vehicles will share my personal information for other purposes without my authorization | 0.006*** | -0.110**
Perceived Trust | |
Autonomous Vehicles are dependable | 0.048** | 0.051
Autonomous Vehicles are reliable | 0.133 | 0.039
Overall, I can trust Autonomous Vehicles | 0.043** | 0.056
Attitudes | |
Using Autonomous Vehicles is a good idea | 0.001*** | 0.115**
Using Autonomous Vehicles is a wise idea | 0.002*** | 0.110**
Using Autonomous Vehicles is pleasant | 0.036** | 0.042
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
4.5 EXTRACTION OF DATA
The emphasis in this section is on the second objective, “to develop an approach to extract
open-ended responses from a survey and process the data”.
4.5.1 Treatment of Closed-ended Responses
To assess the internal reliability of the Likert scale responses, we used Cronbach's Alpha
values. In general, we obtained relatively high values for both versions of the questionnaire,
with higher reliability for Ver_LKOE. In Table 4.6, we present the values for internal
reliability.
Table 4.6 Internal Reliability- Cronbach’s Alpha
Construct | Ver_LK | Ver_LKOE
Perceived ease of use | 0.905 | 0.908
Perceived usefulness | 0.830 | 0.841
Perceived safety risk | 0.839 | 0.863
Perceived privacy risk | 0.912 | 0.918
Trust | 0.878 | 0.907
Attitude | 0.891 | 0.914
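For reference, Cronbach's Alpha for a construct measured with k items is α = k/(k - 1) × (1 - Σ item variances / variance of the summed score). The sketch below is a minimal pure-Python illustration of this formula using sample variances; the function name and the example scores are hypothetical and do not reproduce the values in Table 4.6.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    `items` is a list of k lists, one per statement, each holding the
    scores of the same respondents in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summed score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical five-point Likert scores for three statements and six respondents.
statement_1 = [4, 5, 3, 2, 4, 5]
statement_2 = [4, 4, 3, 2, 5, 5]
statement_3 = [5, 5, 2, 2, 4, 4]
alpha = cronbach_alpha([statement_1, statement_2, statement_3])
print(round(alpha, 3))
```

Values close to 1 indicate that the statements within a construct move together across respondents.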
To test the validity of the questionnaire, we compared the average scores for the Likert scale
responses between two groups (respondents aware of AVs versus those unaware of AVs).
Respondents following news about AVs seem to have responded correctly to the statements.
The differences between the two groups were statistically significant (t-stats) for both versions
(“Ver_Lk”- 2.504 and “Ver_LKOE”- 2.638) of the questionnaire, indicating the validity of the
questionnaire.
4.5.2 Extraction of Information from Open-ended Responses
4.5.2.1 Exploratory Analysis
We used an approach similar to that discussed in Section 3.5.2 to deal with the open-ended
responses. We used “Grammarly” to correct mistakes with a generic pattern, such as spelling
and grammatical mistakes. We must emphasise the effort required for data cleaning. We
explored the data in greater detail to identify phrases using “Regular Expressions”. For
example, the phrases “don’t trust” or “do not trust” might convey the opposite meaning without
the words “do”, “not”, and “don’t”. So, we identified such phrases and others that mean the
same and replaced them with “no_trust”. Doing this is one of the first steps in dimensionality
reduction, which we believe is critical for the analysis and should be carried out
carefully to ensure that it is free of bias.
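The phrase replacement described above can be sketched with a regular expression. This is a minimal example covering only the two phrases named in the text; the pattern, the function name `merge_negations` and the sample sentence are assumptions and do not reproduce the full set of rules used in the study.

```python
import re

# Matches "do not trust" and "don't trust" (straight, curly or missing apostrophe).
NEGATED_TRUST = re.compile(r"\b(?:do\s+not|don['’]?t)\s+trust\b", re.IGNORECASE)

def merge_negations(text):
    """Replace negated-trust phrases with the single token 'no_trust'."""
    return NEGATED_TRUST.sub("no_trust", text)

cleaned = merge_negations("I don’t trust the sensors and do not trust the software.")
print(cleaned)
```

Merging the phrase into one token before stop-word removal keeps the negation attached to “trust” when words such as “do” and “not” are later dropped.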
Furthermore, in this study, to improve the Topic Models, we added more words (such as
Autonomous, Vehicles, Cars, etc.) that do not convey any contextual meaning in the text
analysis to the list of “Stop Words”. This process was iterative and time-consuming, and at
any given stage, the analyst should decide what the appropriate level of data cleaning is. In
each iteration, we evaluated the frequencies of occurrence of the words and their
combinations, and if the frequency was sparse (fewer than 10 occurrences), we left the words as
they are.
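One iteration of this cleaning loop might look like the sketch below. The stop words are the examples named in the text and the threshold of 10 follows the text; the use of adjacent word pairs as “combinations”, the function names and the toy responses are assumptions.

```python
from collections import Counter

DOMAIN_STOP_WORDS = {"autonomous", "vehicles", "cars"}  # examples named in the text
MIN_COUNT = 10                                          # sparsity threshold from the text

def remove_stop_words(tokens, stop_words=DOMAIN_STOP_WORDS):
    return [t for t in tokens if t.lower() not in stop_words]

def frequent_bigrams(responses, min_count=MIN_COUNT):
    """Adjacent word pairs that occur often enough to be considered for merging."""
    counts = Counter()
    for tokens in responses:
        counts.update(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Toy tokenised responses (hypothetical).
responses = [["autonomous", "vehicles", "save", "time"]] * 12 + [["cars", "feel", "cool"]] * 3
cleaned = [remove_stop_words(r) for r in responses]
candidates = frequent_bigrams(cleaned)
print(candidates)
```

Pairs below the threshold are left untouched, which mirrors the decision to leave sparse combinations as they are.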
Contrary to the previous study, there was a significant reduction in the average number of
words per response after cleaning the open-ended responses. On average, there was a drop of
about 65%. We present the average number of words per response in Table 4.7.
Interestingly, the responses were the longest for the question related to the “safety concerns”
associated with the use of AVs.
As in Section 3.5.2.1, we refer to words by their stemmed forms, so the spellings may differ
from the standard ones. For example, to answer the first question related to the
easiness in use, respondents used “drive” (4.33%), “use” (3.76%), “easi” (3.09%), “technolog”
(2.30%) and “make” (1.49%). To describe the use of AVs, the most frequently used words
included “drive” (5.38%), “use” (3.48%), “driver” (2.05%), “help” (2.02%) and “accid”
(1.91%). “Drive” (2.33%), “accid” (1.72%), “malfunct” (1.64%), “concern” (1.63%) and
“human” (1.61%) were used to articulate the potential safety concerns associated with the use
of AVs and when discussing the privacy concerns respondents used “privaci” (5.38%),
“concern” (3.35%), “inform” (2.61%), “use” (1.84%) and “issu” (1.66%). In the fifth question,
respondents were asked to describe their reasons to trust or not trust AVs and to answer this,
“trust” (4.82%), “drive” (2.52%), “technolog” (2.36%), “control” (2.10%) and “use” (1.55%)
were used. In describing the general attitudes towards AVs, respondents used the words “drive”
(2.60%), “use” (2.29%), “futur” (2.10%), “technolog” (1.68%) and “need” (1.59%).
Table 4.7 Average Number of Words per Response
Open-ended Questions                                                                             Original   Cleaned
Do you think that it will be easy to use Autonomous Vehicles? (OE1)                              22.03      7.78
Do you believe that Autonomous Vehicles are useful? (OE2)                                        20.50      8.07
Do you have safety concerns regarding the use of Autonomous Vehicles? (OE3)                      22.42      8.14
Do you have concerns related to privacy associated with the use of Autonomous Vehicles? (OE4)   16.90      5.93
Would you as a user trust an Autonomous Vehicle? (OE5)                                           18.46      6.45
What are your general opinions about Autonomous Vehicles? (OE6)                                  20.59      6.97
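The reduction in response length can be checked directly from the values in Table 4.7. The short calculation below averages the per-question reductions; the variable names are ours.

```python
# (original, cleaned) average number of words per response, from Table 4.7.
avg_words = {
    "OE1": (22.03, 7.78),
    "OE2": (20.50, 8.07),
    "OE3": (22.42, 8.14),
    "OE4": (16.90, 5.93),
    "OE5": (18.46, 6.45),
    "OE6": (20.59, 6.97),
}

drop = {q: 1 - cleaned / original for q, (original, cleaned) in avg_words.items()}
mean_drop = sum(drop.values()) / len(drop)
longest = max(avg_words, key=lambda q: avg_words[q][0])

print(f"mean reduction: {mean_drop:.1%}, longest original responses: {longest}")
```

The mean reduction is roughly 64%, in line with the drop of about 65% reported above, and the longest original responses belong to the safety question (OE3).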
4.5.2.2 Results from Topic Models
In this research, six open-ended questions were presented to respondents answering Ver_OE
of the questionnaire. Along with the first five open-ended questions, we presented respondents
with an option to agree/disagree with the statements. For each of these five questions, we used
LDA and sLDA to extract information from these responses, and for the final question, LDA
was the only option considered. In the estimation of sLDA, we used responses to the
agree/disagree statement as the response variable. A trial-and-error approach was adopted to
determine the number of extracted topics. After each estimation, we investigated if the
extracted topics overlapped (based on inter-topic distance) and were meaningful. However, we
did not see significant improvement in the performance of the models with the use of sLDA;
hence, we are limiting our discussion to LDA. We present the results in Table 4.8.
We extracted four topics from OE1; the first extracted topic (To_L11) was primarily about the
easiness of getting it to work, learn and gain trust. The second topic (To_L12) emphasised the
need for human presence to respond to uncertainties, the third topic (To_L13) covers the
easiness in operation, and the fourth topic (To_L14) covers additional benefits from the self-
navigation in addition to the easiness.
Table 4.8 Top 5 Words for Each Topic for Open-ended Questions
Topic     Word_1      Word_2      Word_3      Word_4      Word_5
OE1- Do you think that it will be easy to use Autonomous Vehicles?
To_L11    use         easi        technolog   work        get
To_L12    drive       road        human       mani        accid
To_L13    drive       control     driver      make        easier
To_L14    oper        go          everyth     assum       user
OE2- Do you believe that Autonomous Vehicles are useful?
To_L21    time        better      environ     make        save
To_L22    drive       driver      thing       work        make
To_L23    accid       human       traffic     reduc       help
To_L24    drive       use         get         help        disabl
To_L25    use         drive       need        technolog   situat
To_L26    take        go          attent      pay         use
To_L27    driver      help        transport   safeti      safer
OE3- Do you have safety concerns regarding the use of Autonomous Vehicles?
To_L31    concern     drive       safeti      control     self
To_L32    technolog   safe        work        need        time
To_L33    malfunct    accid       caus        comput      happen
To_L34    road        driver      drive       get         accid
To_L35    thing       go          wrong       abl         make
To_L36    human       error       driver      make        situat
OE4- Do you have concerns related to privacy associated with the use of Autonomous Vehicles?
To_L41    make        thing       privat      abl         noth
To_L42    hack        go          technolog   get         system
To_L43    privaci¹    concern     issu        sure        relat
To_L44    track       alreadi     technolog   locat       differ
To_L45    drive       someon      record      even        driver
To_L46    inform      data        person      compani     need
OE5- Would you as a user trust an Autonomous Vehicle?
To_L51    safe        test        technolog   trust       enough
To_L52    trust       technolog   use         time        ye
To_L53    trust       safeti      work        concern     road
To_L54    drive       driver      human       comput      better
To_L55    control     drive       make        abl         go
OE6- What are your general opinions about Autonomous Vehicles?
To_L61    use         drive       cool        feel        get
To_L62    accid       driver      road        potenti     danger
To_L63    futur       great       technolog   time        transport
To_L64    drive       make        human       thing       control
To_L65    need        technolog   work        lot         idea
To_L66    good        safeti      concern     thing       improv
¹ “privacy” and “concern” could be combined; however, they did not appear in the same sequence in
a sentence and hence were not combined.
We extracted seven broad themes using LDA from the responses to the perceived usefulness
of AVs. First, respondents believed that AVs might save travel time and make travel more
environmentally friendly (To_L21). Second, on the ability to work during travel, participants
shared contrasting views. Respondents believed that AVs might facilitate working during travel
(To_L22); it may, however, demand additional attention, which may negatively affect their
work (To_L26). Also, AVs might make travelling safer and mitigate congestion (To_L23),
make parking easier (To_L27) and ensure mobility for the disabled (To_L24). Finally, many
participants emphasised the need for human control while using AVs (To_L25).
The next open-ended question evaluated the safety concerns associated with the use of AVs.
The lack of control is a significant safety concern (To_L31). Many argue that accidents could
result from the lack of control (To_L33), malfunctions (To_L34) or sensor failures (To_L35).
Furthermore, as humans are error-prone, many believe that there could be flaws in the
software programs (To_L36) and emphasise the need for thorough testing of AVs before their
widespread deployment (To_L32).
We then evaluated if individuals had privacy concerns related to the use of AVs. Many shared
no privacy concerns as they opined that it was unnecessary if they are transparent (To_L41).
Another argument was that the information is already in the public domain (To_L44) through
various platforms. Some opined that they do have concerns, but it was not something that they
should be bothered about (To_L43). Furthermore, it would not be a concern if users are
informed about data collection and storage (To_L46). The concerns that respondents did raise
were mostly related to hacking (To_L42) and the potential for being watched (To_L45).
In the fifth open-ended question (OE5), we asked respondents if they would trust AVs. A
significant proportion of respondents were not yet ready to trust AVs. Trust issues could be
related to the need for further testing (To_L51) and the potential safety concerns due to
malfunctions (To_L53). Sceptics argued that humans could drive better (To_L54); however,
respondents favouring the system argued that computers might control better (To_L55).
Probably over time, more users might start trusting the system (To_L52).
In response to the general perceptions of AVs (OE6), we can group the ideas discussed by
respondents into six categories. In general, respondents were optimistic, and they consider it
cool to use AVs and a good idea (To_L61). Many considered AVs a tremendous technological
advancement with some safety concerns (To_L62). It is encouraging also to note that many
considered it futuristic (To_L63), although they emphasised the need for additional work
(To_L65). Finally, in general, people perceive it as a safe, economical, and environmentally
friendly travel option (To_L66), probably safer than humans (To_L65).
Figure 4.10 Inter-topic Distance for LDA (clockwise from top left) OE1, OE2, OE3, OE4,
OE5, OE6
After estimating the coefficients, we evaluated the inter-topic distance using the visualisation
tool pyLDAvis [195] (Figure 4.10). There is no overlap between the topics extracted for OE1
and OE3. The overlaps are high for OE2 and moderate for all the other questions. OE3 and
OE4 have moderate overlaps, and OE1 and OE5 have no overlap.
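To give a sense of what an inter-topic distance measures, the sketch below computes the Jensen-Shannon divergence between topic-word distributions, which, to our understanding, is the distance pyLDAvis uses by default before projecting topics onto two dimensions. The three distributions over a four-word vocabulary are hypothetical and the function names are ours.

```python
from math import log2

def kl_divergence(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), bounded between 0 and 1."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)

# Hypothetical topic-word distributions over a shared four-word vocabulary.
topic_a = [0.70, 0.20, 0.05, 0.05]
topic_b = [0.65, 0.25, 0.05, 0.05]
topic_c = [0.05, 0.05, 0.20, 0.70]

near = js_divergence(topic_a, topic_b)  # similar topics: likely to overlap
far = js_divergence(topic_a, topic_c)   # distinct topics: well separated
print(round(near, 4), round(far, 4))
```

Topics with a small divergence appear as overlapping circles in the visualisation, whereas distinct topics are plotted far apart.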
4.5.3 Comparison of Closed- and Open-ended Responses
This section investigates if we could find coherence between the responses to the Likert scale
questions and the responses to the open-ended questions. The open-ended questions were
designed with care to ensure this. As discussed previously, twenty Likert scale questions were
presented to respondents answering questionnaires Ver_LK and Ver_LKOE and six open-ended
questions to those answering Ver_OE. It is encouraging to note that we could extract topics
from the open-ended response related to most aspects of closed-ended questions. Furthermore,
the open-ended responses could extract more information on the topic. We present a discussion
on the responses in the paragraphs to follow (refer to Table 4.9).
For the questions related to the “Perceived Ease of Use”, we could achieve a one-to-one
mapping between the Likert scale and the open-ended responses. In addition to the four
different aspects from the Likert scale responses, Topic Models also highlighted the need for
control. Responses to the question on the “Perceived Usefulness” did not have a direct mapping
to the Likert scale questions “Useful in meeting travel needs” and “Useful when I’m impaired.”
Having identified the remaining characteristics presented in the Likert scale questions, Topic
Models identified other aspects such as “mobility for the disabled”, “congestion reduction”,
“environmental friendliness”, and “travel time reduction” that make AVs worthwhile. When
asked about the safety concerns regarding AVs, respondents emphasised worries about general
safety, accidents caused by malfunctions and failures, lack of testing and the general
error-prone nature of humans in designing such systems. It is worth noting that the last two
of these aspects were observed only in the open-ended responses.
The distribution of the responses to the Likert scale questions related to privacy concerns was
bell-shaped, with a very high number of neutral responses and an almost equal proportion of
respondents who either agree or disagree with each of the statements. Results of the analysis
of open-ended responses indicate that most respondents are not worried about privacy issues.
This stems from the reasoning that there is nothing to be worried about if you are
transparent. Another argument is that the information is already in the public domain and that
the companies will be transparent on the data collection and storage policies. Individuals who
are indeed worried about privacy foresee the possibility of hacking and tracking.
Slightly more than 50% of respondents answering Ver_OE of the questionnaire trusted AVs.
The Likert scale questions covered questions asking if respondents considered AVs
“dependable”, “reliable”, and “can be trusted”. Analysing the extracted topics indicated that
respondents emphasise testing and time to appreciate AVs because respondents fear
malfunctions. Interestingly, respondents who trust AVs argue that computers can control better,
while those who do not trust argue otherwise. There was almost no correspondence between
the responses to the Likert scale questions and open-ended questions for the question related
to the general attitudes on AVs. Respondents answering both versions emphasise that the use
of AVs is a good idea. In addition to this, respondents answering Ver_OE consider it futuristic
and sustainable.
The results reiterate the previous study's findings using data collected from India that open-
ended questions can collect nearly the same information from the Likert scales if asked
appropriately. However, one of the caveats of the approach is the inability of open-ended
questions to capture the degree or intensity of an individual’s attitudes.
Table 4.9 Mapping between Closed- and Open-ended Responses
Aspect                                                                 Likert Scale   Topics
Do you think that it will be easy to use Autonomous Vehicles?
Easy for me                                                            X              X
Easy to get them to do what I want them to do                          X              X
Easy to become skilful                                                 X              X
Easy to use                                                            X              X
Do you believe that Autonomous Vehicles are useful?
Useful in meeting travel needs                                         X
Perform other tasks                                                    X              X
Decrease my accident risk                                              X              X
Relieve my stress of driving                                           X              X
Useful when I’m impaired                                               X
Mobility for the disabled                                                             X
Lesser congestion                                                                     X
Better for environment                                                                X
Saves time                                                                            X
Do you have safety concerns regarding the use of Autonomous Vehicles?
Worried about the general safety of such technology                    X              X
Worried that failure or malfunction of AVs may cause accidents         X              X
Need more testing                                                                     X
Humans are error-prone                                                                X
Do you have concerns related to privacy associated with the use of Autonomous Vehicles?
Collect too much personal information from me                          X              X
Use personal information for other purposes without authorisation      X              X
Share personal information for other purposes without authorisation    X              X
Potential for being watched                                                           X
Would you as a user trust an Autonomous Vehicle?
Dependable                                                             X              X
Reliable                                                               X              X
Can be trusted                                                         X              X
What are your general opinions about Autonomous Vehicles?
Good idea                                                              X              X
Wise idea                                                              X
Pleasant                                                               X
Futuristic                                                                            X
Sustainable                                                                           X
4.6 MODELLING FRAMEWORK
This research used Probabilistic Graphical Models (PGMs), which account for uncertainty
using probability theory. The advantage of using PGMs is that they provide a mathematically
grounded framework for measuring the changes in uncertainty with the availability of new data.
Furthermore, the framework is similar to Bayesian Structural Equation Models
[212], and for a detailed description of the approach, readers may refer to Peled et al. [194].
In this section, we discuss the proposed modelling framework. Consistent with this
framework, Figure 4.11 and Figure 4.12 depict the Probabilistic Graphical Model (PGM). In
the figures, we present the observed variables using shaded nodes, the latent variables using
unshaded nodes and the arrows to indicate the relationship between the different variables. The
unshaded node (Att) represents the attitude unknown to the researcher/policymaker. Each
measured attitude is related to the unknown attitude “Att” using a set of K-dimensional
multivariate linear regression models. Once estimated, the latent attitudes are used
to model the mode choice (Y) for commute trips.
Considering the nominal nature of the observed choice “Y”, we used a multinomial logit
formulation to model it, with socio-economic characteristics of the individual, travel
characteristics, familiarity with AVs, latent attitude “Att”, and characteristics of
stated-preference experiments as explanatory variables. αC and βC are the respective
alternative-specific constants and coefficients for the “choice” model. The larger plate with
“N” indicates N repetitions of the model, and the smaller plate with “C” indicates C repetitions
of the model.
Having devised the PGM, we outline the generative process by first defining the distributions
for the coefficients (scalars in regular font, vectors in bold). The coefficients are assumed to
have a mean of “0” and a standard deviation of “1”. First, we draw the latent attitudinal
variable “Att” from a multivariate normal distribution with a mean estimated using the
regression equation (a function of attitudes) and a standard deviation of “1”. Next, we draw the
choice variable “Y” from a multinomial distribution using the utility computed using the socio-
demographics, travel, attitudes, and characteristics of the choice experiment.
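The generative story above can be simulated in a few lines. This is only a sketch under simplifying assumptions that are ours: a single attitude equation (K = 1), a single explanatory variable `x`, normal priors for all coefficients, and the three alternatives used later in the estimation. It is not the Pyro model estimated in this chapter.

```python
import math
import random

ALTERNATIVES = ["Regular Car", "Private AV", "Shared AV"]

def softmax(utilities):
    peak = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - peak) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_choice(att_measured, x, rng):
    # Coefficients of the attitude equation, drawn with mean 0 and sd 1.
    alpha_in, gamma_in = rng.gauss(0, 1), rng.gauss(0, 1)
    # Latent attitude: normal around the regression mean, sd 1.
    att = rng.gauss(alpha_in + gamma_in * att_measured, 1)
    # Alternative-specific constants and coefficients, then utilities.
    utilities = []
    for _ in ALTERNATIVES:
        alpha_c = rng.gauss(0, 1)
        beta_x, beta_att = rng.gauss(0, 1), rng.gauss(0, 1)
        utilities.append(alpha_c + beta_x * x + beta_att * att)
    # Multinomial logit: choice probabilities are the softmax of the utilities.
    probs = softmax(utilities)
    choice = rng.choices(ALTERNATIVES, weights=probs, k=1)[0]
    return probs, choice

probs, choice = simulate_choice(att_measured=0.5, x=1.0, rng=random.Random(0))
print([round(p, 3) for p in probs], choice)
```

Inference runs this story in reverse: given the observed attitudes and choices, it recovers the posterior distribution of the coefficients and of the latent attitude.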
Individual Models
To benchmark the performance of our proposed framework, we estimated three models using
the individual datasets (Ver_LK, Ver_LKOE and Ver_OE). In Figure 4.11, the shaded node Att_in
depicts the attitudinal variables measured using closed- or open-ended questions for the Ver_LK,
Ver_LKOE or Ver_OE versions. α_in and γ_in represent the intercepts and the slopes for Ver_LK
(similar for other datasets). α_c and γ_c represent the alternative-specific constants and
coefficients for the utility equation, and the shaded nodes “Xs” and “Y” denote the explanatory
variables and the choice variable.
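Generative Process
1. For each equation k ∈ {1, …, K}:
   1. Draw $\alpha_{in}^{(k)} \sim \mathcal{N}(0, 1)$
   2. Draw $\gamma_{in}^{(k)} \sim \mathcal{N}(0, 1)$
   3. Draw $Att^{(k)} \sim \mathcal{N}\left(\alpha_{in}^{(k)} + \gamma_{in}^{(k)} Att_{in},\ 1\right)$
2. For each class c ∈ {1, …, C}:
   1. Draw $\alpha_{c} \sim \mathcal{N}(0, 1)$
   2. Draw $\gamma_{c} \sim \mathcal{N}(0, 1)$
   3. Draw $Y \sim \mathrm{Multinomial}\left(\mathrm{Softmax}\left(\alpha_{c} + \gamma_{c}\,[Xs, Att]\right)\right)$
Joint Probability Distribution
$$p(\alpha_{in}, \gamma_{in}, \alpha_{c}, \gamma_{c}, Att, Y \mid Att_{in}, Xs) = \prod_{k=1}^{K} p\left(\alpha_{in}^{(k)}\right) p\left(\gamma_{in}^{(k)}\right) \prod_{c=1}^{C} p(\alpha_{c})\, p(\gamma_{c}) \prod_{n=1}^{N} \left( \left[ \prod_{k=1}^{K} p\left(Att_{n}^{(k)} \mid \alpha_{in}^{(k)}, \gamma_{in}^{(k)}, Att_{in,n}\right) \right] p\left(Y_{n} \mid \alpha_{c}, \gamma_{c}, Xs_{n}, Att_{n}\right) \right)$$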
Figure 4.11 Probabilistic Graphical Model for Individual Model
Proposed Model (Figure 4.12)
The shaded nodes Att_LK, Att_LKOE and Att_OE depict the attitudinal variables measured using
closed- or open-ended questions for Ver_LK, Ver_LKOE and Ver_OE. α_LK and γ_LK represent the
intercepts and the slopes for Ver_LK (similar for other datasets). Furthermore, “z” indicates
the type of questionnaire presented to the respondents. α_c and γ_c represent the
alternative-specific constants and coefficients for the utility equation, and the shaded nodes
“Xs” and “Y” denote the explanatory variables and the choice variable.
Figure 4.12 Probabilistic Graphical Model for the Proposed Model
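Generative Process
1. For each equation k ∈ {1, …, K}:
   1. Draw $\alpha_{LK}^{(k)} \sim \mathcal{N}(0, 1)$
   2. Draw $\gamma_{LK}^{(k)} \sim \mathcal{N}(0, 1)$
   3. Draw $\alpha_{LKOE}^{(k)} \sim \mathcal{N}(0, 1)$
   4. Draw $\gamma_{LKOE}^{(k)} \sim \mathcal{N}(0, 1)$
   5. Draw $\alpha_{OE}^{(k)} \sim \mathcal{N}(0, 1)$
   6. Draw $\gamma_{OE}^{(k)} \sim \mathcal{N}(0, 1)$
   7. Draw $Att^{(k)} \sim \mathcal{N}\Big( \mathbb{1}[z = LK]\left(\alpha_{LK}^{(k)} + \gamma_{LK}^{(k)} Att_{LK}\right) + \mathbb{1}[z = LKOE]\left(\alpha_{LKOE}^{(k)} + \gamma_{LKOE}^{(k)} Att_{LKOE}\right) + \mathbb{1}[z = OE]\left(\alpha_{OE}^{(k)} + \gamma_{OE}^{(k)} Att_{OE}\right),\ 1 \Big)$
2. For each class c ∈ {1, …, C}:
   1. Draw $\alpha_{c} \sim \mathcal{N}(0, 1)$
   2. Draw $\gamma_{c} \sim \mathcal{N}(0, 1)$
   3. Draw $Y \sim \mathrm{Multinomial}\left(\mathrm{Softmax}\left(\alpha_{c} + \gamma_{c}\,[Xs, Att]\right)\right)$
Joint Probability Distribution
$$p(\alpha_{LK}, \gamma_{LK}, \alpha_{LKOE}, \gamma_{LKOE}, \alpha_{OE}, \gamma_{OE}, \alpha_{c}, \gamma_{c}, Att, Y \mid Att_{LK}, Att_{LKOE}, Att_{OE}, z, Xs) = \prod_{k=1}^{K} p\left(\alpha_{LK}^{(k)}\right) p\left(\gamma_{LK}^{(k)}\right) p\left(\alpha_{LKOE}^{(k)}\right) p\left(\gamma_{LKOE}^{(k)}\right) p\left(\alpha_{OE}^{(k)}\right) p\left(\gamma_{OE}^{(k)}\right) \prod_{c=1}^{C} p(\alpha_{c})\, p(\gamma_{c}) \prod_{n=1}^{N} \left( \left[ \prod_{k=1}^{K} p\left(Att_{n}^{(k)} \mid \alpha^{(k)}, \gamma^{(k)}, Att_{LK,n}, Att_{LKOE,n}, Att_{OE,n}, z_{n}\right) \right] p\left(Y_{n} \mid \alpha_{c}, \gamma_{c}, Xs_{n}, Att_{n}\right) \right)$$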
4.7 ESTIMATION RESULTS
The discussion in this section pertains to the third objective. The models were estimated
using Pyro on a GPU. We used Stochastic Variational Inference, proposed by Hoffman et
al. [207], for Bayesian inference. To benchmark the performance of the models, we
estimated three models separately for each of the datasets (named “Ind”) and compared the
performance of the proposed model (named “Prop”) with the performance of each of these
individual models. Since the models for Ver_OE estimated using sLDA did not perform better
than LDA, we present only the results from LDA in this section. We performed this analysis
using both a training (80%) and a test set (20%). We present the performance measures of the
individual models in Table 4.10. Similar to the approach followed in Section 3.7, we compare
the performance using the initial log-likelihood (LLI), log-likelihood with respect to constants
(LLC), final log-likelihood (LLF) and the McFadden pseudo-R-squared value (ρ2). We also compute
the McFadden pseudo-R-squared value with respect to constants to account for the
improvement of the model with respect to constants, and the adjusted McFadden
pseudo-R-squared to account for the improvements with the estimation after considering the
estimated parameters. We did not account for the parameters for LDA while computing the
adjusted values for ρ2 and Count R2. In addition to this, we present the goodness-of-fit measures
such as the count R2 and the F1 scores.
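The three pseudo-R-squared measures can be reproduced from the log-likelihoods. The sketch below applies the usual McFadden definitions to the training-set values reported in Table 4.10 for the individual Ver_LK model; the function name `mcfadden_rho2` is ours.

```python
def mcfadden_rho2(ll_final, ll_reference, n_params=0):
    """1 - (LL_final - K) / LL_reference; K = 0 gives the unadjusted value."""
    return 1 - (ll_final - n_params) / ll_reference

# Training set, individual model for Ver_LK (Table 4.10).
LLI, LLC, LLF, K = -5328.27, -5021.83, -4402.87, 97

rho2 = mcfadden_rho2(LLF, LLI)                 # with respect to the initial model
rho2_constants = mcfadden_rho2(LLF, LLC)       # with respect to constants only
adj_rho2 = mcfadden_rho2(LLF, LLI, n_params=K) # penalised for K parameters

print(round(rho2, 4), round(rho2_constants, 4), round(adj_rho2, 4))
```

The rounded results (0.1737, 0.1233 and 0.1555) agree with the corresponding entries of Table 4.10.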
This analysis has two parts, quantifying the improvements (a) with the introduction of
open-ended questions before the set of closed-ended responses and (b) with the use of
open-ended questions. By pursuing the first part of the analysis, we re-visit and investigate further the
findings from our research using India's dataset (Chapter 3). The second part of the analysis
involves extending the analysis further by relying only on open-ended questions to measure
attitudes.
To address the first part of the analysis, we compare the models’ performance using Ver_LK
and Ver_LKOE of the questionnaire. Referring to Table 4.10, the improvements in results based
on ρ2 and the count R2 were the largest for the models estimated using Ver_LK of the
questionnaire. However, after accounting for the constants, the models estimated using
Ver_LKOE had the best performance, with a similar trend for Count R2. This loss in performance
is reasonable as, unlike the other datasets, Ver_LK had a significantly higher proportion of
respondents choosing “Regular Cars” than the other options presented to
the respondents. Having nearly 50% of the respondents choosing a given alternative is likely
to influence models’ forecasting capability. Exploring the second aspect of this analysis, we
could not find improvements in performance using questionnaires that used only open-ended
questions to measure attitudes.
Table 4.10 Goodness-of-fit Measures for Training and Test Set
                             Ver_LK                Ver_LKOE               Ver_OE
Measure                  Ind        Prop        Ind        Prop        Ind        Prop
Training Set
LLI                        -5328.27               -5211.82               -5127.22
LLC                   -5021.83   -5059.77    -5106.44   -5111.87    -5080.24   -5100.86
LLF                   -4402.87   -4451.77    -4552.77   -4477.36    -4654.19   -4733.33
ρ2                      0.1737     0.1645      0.1265     0.1409      0.0923     0.0768
ρ2 (constants)          0.1233     0.1202      0.1084     0.1241      0.0839     0.0721
K                             97                     97                     63
Adj. ρ2                 0.1555     0.1463      0.1078     0.1223      0.0800     0.0645
Count R2 (%)            59.794     58.495      59.949     58.263      50.739     49.175
Adj. Count R2 (%)       35.087     35.871      43.486     44.020      33.516     32.938
F1 Score                 0.535      0.528       0.570      0.560       0.494      0.478
Test Set
LLI                        -1260.11               -1424.90               -1175.52
LLC                   -1169.96   -1186.24    -1406.59   -1408.11    -1161.49   -1164.32
LLF                   -1283.17   -1220.51    -1466.34   -1248.59    -1156.79   -1122.08
ρ2                     -0.0183     0.0314     -0.0291     0.1237      0.0159     0.0455
ρ2 (constants)         -0.0968    -0.0289     -0.0425     0.1133      0.0040     0.0363
K                             97                     97                     63
Adj. ρ2                -0.0953    -0.0456     -0.0972     0.0557     -0.0377    -0.0081
Count R2 (%)            54.577     53.357      55.744     55.898      46.542     46.636
Adj. Count R2 (%)       24.602     26.309      38.936     40.167      31.167     32.902
F1 Score                 0.447      0.441       0.525      0.532       0.449      0.452
To measure the test’s accuracy, we compared the F1 scores. Ver_LKOE of the
questionnaire had the highest values, and we obtained reasonable and comparable values for
the Ver_LK and Ver_OE questionnaires. Similar trends are observed for the test set, although the
performance for Ver_LK was even lower than that for Ver_OE. Interestingly, in the test set,
the proposed model for Ver_LKOE performed significantly better than Ver_LK and Ver_OE
of the questionnaire.
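For readers less familiar with these classification measures, a minimal sketch follows. Count R2 is taken here as the share of correctly predicted choices. The text does not state how the F1 score was averaged across the three alternatives, so the macro average used below is an assumption, as are the toy labels and function names.

```python
def count_r2(y_true, y_pred):
    """Share of observations whose predicted alternative matches the choice."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-alternative F1 scores (assumed averaging)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)

# Toy observed and predicted choices (hypothetical).
y_true = ["car", "car", "private", "shared", "car", "shared"]
y_pred = ["car", "private", "private", "shared", "car", "car"]
accuracy = count_r2(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(round(accuracy, 3), round(f1, 3))
```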
When evaluating the performance of our proposed models with respect to the individual
models, it is worth noting that the proposed model performs better than the individual
models. For the training set, this is observed only for Ver_LKOE; however, for the test set, the
proposed model performs significantly better for all three versions of the questionnaire, and
this holds for the various goodness-of-fit measures such as the ρ2 value, adjusted count R2 and
the F1 scores.
We compared the differences in direction (+ve/-ve) and magnitudes of coefficients between
the proposed model and individual models for characteristics such as socio-demographics,
travel, stated-preferences, and attitudes. The number of variables for which the
estimated coefficients for socio-demographics, travel and stated-preferences had the same
direction was 60 (62.5%), 69 (71.875%) and 78 (81.25%) for Ver_LK, Ver_LKOE and
Ver_OE, respectively. The number of variables for which the estimated coefficients were
significant was similar for all models: Ver_LK (72 (73.47%)), Ver_LKOE (73 (74.49%)) and
Ver_OE (70 (71.43%)). When it comes to attitudes, the proportions for the two Likert scale
versions of the questionnaire are the same. However, the statistical significance of the estimated
coefficients was higher for the proposed model (Ver_LK- 94.57 and Ver_LKOE- 92.39)
compared to the individual models (Ver_LK- 84.78 and Ver_LKOE- 78.26).
For the estimation, we consider “Regular Car” as the base alternative and estimated coefficients
for “Private AVs” and “Shared AVs”. As discussed previously, we included variables related
to the socio-demographic characteristics of the individual, travel, attitudes and the
characteristics of the stated-preference experiments as explanatory variables. To evaluate the
performance of these variables, we compared the nature of estimated coefficients with those
reported in the literature. It is interesting to note that, in general, these coefficients align with
the findings from other researchers, which is reassuring, particularly for the proposed
framework. Among the many variables discussed in greater detail in Appendix E, it is
encouraging to note that the coefficients for the stated-preference section align with the
findings of Haboucha et al. [40]- the study from which we adopted SP experiments.
4.8 PROPOSED FRAMEWORK TO MODEL ATTITUDES JOINTLY
The development of this framework is related to the fourth objective of our research. Having
estimated the models using our proposed framework, we use the estimated coefficients to allow
researchers/analysts to estimate the corresponding scores using other questionnaire types. For
instance, for a given open-ended response, what would be the corresponding response on a
Likert scale? We achieve this by using the coefficients (αLK, γLK, αLKOE, γLKOE, αOE, γOE) and
the values for the latent attitudes (Att). The Probabilistic Graphical Model for the modified
framework is illustrated in Figure 4.13, and we achieve this proposed framework using Gibbs
Sampling [213].
Figure 4.13 Probabilistic Graphical Model for the Modified Framework
The latent attitudes (yatt) are multidimensional, and so are the coefficients α and γ. Assuming
X to be the response to a Likert scale question or the topic proportions for an open-ended
response, the equations for the latent attitudes can be written as:
$$\mathbf{y}_{att} = \boldsymbol{\alpha} + \boldsymbol{\gamma}\,\mathbf{X}$$
The values for each of these explanatory variables can be computed using the approach
described below: -
1. Initialise values for X11, X12, X13, X14, X15, … …, Xn1, Xn2, Xn3, Xn4, Xn5.
2. For i in range(iterations):
      Draw each of X11, X12, X13, X14, X15, … …, Xn1, Xn2, Xn3, Xn4, Xn5 in turn from its
      distribution conditional on the current values of all the other variables.
The initial values of X11, X12, X13, X14, X15, … …, Xn1, Xn2, Xn3, Xn4, Xn5 thus estimated can
be discarded to account for the warm-up phase. As the number of iterations increases, the
estimated values for these variables tend to converge. However, this research problem presents
additional challenges in adopting the conventional approach as the variables in each set (X11,
X12, X13, X14, X15, …, Xn1, Xn2, Xn3, Xn4, Xn5) are not mutually independent. The variables in
each set must sum to one. To account for this, we use vectorisation in Linear Algebra and draw
the
variables from a Dirichlet distribution.
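Two ingredients of this procedure can be illustrated in isolation: drawing a set of proportions that sums to one from a Dirichlet distribution (here via normalised gamma variates) and discarding the warm-up iterations. The sketch does not reproduce the conditional updates of the sampler used in this research; the concentration parameters, iteration counts and function names are hypothetical.

```python
import random

def dirichlet(concentrations, rng):
    """One Dirichlet draw, obtained by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]

def sample_proportions(concentrations, iterations, warm_up, rng):
    """Draw a chain of proportion vectors and drop the warm-up iterations."""
    chain = [dirichlet(concentrations, rng) for _ in range(iterations)]
    return chain[warm_up:]

rng = random.Random(42)
# Five proportions per respondent (X_i1 ... X_i5); concentrations are hypothetical.
kept = sample_proportions([1.0] * 5, iterations=200, warm_up=50, rng=rng)
print(len(kept), [round(x, 3) for x in kept[0]])
```

Because every draw is normalised by its own total, each retained vector satisfies the sum-to-one constraint by construction.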
The proposed approach allows respondents to choose the questionnaire type that is of interest
to them. If the existing models used by researchers/analysts utilise responses to Likert scale (or
open-ended) questions and the respondents have answered using open-ended (or Likert
scale) responses, this framework equips them to deduce approximate Likert scale (or
open-ended) responses, which could then be used for predicting behaviour. We demonstrate
this using Figure 4.14. In our study, attitudes are two-dimensional, and we segment
respondents into two categories (“Both Negative” and “First Negative”) using the nature (sign)
of the estimated attitudes. For each dataset (Ver_LK and Ver_LKOE), we estimate the topic
proportions and obtain the word clouds for the observed scale responses.
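The segmentation step can be sketched as follows: a two-dimensional latent attitude is computed as a linear function of the topic proportions and the respondent is labelled by the signs of the two dimensions. The coefficient values, the three-topic input and the reading of “First Negative” as “only the first dimension is negative” are assumptions made for this illustration.

```python
def latent_attitude(alpha, gamma, x):
    """y_att = alpha + gamma · x, evaluated one attitude dimension at a time."""
    return [a + sum(g * xi for g, xi in zip(row, x)) for a, row in zip(alpha, gamma)]

def segment(att):
    first_negative, second_negative = att[0] < 0, att[1] < 0
    if first_negative and second_negative:
        return "Both Negative"
    if first_negative:
        return "First Negative"
    return "Other"

# Hypothetical coefficients for open-ended responses (two dimensions, three topics).
alpha_oe = [0.1, -0.2]
gamma_oe = [[-1.0, 0.5, 0.2],
            [0.3, -0.8, 0.1]]
topic_proportions = [0.6, 0.3, 0.1]

att = latent_attitude(alpha_oe, gamma_oe, topic_proportions)
print([round(a, 2) for a in att], segment(att))
```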
Figure 4.14 Predictions Using the Proposed Framework
4.9 CONCLUSION
Chapter 3 investigated the potential to use Topic Modelling to extract information from the
open-ended questions used to measure attitudes in travel behaviour research. Having observed
positive results using open-ended questions in measuring attitudes, we pursued this second
study using a large and representative dataset collected from the United States of America.
Contrary to the previous study, we used three versions of the questionnaire, described in detail
in Section 4.2.1. We collected 3002 responses from the USA between January and March 2020.
Using this dataset, we pursued four research questions, and in the subsequent paragraphs, we
summarise the findings related to each of these.
The first objective investigated if the method of collecting qualitative data influences the survey
responses. In this regard, we compared the responses to the Likert scale responses common to
the two versions (Ver_LK and Ver_LKOE) of the questionnaire and the mode choice. From
the frequency distributions, one could see a clear difference in the distributions of the
responses, which upon further investigation using the Mann-Whitney U Test and Statistical
Analysis were observed to be statistically significant for most of the variables (85% for the
Mann-Whitney U test and 60% for statistical analysis). The results reiterate the findings from
our previous study carried out in India. We observed a decline in the proportion of neutral
responses among those answering the questionnaire that also included the open-ended question.
There is also an increase in the proportion of respondents choosing the extreme points on the
Likert scale. We hypothesised that the respondents answering the open-ended questions must
pause, think, and articulate their response, altering their thought process. However, we could
not observe a difference in duration for answering alternative questionnaires, which could also
be related to the method of dissemination of the questionnaires as the time taken to answer the
responses in a page might be influenced by the speed of the internet and the device used to
answer the questionnaire. Besides, the break in the thought process might have encouraged
them to respond to the Likert scales with more caution.
The second objective involved the development of a framework to extract information from
the open-ended responses. First, we corrected the grammatical and spelling mistakes in the
responses to each of the questions using Grammarly. We then carried out an exploratory analysis
and used Topic Modelling approaches such as LDA and sLDA to extract information. Next, we
evaluated the extracted topics to see if they were meaningful and, using pyLDAvis, examined the
inter-topic distance to see if the extracted topics were distinct. In addition, we compared the
extracted topics with the statements used in the Likert scale questions.
Having extracted the information from open-ended responses, we developed the framework
that modelled the mode choice for commute trips. Built as a Probabilistic Graphical Model,
this framework used attitudes as a latent variable alongside the other explanatory variables.
Central to the framework is the notion that attitudes are latent constructs measured using
closed- and open-ended responses. These measurements were modelled simultaneously and estimated
using the probabilistic programming language Pyro. For the attitudinal variables, we included
the variables used in the Technology Acceptance Model.
In this study, Ver_LK used only Likert scale questions to measure attitudes, Ver_LKOE
introduced open-ended questions before the Likert scale questions, and Ver_OE represents the
other extreme, using only open-ended questions. Ver_LK and Ver_LKOE are similar to the
questionnaires used in our study carried out in India (Chapter 3). Although the improvements
are not significant, they reiterate the conclusion from our previous study in India that the
version with open-ended responses performs better than the version with only Likert scale
responses. We believe this is not because of the open-ended questions per se, but because
placing them before the Likert scale questions encourages people to pause and think, and hence
to answer more coherently. The improvements are particularly pronounced for the test set
(0.0314 for Ver_LK and 0.1237 for Ver_LKOE). However, if researchers use only open-ended
questions to measure attitudes, the performance of the models is not on par with that of the
models that used Likert scales. It is also worth noting that our proposed framework outperforms
the individual models, except for Ver_OE in the training set. In the test set, the proposed
framework performs better than the individual models for all versions of the questionnaire.
Our proposed framework allows respondents to choose their preferred type of question (closed-
or open-ended) to answer the questionnaire and gives researchers/analysts flexibility. The
framework does not necessarily yield better results but could be useful when datasets
are collected with different approaches. For instance, as demonstrated, researchers could use
the scores to estimate their current models, calibrated for Likert scales or open-ended responses.
While Topic Modelling presents researchers with a faster and more efficient way to extract
information from text, they should make conscious and careful decisions regarding the data
cleaning techniques to be used. For instance, researchers should identify additional words for
inclusion in the list of “Stop Words”, such as Autonomous, Vehicles, and Cars, that do not add
information. Moreover, identifying and combining words that form sequences is unavoidable and
must be done carefully, because the individual words, used separately, might imply a different
meaning. Such combinations should be chosen after a preliminary analysis of the candidate
combinations and their occurrence in the dataset, which might pose challenges to the analyst.
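The two cleaning decisions discussed here, extending the Stop Word list with domain words and merging frequent word pairs, can be sketched as follows. This is a minimal plain-Python illustration and not the NLTK-based pipeline used in the thesis: the stop word lists, the `min_count` threshold, the function names, and the example responses are all assumptions made for the sketch. Negations such as "not" are deliberately kept, since removing them can reverse the meaning of a response.

```python
from collections import Counter

# Illustrative lists only; a real study would start from a full stop word list.
BASE_STOP_WORDS = {"a", "an", "the", "i", "is", "are", "in", "of", "to",
                   "and", "so", "than", "would", "do", "it"}
DOMAIN_STOP_WORDS = {"autonomous", "vehicle", "vehicles", "car", "cars"}
STOP_WORDS = BASE_STOP_WORDS | DOMAIN_STOP_WORDS


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def merge_bigrams(tokens, bigrams):
    """Join adjacent tokens that form a known bigram, scanning left to right."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def preprocess(responses, min_count=2):
    """Return cleaned token lists and the set of bigrams that were merged."""
    token_lists = [tokenize(r) for r in responses]

    # Count word pairs on the raw text so that only genuinely adjacent
    # content words are candidates for merging.
    pair_counts = Counter()
    for tokens in token_lists:
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOP_WORDS and second not in STOP_WORDS:
                pair_counts[(first, second)] += 1
    bigrams = {pair for pair, c in pair_counts.items() if c >= min_count}

    cleaned = []
    for tokens in token_lists:
        merged = merge_bigrams(tokens, bigrams)
        cleaned.append([tok for tok in merged if tok not in STOP_WORDS])
    return cleaned, bigrams


# Made-up open-ended responses.
responses = [
    "I would not trust autonomous cars in heavy traffic.",
    "Heavy traffic is stressful, so autonomous vehicles would save time.",
    "Public transport is cheaper than cars.",
    "I do not use public transport.",
]

cleaned, bigrams = preprocess(responses)
for tokens in cleaned:
    print(tokens)
```

Inspecting the candidate pairs and their counts before fixing `min_count` corresponds to the preliminary analysis of word combinations recommended above.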
5 CONCLUSION
5.1 INTRODUCTION
To devise appropriate strategies for policy implementation, it is essential for both
researchers and policymakers to understand attitudes. Although most studies have used the
closed-ended approach to measure attitudes, both closed- and open-ended approaches have been
used, and there is still a debate on which is the more appropriate approach to measure
attitudes.
The closed-ended approach is relatively easier for both respondents and analysts, as it
facilitates a rapid and convenient analysis with well-established modelling techniques.
However, critics argue that closed-ended approaches measure aspects that may be relevant to
the researcher/policymaker and not necessarily to the respondent. The open-ended approach, on
the other hand, allows respondents to articulate their attitudes freely without being
constrained. However, it raises serious concerns regarding the processing and analysis of the
data, which are highly time-consuming and expensive. To extract information from
open-ended responses, we could use Topic Modelling, a recent development in Natural
Language Processing. In addition to this, the mode of asking questions (open- or closed-ended
approach) might influence the responses. Considering this, we pursued four objectives in this
study, a. to analyse if the method of collecting qualitative data influences the survey responses,
b. to develop an approach to extract open-ended responses from a survey and process the data,
c. to compare the relative performance of the open-ended and closed-ended responses in
analysing qualitative data, d. to develop a framework that measures attitudes while allowing
respondents to choose their preferred type of question (closed- or open-ended).
To accomplish the first objective, we built different versions of the questionnaire, one with
only Likert scale questions and one with a combination of Likert scales and open-ended
questions, and presented them randomly to the respondents. To extract information from the
open-ended responses, we used Topic Modelling techniques such as Latent Dirichlet Allocation
and supervised Latent Dirichlet Allocation. Having extracted the information from the
open-ended responses, we estimated models that used responses from the different datasets to
predict behaviour. Finally, we compared the models' performance and quantified the improvements
obtained with each of these approaches. In the first study, we used ordered Probit models, and
in the second study, we used Probabilistic Graphical Models. In the second study, we also
proposed a framework that allows analysts/researchers to model the choice irrespective of the
questionnaire type the respondent used to answer the survey. Furthermore, using the estimated
coefficients and the latent variables for a given questionnaire type, the corresponding scores
for other questionnaire types can be generated.
5.2 DATA DESCRIPTION
This study used two datasets. The first dataset collected information on the intention to use
Shared AVs in India between November 2017 and March 2018. Two versions of the
questionnaire were used: Ver_Lk (only Likert scale responses) and Ver_LkOE (a combination of
Likert scales and open-ended responses). The alternative versions of the online
questionnaire were distributed through Facebook, WhatsApp, and mailing lists, with the help
of bloggers. After removing inconsistent responses and responses from those who answered too
quickly, the final dataset comprised 364 complete responses (Ver_Lk: 201 and Ver_LkOE:
163). When comparing the distributions of the socio-demographic characteristics, we observed
no statistically significant differences between the respondents answering the two versions of
the questionnaire. However, this dataset had a higher proportion of males and of young and
middle-aged respondents.
To address the concerns related to the sample size and the representativeness of the dataset
collected from India, we carried out a second study in the USA measuring the intention to use
AVs as a mode for commute trips. Unlike in our previous study, we used statements for
Likert scale questions that had been tested previously by other researchers. To analyse if the
type of questionnaire influences the responses to the Likert scale questions, we used two
questionnaires, Ver_LK and Ver_LKOE. In addition, to analyse if we could replace all
Likert scale questions with open-ended questions, we used a third version, Ver_OE. To
collect representative samples, we used online panels provided by Cint. After removing records
with inconsistencies, the final dataset comprised 3002 responses and was
representative of the population in terms of gender, ethnicity, and region.
5.3 SALIENT FINDINGS
We present the salient findings from our research below.
5.3.1 Influence of Questionnaire Type on the Responses
We evaluated whether the use of open-ended questions before the set of Likert scale questions
influences the responses to these questions. We did this by comparing the frequency
distributions of responses to these questions between Ver_LK and Ver_LKOE of the
questionnaire. We then performed non-parametric tests (Mann-Whitney U test) and
statistical analysis to isolate the influence of the questionnaire type. In both datasets, we
observed a statistically significant difference in the distributions of the responses to the
Likert scales. In general, respondents answering Ver_LKOE of the questionnaire had a more
positive attitude towards AVs. However, the most important effect of introducing the
open-ended questions was the reduction in the number of neutral responses: respondents were
less conformist. From the perspective of an analyst/researcher in survey design, the results
highlight an improvement in the performance of the models built from the collected surveys.
This finding is particularly relevant because one need not go as far as implementing complex
approaches such as Topic Models; it thus offers guidance for improvements that are easy to
implement.
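The reduction in neutral responses can be quantified per Likert item with a few lines of code. The sketch below assumes a five-point scale coded 1 to 5, which is an assumption made for illustration; the function name and the response values are likewise made up and are not taken from either dataset.

```python
from collections import Counter


def neutral_and_extreme_shares(responses, scale_points=5):
    """Share of midpoint answers and of end-point answers on an odd-length scale."""
    if scale_points % 2 == 0:
        raise ValueError("an even-length scale has no neutral midpoint")
    counts = Counter(responses)
    midpoint = (scale_points + 1) // 2
    n = len(responses)
    neutral = counts[midpoint] / n
    extreme = (counts[1] + counts[scale_points]) / n
    return neutral, extreme


# Made-up answers to one Likert item, coded 1 to 5.
ver_lk = [3, 3, 3, 2, 4, 3, 3, 2, 4, 3]
ver_lkoe = [1, 5, 5, 4, 2, 5, 1, 4, 5, 4]

for name, answers in (("Ver_LK", ver_lk), ("Ver_LKOE", ver_lkoe)):
    neutral, extreme = neutral_and_extreme_shares(answers)
    print(f"{name}: neutral = {neutral:.0%}, extreme = {extreme:.0%}")
```

Comparing these two shares between questionnaire versions requires no text analysis at all, which is why this check is inexpensive to add to a survey study.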
5.3.2 Approach to Extract Information from Open-ended Responses
We used Topic Modelling approaches such as Latent Dirichlet Allocation and supervised
Latent Dirichlet Allocation to extract information from the open-ended responses. We
cleaned the responses by correcting spelling and grammatical errors. In addition, we applied
standard text processing steps such as removing Stop Words, lemmatisation, and the formation
of compound words. Having processed the data, we estimated Topic Models, evaluated whether the
extracted topics were meaningful, and analysed whether they were distinct. We then compared the
extracted topics with the ideas discussed in the statements used in the Likert scales. For both
datasets, we observed correspondence between the topics and the statements used for the Likert
scale responses, indicating the suitability of open-ended questions for measuring attitudes and
of Topic Modelling for extracting information from open-ended responses. Results from our study
in India highlight the need for careful design of the open-ended questions.
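To make the mechanics of Latent Dirichlet Allocation concrete, the following is a minimal sketch of unsupervised LDA fitted by collapsed Gibbs sampling in plain Python. It is a teaching stand-in under stated assumptions, not the estimation code used in the thesis and not sLDA: the hyperparameters `alpha` and `beta`, the number of iterations, the function names, and the toy token lists are all chosen for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, theta, phi).

    theta[d][k] is the estimated share of topic k in document d and
    phi[k][v] is the estimated probability of word v under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # words of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word v assigned to topic k
    topic_total = [0] * n_topics                           # all words in topic k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], index[w]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [[(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
              for t in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
            for v in range(n_words)] for t in range(n_topics)]
    return vocab, theta, phi


def top_words(vocab, phi, n=3):
    """Return the n most probable words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


# Made-up, already pre-processed responses.
docs = [
    ["safety", "accident", "trust", "sensor"],
    ["accident", "safety", "risk", "trust"],
    ["cost", "price", "expensive", "fare"],
    ["price", "cost", "fare", "cheap"],
    ["safety", "risk", "sensor"],
    ["expensive", "cost", "cheap"],
]

vocab, theta, phi = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words(vocab, phi)):
    print(f"topic {k}: {words}")
```

The top words per topic are what an analyst would read to judge whether a topic is meaningful, and the document-topic shares in `theta` are the kind of quantity that can enter a behavioural model as a score. With real survey data an established implementation would be a better choice than this sketch.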
5.3.3 Evaluate the Relative Performance of Closed- and Open-ended Approaches
One of the main objectives of this thesis was to compare the performance of a questionnaire
that also used open-ended responses with that of a questionnaire that used only Likert scale
responses. In the first study, which used the dataset from India, we compared the performance
of a questionnaire that used Likert scale responses with a questionnaire that used a
combination of Likert scales and open-ended responses. The version that used a combination of
Likert scales and open-ended responses performed better than the version that used only Likert
scales, and the estimated coefficients were also more meaningful, as open-ended questions
before the Likert scale questions make individuals pause, think, and answer more coherently. In
this study, the topic extracted from the open-ended responses was also statistically
significant, while that for the Likert scale response was not.
In the second study, we collected three datasets (one for each questionnaire type) and compared
the models' performance. As observed in the dataset collected from India, the model estimated
for the dataset that used a combination of Likert scales and open-ended questions performed
better than the model for the dataset that used only Likert scales. However, the models
estimated for the dataset that used only open-ended responses did not perform as well as the
other models. In the test set in particular, the models for the dataset that used open-ended
responses (as warm-up questions) and Likert scales performed better than those for the dataset
that used only Likert scales.
5.3.4 A Framework to Measure Attitudes that Allows Respondents to Choose Their Preferred
Questionnaire Type
Using the dataset collected from the USA, we also proposed a framework that estimated a
combined model to predict the intention to use AVs for commute trips. The model assumes
attitudes to be latent constructs and closed- or open-ended approaches to be the different
instruments used to measure attitudes. This approach allows researchers/analysts to be
flexible with the data collection approach yet use whichever estimation method is convenient.
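The idea of one latent attitude measured through different instruments can be written down as a generic latent-variable structure. The equations below are only a sketch under assumed notation and are not the specification estimated in the thesis: every symbol (the latent attitude \eta_i, the covariates x_i, the loadings \lambda, the thresholds \tau, the topic-based score s_i, and the choice coefficients \beta_m and \theta_m) is hypothetical.

```latex
% Hypothetical notation; a generic sketch, not the thesis's estimated model.
\begin{align}
  \eta_i &= \gamma^{\top} x_i + \zeta_i, \quad \zeta_i \sim \mathcal{N}(0, 1)
    && \text{(latent attitude of respondent } i\text{)} \\
  \ell_{ij} &= k \iff \tau_{j,k-1} < \lambda_j \eta_i + \varepsilon_{ij} \le \tau_{j,k}
    && \text{(Likert item } j\text{, closed-ended instrument)} \\
  s_i &= \mu + \lambda_s \eta_i + \nu_i
    && \text{(topic-based score, open-ended instrument)} \\
  \Pr(y_i = m) &= \frac{\exp\left(\beta_m^{\top} x_i + \theta_m \eta_i\right)}
                       {\sum_{m'} \exp\left(\beta_{m'}^{\top} x_i + \theta_{m'} \eta_i\right)}
    && \text{(mode choice)}
\end{align}
```

Under this reading, a respondent who answered only Likert questions contributes the second equation, a respondent who answered only open-ended questions contributes the third, and a respondent who answered both contributes both, while the attitude and choice equations are shared by everyone. Sharing those two equations is what would allow a single combined model to be estimated across questionnaire types.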
5.4 LIMITATIONS OF THE CURRENT STUDY AND DIRECTIONS FOR FUTURE
RESEARCH
We must highlight that while the use of Topic Modelling significantly reduces the time needed
for the extraction of topics, one should not overlook the importance of data pre-processing for
the text analysis. This warrants due diligence: researchers should strike a balance in data
cleaning, as it is time-consuming and may sometimes remove the inherent structure of the
response. Furthermore, based on our experience with the two datasets, each question might
demand “Stop Words” that are question-specific and probably data-specific (this should
be explored with more datasets), which is also the case with combining words.
Another challenge associated with the study is that we could not find improvements using
open-ended questions. The analysis of these responses might be influenced by the data cleaning
strategies, and analysing such responses with the help of various theories in linguistics might
be an avenue for further research. This is particularly relevant in light of the findings of
other researchers regarding the use of Artificial Intelligence, as its use without anchoring in
theory might reiterate the aphorism “Garbage in, garbage out” [214].
Furthermore, to frame appropriate questions and improve the quality of the responses to the
open-ended questions, it would be interesting to carry out in-depth interviews. Such interviews
would also facilitate the development of appropriate strategies for cleaning the responses.
This is particularly relevant because we believe that the models' performance was
affected by the framing of the questions and the quality of the answers.
This thesis used Topic Modelling to extract information from open-ended responses, for which
we used the Natural Language Toolkit (NLTK). However, the languages supported in NLTK are
limited (23), which could pose difficulties for extracting information from open-ended
responses for surveys conducted in languages not supported by NLTK.
The results (backed by both studies) highlight the potential for open-ended questions to alter
the responses to closed-ended questions, reducing neutral responses while making respondents
more decisive. It would therefore be interesting to evaluate the influence of open-ended
questions in revealed preference surveys in travel behaviour research and other domains.
One of the main issues with open-ended questionnaires is the difficulty of writing/typing,
which is often burdensome to respondents; this could be addressed by allowing respondents
to speak. To process the spoken responses, researchers/analysts could use various speech
recognition techniques. It would be interesting to evaluate how effective this combination is
in measuring attitudes.
REFERENCES
[1] P. Jones, “The Role of an Evolving Paradigm in Shaping International Transport Research and Policy
Agendas over the Last 50 Years,” Proc. XII Int. Assoc. Travel Behav. Res. Conf., vol. 3, p. 34, 2009.
[2] P. Jones, “The Evolution of Urban Mobility: The Interplay of Academic and Policy Perspectives,” IATSS
Res., vol. 38, no. 1, pp. 7-13, 2014.
[3] S. Kaplan, F. Manca, T. A. S. Nielsen, and C. G. Prato, “Intentions to Use Bike-Sharing for Holiday
Cycling: An Application of the Theory of Planned Behavior,” Tour. Manag., vol. 47, pp. 34-46, 2015.
[4] R. Kelkel, “Predicting Consumers’ Intention-to-purchase Fully Autonomous Driving Systems Which
Factors Drive Acceptance?,” Universidade Católica Portuguesa, 2015.
[5] A. Mehdizadeh, S. Kaplan, J. De Abreu, O. Anker, and F. Camara, “Use Intention of Mobility-
management Travel Apps: The Role of Users Goals, Technophile Attitude and Community Trust,”
Transp. Res. Part A, vol. 126, no. May, pp. 114-135, 2019.
[6] S. Kaplan, J. de Abreu e Silva, and F. di Ciommo, “The Relationship Between Young People’s Transit
Use and Their Perceptions of Equity Concepts in Transit Service Provision,” Transp. Policy, vol. 36, pp.
79-87, 2014.
[7] K. K. Srinivasan and P. Bhargavi, “Longer-term Changes in Mode Choice Decisions in Chennai: A
Comparison between Cross-sectional and Dynamic Models,” Transportation (Amst)., vol. 34, no. 3, pp.
355-374, 2007.
[8] S. Sadhukhan, U. K. Banerjee, and B. Maitra, “Commuters’ Perception towards Transfer Facility
Attributes in and Around Metro Stations: Experience in Kolkata,” J. Urban Plan. Dev., vol. 141, no. 4,
pp. 1-8, 2015.
[9] F. Zhao et al., “Exploratory Analysis of a Smartphone-Based Travel Survey in Singapore,” Transp. Res.
Rec. J. Transp. Res. Board, vol. 2, no. March, pp. 45-56, 2015.
[10] R. Likert, “A Technique for the Measurement of Attitudes,” Arch. Psychol., vol. 22, no. 140, p. 55, 1932.
[11] J. S. Plant, “Rating Scheme for Conduct,” Am. J. Psychiatry, vol. 78, no. 4, pp. 547-572, 1922.
[12] M. Freyd, “The Graphic Rating Scale,” J. Educ. Psychol., vol. 14, no. 2, pp. 83-102, 1923.
[13] D. Rugg and H. Cantril, “The Wording of Questions in Public Opinion Polls,” J. Abnorm. Soc. Psychol.,
vol. 37, no. 4, pp. 469-495, 1942.
[14] J. G. Geer, “What do Open-ended Questions Measure?,” Am. Assoc. Public Opin. Res., vol. 52, no. 3, pp.
365-371, 1988.
[15] J. G. Geer, “Do Open-Ended Questions Measure ‘Salient’ Issues?,” Public Opin. Q., vol. 55, no. 3, p. 360,
1991.
[16] P. Bansal and K. M. Kockelman, “Are We Ready to Embrace Connected and Self-driving Vehicles? A
Case Study of Texans,” Transportation (Amst)., vol. 45, no. 2, pp. 641-675, 2018.
[17] P. F. Lazarsfeld, “The Controversy over Detailed Interviews— An Offer for Negotiation,” Public Opin.
Q., vol. 8, no. 1, pp. 38-60, 1944.
[18] J. M. Converse, “Strong Arguments and Weak Evidence: The Open/closed Questioning Controversy of
the 1940s,” Public Opin. Q., vol. 48, no. 1B, pp. 267-282, 1984.
[19] J. A. Krosnick, “Questionnaire Design,” in The Palgrave Handbook of Survey Research, D. L. Vannette
and J. A. Krosnick, Eds. Palgrave Macmillan, 2018, pp. 439-455.
[20] T. M. Ostrom and K. M. Gannon, “Exemplar Generation: Assessing How Respondents Give Meaning to
Rating Scales,” in Answering Questions: Methodology for Determining Cognitive and Communicative
Processes in Survey Research, N. Schwarz and S. Sudman, Eds. San Francisco, CA, 1996, pp. 293-318.
[21] J. A. Krosnick, “Response Strategies for Coping with the Cognitive Demands of Attitude Measures in
Surveys,” Appl. Cogn. Psychol., vol. 5, no. 3, pp. 213-236, 1991.
[22] S. L. Becker, “Why an Order Effect,” Public Opin. Q., vol. 18, no. 3, pp. 271-278, 1954.
[23] L. J. Cronbach, “Response Sets and Test Validity,” Educ. Psychol. Meas., vol. 4, no. 6, pp. 475-494,
1946.
[24] J. A. Krosnick and S. Presser, “Question and Questionnaire Design,” in Handbook of Survey Research,
2nd ed., vol. 112, no. 3, Emerald Group Publishing Limited, 2010, pp. 263-313.
[25] S. Iyengar, “Framing Responsibility for Political Issues,” Ann. Am. Acad. Polit. Soc. Sci., vol. 546, no. 1,
pp. 59-70, 1996.
[26] W. L. Neuman, Social Research Methods: Qualitative and Quantitative Approaches, Seventh Ed. Pearson
Higher Education, 2013.
[27] I. Ajzen, “The Theory of Planned Behavior,” Organ. Behav. Hum. Decis. Process., vol. 50, no. 2, pp.
179-211, 1991.
[28] F. D. Davis, Jr, “A Technology Acceptance Model for Empirically Testing New End-User Information
Systems: Theory and Results,” Massachusetts Institute of Technology, 1985.
[29] V. Venkatesh, M. G. Morris, G. B. Davis, and F. D. Davis, Jr, “User Acceptance of Information
Technology: Toward a Unified View,” MIS Q., vol. 27, no. 3, pp. 425-478, 2003.
[30] G. Carrus, P. Passafaro, and M. Bonnes, “Emotions, Habits and Rational Choices in Ecological
Behaviours: The Case of Recycling and Use of Public Transportation,” J. Environ. Psychol., vol. 28, no.
1, pp. 51-62, 2008.
[31] I. J. Donald, S. R. Cooper, and S. M. Conchie, “An Extended Theory of Planned Behaviour Model of the
Psychological Factors Affecting Commuters’ Transport Mode Use,” J. Environ. Psychol., vol. 40, pp.
39-48, 2014.
[32] Ö. Şimşekoğlu, T. Nordfjærn, and T. Rundmo, “The Role of Attitudes, Transport Priorities, and Car Use
Habit for Travel Mode Use and Intentions to Use Public Transportation in an Urban Norwegian Public,”
Transp. Policy, vol. 42, pp. 113-120, 2015.
[33] S. Zailani, M. Iranmanesh, T. A. Masron, and T.-H. Chan, “Is the Intention to Use Public Transport for
Different Travel Purposes Determined by Different Factors?,” Transp. Res. Part D Transp. Environ., vol.
49, pp. 18-24, 2016.
[34] D. Lois, J. A. Moriano, and G. Rondinella, “Cycle Commuting Intention: A Model Based on Theory of
Planned Behaviour and Social Identity,” Transp. Res. Part F Psychol. Behav., vol. 32, pp. 101-113, 2015.
[35] Á. Fernández-Heredia, S. Jara-Díaz, and A. Monzón, “Modelling Bicycle Use Intention: The Role of
Perceptions,” Transportation (Amst)., vol. 43, pp. 1-23, 2016.
[36] S. Kaplan, D. K. Wrzesinska, and C. G. Prato, “The Role of Human Needs in the Intention to Use
Conventional and Electric Bicycle Sharing in a Driving-oriented Country,” Transp. Policy, vol. 71, pp.
138-146, 2018.
[37] A. A. De Souza, S. P. Sanches, and M. A. G. Ferreira, “Influence of Attitudes with Respect to Cycling on
the Perception of Existing Barriers for Using this Mode of Transport for Commuting,” Procedia - Soc.
Behav. Sci., vol. 162, pp. 111-120, 2014.
[38] W. Payre, J. Cestac, and P. Delhomme, “Intention-to-use a Fully Automated Car: Attitudes and a priori
Acceptability,” Transp. Res. Part F Traffic Psychol. Behav., vol. 27, no. PB, pp. 252-263, 2014.
[39] M. Kyriakidis, R. Happee, and J. C. F. de Winter, “Public Opinion on Automated Driving: Results of an
International Questionnaire Among 5000 Respondents,” Transp. Res. Part F Traffic Psychol. Behav., vol.
32, pp. 127-140, 2015.
[40] C. J. Haboucha, R. Ishaq, and Y. Shiftan, “User Preferences Regarding Autonomous Vehicles,” Transp.
Res. Part C Emerg. Technol., vol. 78, pp. 37-49, 2017.
[41] M. König and L. Neumayr, “Users’ Resistance Towards Radical Innovations: The Case of the Self-driving
Car,” Transp. Res. Part F Traffic Psychol. Behav., vol. 44, pp. 42-52, 2017.
[42] T. A. S. Nielsen and S. Haustein, “On Sceptics and Enthusiasts: What are the Expectations Towards Self-
driving Cars?,” Transp. Policy, vol. 66, pp. 49-55, 2018.
[43] C. Hohenberger, M. Spörrle, and I. M. Welpe, “How and Why do Men and Women Differ in Their
Willingness to Use Automated Cars? The Influence of Emotions Across Different Age Groups,” Transp.
Res. Part A, vol. 94, pp. 374-385, 2016.
[44] S. T. D. Cordazzo, C. T. Scialfa, K. Bubric, and R. J. Ross, “The Driver Behaviour Questionnaire: A
North American Analysis,” J. Safety Res., vol. 50, pp. 99-107, 2014.
[45] H. Iversen, “Risk-taking Attitudes and Risky Driving Behaviour,” Transp. Res. Part F Traffic Psychol.
Behav., vol. 7, no. 3, pp. 135-150, 2004.
[46] T. Nordfjærn, S. H. Jørgensen, and T. Rundmo, “An Investigation of Driver Attitudes and Behaviour in
Rural and Urban Areas in Norway,” Saf. Sci., vol. 48, no. 3, pp. 348-356, 2010.
[47] J. A. Krosnick and L. R. Fabrigar, “Designing Rating Scales for Effective Measurement in Surveys,” in
Survey Measurement and Process Quality, L. Lyberg, P. Biemer, M. Collins, E. De Leeuw, C. Dippo, N.
Schwarz, and D. Trewin, Eds. New York: John Wiley & Sons, 1997, pp. 141-164.
[48] A. Bowling, “Handbook of Health Research Methods,” in Handbook of Health Research Methods-
Investigation, Measurement and Analysis, A. Bowling and S. Ebrahim, Eds. Berkshire: Open University
Press, 2005, pp. 394-427.
[49] H. Schuman, J. Ludwig, and J. A. Krosnick, “The Perceived Threat of Nuclear War, Salience and Open
Questions,” Public Opin. Q., vol. 50, no. 4, pp. 519-536, 1986.
[50] H. Schuman and J. Scott, “Problems in the Use of Survey Questions to Measure Public Opinion,”
Science, vol. 236, no. 4804, pp. 957-959, 1987.
[51] W. R. Johnson, N. A. Sieveking, and E. S. Clanton, “Effects of Alternative Positioning of Open-ended
Questions in Multiple-choice Questionnaires,” J. Appl. Psychol., vol. 59, no. 6, pp. 776-778, 1974.
[52] H. Schuman and S. Presser, “The Open and Closed Question,” Am. Sociol. Rev., vol. 44, no. 5, pp.
692-712, 1979.
[53] O. Friborg and J. H. Rosenvinge, “A Comparison of Open-ended and Closed Questions in the Prediction
of Mental Health,” Qual. Quant., vol. 47, no. 3, pp. 1397-1411, 2013.
[54] P. M. Symonds, “On the Loss of Reliability in Ratings due to Coarseness of the Scale,” J. Exp. Psychol.,
vol. 7, no. 6, pp. 456-461, 1924.
[55] D. Peabody, “Two Components in Bipolar Scales: Direction and Extremeness,” Psychol. Rev., vol. 69,
no. 2, pp. 65-73, 1962.
[56] H. Champney and H. Marshall, “Optimal Refinement of the Rating Scale,” J. Appl. Psychol., vol. 23, no.
3, pp. 323-331, 1939.
[57] S. S. Komorita and W. K. Graham, “Number of Scale Points and the Reliability of Scales,” Educ. Psychol.
Meas., vol. 25, no. 4, pp. 987-995, 1965.
[58] N. Birkett, “Selecting the Number of Response Categories for a Likert-type Scale,” J. Am. Stat. Assoc.,
pp. 488-492, 1986.
[59] C. C. Preston and A. M. Colman, “Optimal Number of Response Categories in Rating Scales: Reliability,
Validity, Discriminating Power and Respondent Preferences,” Acta Psychol. (Amst)., vol. 104, no. 1, pp.
1-15, 2000.
[60] J. H. Flaskerud, “Cultural Bias and Likert-Type Scales Revisited,” Issues Ment. Health Nurs., vol. 33, no.
2, pp. 130-132, 2012.
[61] L. A. King, D. W. King, and A. J. Klockars, “Dichotomous and Multipoint Scales Using Bipolar
Adjectives,” Appl. Psychol. Meas., vol. 7, no. 2, pp. 173-180, 1983.
[62] C. Capik and S. Gozum, “Psychometric Features of an Assessment Instrument with Likert and
Dichotomous Response Formats,” Public Health Nurs., vol. 32, no. 1, pp. 81-86, 2015.
[63] Y. S. Chung and Y. C. Chiou, “Willingness-to-pay for a Bus Fare Reform: A Contingent Valuation
Approach with Multiple Bound Dichotomous Choices,” Transp. Res. Part A Policy Pract., vol. 95, pp.
289-304, 2017.
[64] S. A. Useche, V. G. Ortiz, and B. E. Cendales, “Stress-related Psychosocial Factors at Work, Fatigue, and
Risky Driving Behavior in Bus Rapid Transport (BRT) Drivers,” Accid. Anal. Prev., vol. 104, pp.
106-114, 2017.
[65] Y. Nishihori, J. Yang, R. Ando, and T. Morikawa, “Understanding Social Acceptability of Drivers for the
Diffusion of Autonomous Vehicles in Japan,” J. East. Asia Soc. Transp. Stud., vol. 12, pp. 2102-2116,
2017.
[66] P. Bansal, K. M. Kockelman, and A. Singh, “Assessing Public Opinions of and Interest in New Vehicle
Technologies: An Austin Perspective,” Transp. Res. Part C Emerg. Technol., vol. 67, pp. 1-14, 2016.
[67] S. Ramisetty-Mikler and A. Almakadma, “Attitudes and Behaviors Towards Risky Driving Among
Adolescents in Saudi Arabia,” Int. J. Pediatr. Adolesc. Med., vol. 3, no. 2, pp. 55-63, 2016.
[68] R. Shabanpour, S. N. D. Mousavi, N. Golshani, J. Auld, and A. Mohammadian, “Consumer Preferences
of Electric and Automated Vehicles,” in 5th IEEE International Conference on Models and Technologies
for Intelligent Transportation Systems, MT-ITS 2017, 2017, pp. 716-720.
[69] M. Diana, “Measuring the Satisfaction of Multimodal Travelers for Local Transit Services in Different
Urban Contexts,” Transp. Res. Part A Policy Pract., vol. 46, no. 1, pp. 1-11, 2012.
[70] K. Shaaban and R. F. Khalil, “Investigating the Customer Satisfaction of the Bus Service in Qatar,”
Procedia - Soc. Behav. Sci., vol. 104, pp. 865-874, 2013.
[71] B. Schoettle and M. Sivak, “A Survey of Public Opinion About Autonomous and Self-Driving Vehicles
in the US, the UK and Australia,” Michigan, 2014.
[72] B. Schoettle and M. Sivak, “Public Opinion about Self-Driving Vehicles in China, India, Japan, the U.S.,
the U.K. and Australia,” Michigan, 2014.
[73] J. P. Zmud and I. N. Sener, “Towards an Understanding of the Travel Behavior Impact of Autonomous
Vehicles,” Transp. Res. Procedia, vol. 25, pp. 2500-2519, 2017.
[74] I. N. Sener, J. Zmud, and T. Williams, “Measures of Baseline Intent to Use Automated Vehicles: A Case
Study of Texas Cities,” Transp. Res. Part F Traffic Psychol. Behav., vol. 62, pp. 66-77, 2019.
[75] D. M. Sanbonmatsu, D. L. Strayer, Z. Yu, F. Biondi, and J. M. Cooper, “Cognitive Underpinnings of
Beliefs and Confidence in Beliefs About Fully Automated Vehicles,” Transp. Res. Part F Traffic Psychol.
Behav., vol. 55, pp. 114-122, 2018.
[76] W. Qu, Q. Zhang, W. Zhao, K. Zhang, and Y. Ge, “Validation of the Driver Stress Inventory in China:
Relationship with Dangerous Driving Behaviors,” Accid. Anal. Prev., vol. 87, pp. 50-58, 2016.
[77] B. Öz, T. Özkan, and T. Lajunen, “An Investigation of Professional Drivers: Organizational Safety
Climate, Driver Behaviours and Performance,” Transp. Res. Part F Traffic Psychol. Behav., vol. 16, pp.
81-91, 2013.
[78] K. Amponsah-Tawiah and J. Mensah, “The Impact of Safety Climate on Safety Related Driving
Behaviors,” Transp. Res. Part F Traffic Psychol. Behav., vol. 40, pp. 48-55, 2016.
[79] S. Classen, A. L. Nichols, R. McPeek, and J. F. Breiner, “Personality as a Predictor of Driving
Performance: An Exploratory Study,” Transp. Res. Part F Traffic Psychol. Behav., vol. 14, pp. 381-389,
2011.
[80] C. Domarchi, A. Tudela, and A. González, “Effect of Attitudes, Habit and Affective Appraisal on Mode
Choice: An Application to University Workers,” Transportation (Amst)., vol. 35, no. 5, pp. 585-599,
2008.
[81] C.-F. Chen, “Personality, Safety Attitudes and Risky Driving Behaviors-Evidence from Young Taiwanese
Motorcyclists,” Accid. Anal. Prev., vol. 41, no. 5, pp. 963-968, 2009.
[82] R. F. Abenoza, O. Cats, and Y. O. Susilo, “Travel Satisfaction with Public Transport: Determinants, User
Classes, Regional Disparities and Their Evolution,” Transp. Res. Part A Policy Pract., vol. 95, pp.
64-84, 2017.
[83] M. Mohamed and N. F. Bromfield, “Attitudes, Driving Behavior, and Accident Involvement Among
Young Male Drivers in Saudi Arabia,” Transp. Res. Part F Traffic Psychol. Behav., vol. 47, pp. 59-71,
2017.
[84] M. Milković and M. Štambuk, “To Bike or not to Bike? Application of the Theory of Planned Behavior
in Predicting Bicycle Commuting Among Students in Zagreb,” Psihol. teme, vol. 2, pp. 187-205, 2015.
[85] T. Liljamo, H. Liimatainen, and M. Pöllänen, “Attitudes and Concerns on Automated Vehicles,” Transp.
Res. Part F Traffic Psychol. Behav., vol. 59, no. 2018, pp. 24-44, 2018.
[86] R. Chomeya, “Quality of Psychology Test Between Likert Scale 5 and 6 Points,” J. Soc. Sci., vol. 6, no.
3, pp. 399-403, 2010.
[87] S. S. Komorita, “Attitude Content, Intensity and the Neutral Point on a Likert Scale,” J. Soc. Psychol.,
vol. 61, no. 2, pp. 327-334, 1963.
[88] S. Nordhoff, J. de Winter, M. Kyriakidis, B. van Arem, and R. Happee, “Acceptance of Driverless
Vehicles: Results from a Large Cross-National Questionnaire Study,” J. Adv. Transp., vol. 2018, pp.
1-22, 2018.
[89] G. A. Miller, “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for
Processing Information,” Psychol. Rev., vol. 63, no. 2, pp. 81-97, 1956.
[90] E. E. Osgood, G. J. Suci, and P. H. Tannenbaum, The Measurement of Meaning. University of Illinois
Press, 1957.
[91] D. G. Morrison, “Regressions with Discrete Dependent Variables: The Effect on R2,” J. Mark. Res., vol.
9, no. 3, pp. 338340, 1972.
[92] J. O. Ramsay, “The Effect of Number of Categories in Rating Scales on Precision of Estimation of Scale
Values,” Psychometrika, vol. 38, no. 4, pp. 513532, 1973.
[93] K. Finstad, “Response Interpolation and Scale Sensitivity: Evidence Against 5-Point Scales,” J. Usability
Stud., vol. 5, no. 3, pp. 104110, 2010.
[94] S. O. Leung, “A Comparison of Psychometric Properties and Normality in 4-, 5-, 6-, and 11-Point Likert
Scales,” J. Soc. Serv. Res., vol. 37, no. 4, pp. 412421, 2011.
[95] L. M. Hulse, H. Xie, and E. R. Galea, “Perceptions of Autonomous Vehicles: Relationships with Road
Users, Risk, Gender and Age,” Saf. Sci., vol. 102, pp. 113, 2018.
[96] L. Buckley, S. A. Kaye, and A. K. Pradhan, “Psychosocial Factors Associated with Intended Use of
Automated Vehicles: A Simulated Driving Study,” Accid. Anal. Prev., vol. 115, pp. 202208, 2018.
[97] A. W. Bendig, “The Reliability of Self-Ratings as a Function of the Amount of Verbal Anchoring and of
the Number of Categories on the Scale,” J. Appl. Psychol., vol. 37, no. 1, pp. 3841, 1953.
[98] A. Mouwen, “Drivers of Customer Satisfaction with Public Transport Services,” Transp. Res. Part A
Policy Pract., vol. 78, pp. 120, 2015.
[99] B. Öz, T. Özkan, and T. Lajunen, “Professional and Non-Professional Drivers’ Stress Reactions and Risky
Driving,” Transp. Res. Part F Traffic Psychol. Behav., vol. 13, no. 1, pp. 3240, 2010.
[100] A. W. Bendig, “Reliability and the Number of Rating-Scale Categories,” J. Appl. Psychol., vol. 38, no. 1,
pp. 3840, 1954.
[101] A. W. Bendig and J. B. Hughes II, “Effect of Amount of Verbal Anchoring and Number of Rating-Scale
Categories Upon Transmitted Information,” J. Exp. Psychol., vol. 46, no. 2, pp. 8790, 1953.
[102] R. W. Lissitz and S. B. Green, “Effect of the Number of Scale Points on Reliability: A Monte Carlo
Approach,” J. Appl. Psychol., vol. 60, no. 1, pp. 1013, 1975.
[103] M. S. Matell and J. Jacoby, “Is There an Optimal Number of Alternatives for Likert Scale Items? Study
1: Reliability and Validity,” Educ. Psychol. Meas., vol. 31, no. 3, pp. 657674, 1971.
[104] M. S. Matell and J. Jacoby, “Is There an Optimal Number of Alternatives for Likert-Scale Items? Effects
of Testing Time and Scale Properties,” J. Appl. Psychol., vol. 56, no. 6, pp. 506509, 1972.
[105] R. Garland, “The Mid-Point on a Rating Scale: Is it Desirable?,” Mark. Bull., vol. 2, pp. 6670, 1991.
[106] K. A. Lormore and S. D. G. Stephens, “Use of the Open-ended Questionnaire with Patients and Their
Significant Others,” Br. J. Audiol., vol. 28, no. 2, pp. 8189, 1994.
[107] V. M. Esses and G. R. Maio, “Expanding the Assessment of Attitude Components and Structure: The
Benefits of Open-Ended Measures,” Eur. Rev. Soc. Psychol., vol. 12, no. 1, pp. 71101, 2005.
[108] M. Galesic, R. Tourangeau, M. P. Couper, and F. G. Conrad, “Eye-tracking Data: New Insights on
Response Order Effects and Other Cognitive Shortcuts in Survey Responding,” Public Opin. Q., vol. 72,
no. 5, pp. 892913, 2008.
[109] C. Lammgård and D. Andersson, “Environmental Considerations and Trade-offs in Purchasing of
Transportation Dervices,” Res. Transp. Bus. Manag., vol. 10, pp. 4552, 2014.
P a g e | 102
[110] J. A. Nelson, R. M. Bustamante, E. D. Wilson, and A. J. Onwuegbuzie, “The School-Wide Cultural
Competence Observation Checklist for School Counselors: An Exploratory Factor Analysis,” Prof. Sch.
Couns., vol. 11, no. 4, pp. 207217, 2008.
[111] T. Levett-Jones et al., “The Development and Psychometric Testing of the Satisfaction with Simulation
Experience Scale,” Nurse Educ. Today, vol. 31, no. 7, pp. 705710, 2011.
[112] B. Renault, J. Agumba, and N. Ansary, “An Exploratory Factor Analysis of Risk Management Practices:
A Study Among Small and Medium Contractors in Gauteng,” Acta Structilia, vol. 25, no. 1, pp. 139,
2018.
[113] J. Sun, A. E. Adegbosin, V. Reher, G. Rehbein, and J. Evans, “Validity and Reliability of a Self-
assessment Scale for Dental and Oral Health Student’s Perception of Transferable Skills in Australia,”
Eur. J. Dent. Educ., vol. 24, no. 1, pp. 4252, 2020.
[114] J. Bláfoss Ingvardson, S. Kaplan, J. de Abreu e Silva, F. di Ciommo, Y. Shiftan, and O. A. Nielsen,
“Existence, Relatedness and Growth Needs as Mediators Between Mode Choice and Travel Satisfaction:
Evidence from Denmark,” Transportation (Amst)., vol. 47, no. 1, pp. 337358, 2020.
[115] R. Zhang, “Understanding Customers’ Attitude and Intention to Use Driverless Cars,” Northumbria
University, Newcastle, 2019.
[116] R. Dubey, A. Gunasekaran, S. J. Childe, S. Fosso Wamba, and T. Papadopoulos, “Enablers of Six Sigma:
Contextual Framework and its Empirical Validation,” Total Qual. Manag. Bus. Excell., vol. 27, no. 11
12, pp. 13461372, 2016.
[117] A. Cyrus and P. S. Nyakomitta, “Adoption of Computer based Model for Monitoring Parking Revenue
Inflow,” SIJ Trans. Comput. Sci. Eng. its Appl., vol. 2, no. 5, pp. 195201, 2014.
[118] B. Elizabeth, N. A. Busch-Rossnagel, and K. F. Geisinger, “Development and Preliminary Validation of
the Ego Identity Process Questionnaire,” J. Adolesc., vol. 18, pp. 179192, 1995.
[119] U. Reja, K. L. Manfreda, H. Valentina, and V. Vehovar, “Open-ended vs. Close-ended Questions in Web
Questionnaires,” Dev. Appl. Stat., vol. 19, pp. 159177, 2003.
[120] M. Pullman, K. McGuire, and C. Cleveland, “Let Me Count the Words- Quantifying Open-Ended
Interactions with Guests,” Cornell Hosp. Q., vol. 46, no. 3, pp. 323343, 2005.
[121] D. E. RePass, “Issue Salience and Party Choice,” Am. Polit. Sci. Rev., vol. 65, no. 2, pp. 389400, 1971.
[122] R. Likert, “The Polls: Straw Votes or Scientific Instruments,” Am. Psychol., vol. 3, no. 12, pp. 556557,
1948.
[123] H. Schuman, “The Random Probe: A Technique for Evaluating the Validity of Closed Questions,” Am.
Sociol. Rev., vol. 31, no. 2, pp. 218222, 1966.
[124] L. E. Griffith, D. J. Cook, G. H. Guyatt, and C. A. Charles, “Comparison of Open and Closed
Questionnaire Formats in Obtaining Demographic Information From Canadian General Internists,” J.
Clin. Epidemiol., vol. 52, no. 10, pp. 9971005, 1999.
[125] J. E. Stanga and J. F. Sheffield, “The Myth of Zero Partisanship: Attitudes toward American Political
Parties, 1964-84,” Am. J. Pol. Sci., vol. 31, no. 4, pp. 829855, 1987.
[126] P. Gendall, H. Menelaou, and M. Brennan, “Open-ended Questions: Some Implications for Mail Survey
Research,” Mark. Bull., vol. 7, pp. 18, 1996.
[127] M. P. Couper, M. W. Traugott, and M. J. Lamias, “Web Survey Design and Administration,” Public Opin.
Q., vol. 65, no. 2, pp. 230253, 2002.
[128] J. D. Smyth, D. A. Dillman, L. M. Christian, and M. Mcbride, “Open-ended Questions in Web Surveys
Can Increasing the Size of Answer Boxes and Providing Extra Verbal Instructions Improve Response
Quality?,” Public Opin. Q., vol. 73, no. 2, pp. 325337, 2009.
[129] J. L. Holland and L. M. Christian, “The Influence of Topic Interest and Interactive Probing on Responses
to Open-ended Questions in Web Surveys,” Soc. Sci. Comput. Rev., vol. 27, no. 2, pp. 196212, 2009.
[130] K. Schmidt, T. Gummer, and J. Roßmann, “Effects of Respondent and Survey Characteristics on the
Response Quality of an Open-ended Attitude Question in Web Surveys,” Methods, Data, Anal., vol. 14,
no. 1, pp. 334, 2020.
P a g e | 103
[131] A. Peytchev, “Survey Breakoff,” Public Opin. Q., vol. 73, no. 1, pp. 7497, 2009.
[132] C. Zuell, N. Menold, and S. Körber, “The Influence of the Answer Box Size on Item Nonresponse to
Open-Ended Questions in a Web Survey,” Soc. Sci. Comput. Rev., vol. 33, no. 1, pp. 115122, 2015.
[133] S. D. Crawford and M. J. Lamias, “Web Surveys Perceptions of Burden,” Soc. Sci. Comput. Rev., vol. 19,
no. 2, pp. 146162, 2001.
[134] A. Mavletova, “Data Quality in PC and Mobile Web Surveys,” Soc. Sci. Comput. Rev., vol. 31, no. 6, pp.
725743, 2013.
[135] S. Schlosser and A. Mays, “Mobile and Dirty: Does Using Mobile Devices Affect the Data Quality and
the Response Process of Online Surveys?,” Soc. Sci. Comput. Rev., vol. 36, no. 2, pp. 212230, 2018.
[136] M. Revilla and C. Ochoa, “Open Narrative Questions in PC and Smartphones: Is the Device Playing a
Role?,” Qual. Quant., vol. 50, no. 6, pp. 24952513, 2016.
[137] S. H. Lo, G. J. P. van Breukelen, G.-J. Y. Peters, and G. Kok, “Pro-environmental Travel Behavior Among
Office Workers: A Qualitative Study of Individual and Organizational Determinants,” Transp. Res. Part
A Policy Pract., vol. 56, pp. 1122, 2013.
[138] F. M. Leiva, F. J. M. Ríos, and T. L. Martínez, “Assessment of Interjudge Reliability in the Open-ended
Questions Coding Process,” Qual. Quant., vol. 40, no. 4, pp. 519537, 2006.
[139] L. J. Barcham and S. D. G. Stephens, “The Use of an Open-ended Problems Questionnaire in Auditory
Rehabilitation,” Br. J. Audiol., vol. 14, no. 2, pp. 4954, 1980.
[140] R. Artstein and M. Poesio, “Inter-Coder Agreement for Computational Linguistics,” Comput. Linguist.,
vol. 34, no. 4, pp. 555596, 2009.
[141] T. Niedomysl and B. Malmberg, “Do Open-ended Survey Questions on Migration Motives Create Coder
Variability Problems?,” Popul. Space Place, vol. 15, pp. 7987, 2009.
[142] C. L. Covell, S. Sidani, and J. A. Ritchie, “Does the Sequence of Data Collection Influence Participants’
Responses to Closed and Open-ended Questions? A Methodological Study,” Int. J. Nurs. Stud., vol. 49,
no. 6, pp. 664671, 2012.
[143] A. M. Falthzik and S. J. Carroll Jr, “Rate of Return for Closed Versus Open-ended Questions in a Mail
Questionnaire Survey of Industrial Organizations,” Psychol. Rep., vol. 29, no. 3, pp. 11211122, 1971.
[144] K. A. Jehn, “A Multimethod Examination of the Benefits and Detriments of Intragroup Conflict,” Adm.
Sci. Q., vol. 40, no. 2, pp. 256282, 1995.
[145] K. M. Jackson and W. M. K. Trochim, “Concept Mapping as an Alternative Approach for the Analysis of
Open-Ended Survey Responses,” Organ. Res. Methods, vol. 5, no. 4, pp. 307336, 2002.
[146] M. Schmidt, “Quantification of Transcripts from Depth Interviews, Open-ended Responses and Focus
Groups: Challenges, Accomplishments, New Applications and Perspectives for Market Research,” Int. J.
Mark. Res., vol. 52, no. 4, pp. 123, 2010.
[147] M. R. Jacobson, C. E. Whyte, and T. Azzam, “Using Crowdsourcing to Code Open-Ended Responses: A
Mixed Methods Approach,” Am. J. Eval., vol. 39, no. 3, pp. 413429, 2018.
[148] K. Benoit, D. Conway, B. E. Lauderdale, M. Laver, and S. Mikhaylov, “Crowd-sourced Text Analysis:
Reproducible and Agile Production of Political Data,” Am. Polit. Sci. Rev., vol. 110, no. 2, pp. 278295,
2016.
[149] H. F. Hsieh and S. E. Shannon, “Three Approaches to Qualitative Content Analysis,” Qual. Health Res.,
vol. 15, no. 9, pp. 12771288, 2005.
[150] C. P. Health, “Content Analysis,” 2019. [Online]. Available:
https://www.publichealth.columbia.edu/research/population-health-methods/content-analysis.
[151] D. Kawashima and K. Kawano, “Meanings of Loss Among Japanese Suicide Bereaved: Content Analysis
of Open-Ended Responses,” Jpn. Psychol. Res., no. July 2017, pp. 19, 2020.
[152] F. C. Mamali, C. M. Lehane, W. Wittich, N. Martiniello, and J. Dammeyer, “What Couples Say About
Living and Coping With Sensory Loss: A Qualitative Analysis of Open-ended Survey Responses,”
Disabil. Rehabil., vol. 0, no. 0, pp. 122, 2020.
[153] G. Kelly, M. McKnight, and D. Schubotz, “Is there Anything Else You’d Like to Say About Community
P a g e | 104
Relations?’ Thematic Time Series Analysis of Open-ended Questions From an Annual Survey of 16- Year
Olds,” Methods, Data, Anal., vol. 14, no. 1, pp. 91126, 2020.
[154] M. A. Burg et al., “Current Unmet needs of Cancer Survivors: Analysis of Open-ended Responses to the
American Cancer Society Study of Cancer Survivors II,” Cancer, vol. 121, no. 4, pp. 623630, 2015.
[155] M. Savic, R. P. Ogeil, M. J. Sechtig, P. Lee-Tobin, N. Ferguson, and D. I. Lubman, “How Do Nurses
Cope with Shift Work? A Qualitative Analysis of Open-ended Responses from a Survey of Nurses,” Int.
J. Environ. Res. Public Health, vol. 16, no. 20, 2019.
[156] K. W. Mossholder, R. P. Settoon, S. G. Harris, and A. A. Armenakis, “Measuring Emotion in Open-ended
Survey Responses: An Application of Textual Data Analysis,” J. Manage., vol. 21, no. 2, pp. 335355,
1995.
[157] F. ten Kleij and P. A. D. Musters, “Text Analysis of Open-ended Survey Responses: A Complementary
Method to Preference Mapping,” Food Qual. Prefer., vol. 14, no. 1, pp. 4352, 2003.
[158] M. E. Roberts et al., “Structural Topic Models for Open-ended Survey Responses,” Am. J. Pol. Sci., vol.
58, no. 4, pp. 10641082, 2014.
[159] J. D. Lee and K. Kolodge, “Exploring Trust in Self-Driving Vehicles Through Text Analysis,” Hum.
Factors, vol. 62, no. 2, pp. 260277, 2020.
[160] T. Hynninen, A. Knutas, and M. Hujala, “Sentiment Analysis of Open-ended Student Feedback,” in 2020
43rd International Convention on Information, Communication and Electronic Technology (MIPRO),
2020, pp. 755759.
[161] L. Moták et al., “Antecedent Variables of Intentions to Use an Autonomous shuttle: Moving Beyond TAM
and TPB,” Eur. Rev. Appl. Psychol., vol. 67, no. 5, pp. 269278, 2017.
[162] S. Koul and A. Eydgahi, “The Impact of Social Influence, Technophobia, and Perceived Safety on
Autonomous Vehicle Technology Adoption,” Period. Polytech. Transp. Eng., pp. 110, 2019.
[163] H.-K. Chen and D.-W. Yan, “Interrelationships Between Influential Factors and Behavioral Intention with
Regard to Autonomous Vehicles,” Int. J. Sustain. Transp., vol. 13, no. 7, pp. 511527, 2019.
[164] P. Jing, H. Huang, B. Ran, F. Zhan, and Y. Shi, “Exploring the Factors Affecting Mode Choice Intention
of Autonomous Vehicle Based on an Extended Theory of Planned Behavior-A Case Study in China,”
Sustainability, vol. 11, no. 4, pp. 120, 2019.
[165] M. M. Rahman, M. F. Lesch, W. J. Horrey, and L. Strawderman, “Assessing the Utility of TAM, TPB,
and UTAUT for Advanced Driver Assistance Systems,” Accid. Anal. Prev., vol. 108, pp. 361373, 2017.
[166] Z. Xu, K. Zhang, H. Min, Z. Wang, X. Zhao, and P. Liu, “What Drives People to Accept Automated
Vehicles? Findings from a Field Experiment,” Transp. Res. Part C Emerg. Technol., vol. 95, pp. 320
334, 2018.
[167] J. K. Choi and Y. G. Ji, “Investigating the Importance of Trust on Adopting an Autonomous Vehicle,”
Int. J. Hum. Comput. Interact., vol. 31, no. 10, pp. 692702, 2015.
[168] T. Zhang, D. Tao, X. Qu, X. Zhang, R. Lin, and W. Zhang, “The Roles of Initial Trust and Perceived Risk
in Public’s Acceptance of Automated Vehicles,” Transp. Res. Part C Emerg. Technol., vol. 98, no. June
2018, pp. 207220, 2019.
[169] I. Panagiotopoulos and G. Dimitrakopoulos, “An Empirical Investigation on Consumers’ Intentions
Towards Autonomous Driving,” Transp. Res. Part C Emerg. Technol., vol. 95, pp. 773784, 2018.
[170] J. Wu, H. Liao, J.-W. Wang, and T. Chen, “The Role of Environmental Concern in the Public Acceptance
of Autonomous Electric Vehicles: A Survey from China,” Transp. Res. Part F Traffic Psychol. Behav.,
vol. 60, pp. 3746, 2019.
[171] P. Böhm, M. Kocur, M. Firat, and D. Isemann, “Which Factors Influence Attitudes Towards Using
Autonomous Vehicles?,” in Adjunct Proceedings of the 9th International ACM Conference on Automotive
User Interfaces and Interactive Vehicular Applications (AutomotiveUI ’17), 2017, pp. 141145.
[172] T. Leicht, A. Chtourou, and K. Ben Youssef, “Consumer Innovativeness and Intentioned Autonomous
Car Adoption,” J. High Technol. Manag. Res., vol. 29, no. 1, pp. 111, 2018.
[173] R. Madigan, T. Louw, M. Wilbrink, A. Schieben, and N. Merat, “What Influences the Decision to Use
Automated Public Transport? Using UTAUT to Understand Public Acceptance of Automated Road
P a g e | 105
Transport Systems,” Transp. Res. Part F Traffic Psychol. Behav., vol. 50, pp. 5564, 2017.
[174] H. Woltman, A. Feldstain, J. C. MacKay, and M. Rocchi, “An Introduction to Hierarchical Linear
Modelling,” Tutor. Quant. Methods Psychol., vol. 8, no. 1, pp. 5269, 2012.
[175] J. B. Ullman and P. M. Bentler, “Structural Equation Modeling,” in Handbook of Psychology, 2nd ed., I.
B. Weiner, Ed. Wiley Online Library, 2012, pp. 419443.
[176] K. E. Train, “Mixed Logit,” in Discrete Choice Methods with Simulation, 2nd ed., New York: Cambridge
University Press, 2009, pp. 134150.
[177] P. S. Lavieri, V. M. Garikapati, C. R. Bhat, and R. M. Pendyala, “An Investigation of Heterogeneity in
Vehicle Ownership and Usage for the Millennial Generation,” Transp. Res. Rec. J. Transp. Res. Board,
vol. 2664, pp. 9199, 2017.
[178] P. Liu, Q. Guo, F. Ren, L. Wang, and Z. Xu, “Willingness to Pay for Self-Driving Vehicles: Influences
of Demographic and Psychological Factors,” Transp. Res. Part C Emerg. Technol., vol. 100, pp. 306
317, 2019.
[179] P. S. Lavieri, V. M. Garikapati, C. R. Bhat, R. M. Pendyala, S. Astroza, and F. F. Dias, “Modeling
Individual Preferences for Ownership and Sharing of Autonomous Vehicle Technologies,” Transp. Res.
Rec. J. Transp. Res. Board, vol. 2665, pp. 110, 2017.
[180] J. Piao, M. McDonald, N. Hounsell, M. Graindorge, T. Graindorge, and N. Malhene, “Public Views
towards Implementation of Automated Vehicles in Urban Areas,” Transp. Res. Procedia, vol. 14, no. 0,
pp. 21682177, 2016.
[181] K. Kaur and G. Rampersad, “Trust in Driverless Cars: Investigating Key Factors Influencing the Adoption
of Driverless Cars,” J. Eng. Technol. Manag., vol. 48, pp. 8796, 2018.
[182] R. A. Daziano, M. Sarrias, and B. Leard, “Are Consumers Willing to Pay to Let Cars Drive for Them?
Analyzing Response to Autonomous Vehicles,” Transp. Res. Part C Emerg. Technol., vol. 78, pp. 150
164, 2017.
[183] W. Zhang, T. Yoshida, and X. Tang, “A Comparative Study of TF*IDF, LSI and Multi-words for Text
Classification,” Expert Syst. Appl., vol. 38, no. 3, pp. 27582765, 2011.
[184] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” J. Mach. Learn. Res., vol. 3, pp.
9931022, 2003.
[185] R. Prabowo and M. Thelwall, “Sentiment Analysis: A Combined Approach,” J. Informetr., vol. 3, no. 2,
pp. 143157, 2009.
[186] Y. Liu, Z. Liu, T. Chua, and M. Sun, “Topical Word Embeddings,” in Proceedings of the 29th AAAI
Conference on Artificial Intelligence (AAAI’15), 2015, pp. 24182424.
[187] A. Lavelli, F. Sebastiani, and R. Zanoli, “Distributional Term Representations: An Experimental
Comparison,” in 13th ACM International Conference on Information and Knowledge Management, 2004,
pp. 615624.
[188] J. D. Mcauliffe and D. M. Blei, “Supervised Topic Models,” in Advances in Neural Information
Processing Systems, 2008, pp. 18.
[189] D. T. Vo and C. Y. Ock, “Learning to Classify Short Text from Scientific Documents using Topic Models
with Various Types of Knowledge,” Expert Syst. Appl., vol. 42, no. 3, pp. 16841698, 2015.
[190] U. Verma, “Text Preprocessing for NLP (Natural Language Processing), Beginners to Master,” Analytics
Vidhya, 2020. [Online]. Available: https://medium.com/analytics-vidhya/text-preprocessing-for-nlp-
natural-language-processing-beginners-to-master-fd82dfecf95.
[191] J. Weng, “NLP Text Preprocessing: A Practical Guide and Template,” Towards Data Science, 2019.
[Online]. Available: https://towardsdatascience.com/nlp-text-preprocessing-a-practical-guide-and-
template-d80874676e79.
[192] T. Singh and M. Kumari, “Role of Text Pre-Processing in Twitter Sentiment Analysis,” in Procedia -
Procedia Computer Science, 2016, vol. 89, pp. 549554.
[193] E. Haddi, X. Liu, and Y. Shi, “The Role of Text Pre-processing in Sentiment Analysis,in Procedia
Computer Science, 2013, vol. 17, pp. 2632.
P a g e | 106
[194] I. Peled, F. Rodrigues, and F. C. Pereira, “Model-Based Machine Learning for Transportation,” in
Mobility Patterns, Big Data and Transport Analytics: Tools and Applications for Modeling, C. Antoniou,
L. Dimitriou, and F. C. Pereira, Eds. Elsevier, 2019, pp. 145171.
[195] B. Mabey, “pyLDAvis,” 2014. [Online]. Available: https://pyldavis.readthedocs.io/en/latest/index.html.
[196] Z. Zhao, H. N. Koutsopoulos, and J. Zhao, “Discovering Latent Activity Patterns from Transit Smart Card
Data: A Spatiotemporal Topic Model,” Transp. Res. Part C Emerg. Technol., vol. 116, no. July 2019, p.
102627, 2020.
[197] F. C. Pereira, F. Rodrigues, E. Polisciuc, and M. Ben-akiva, “Why So Many People? Explaining
Nonhabitual Transport Overcrowding With Internet Data,” IEEE Trans. Intell. Transp. Syst., vol. 16, no.
3, pp. 110, 2015.
[198] I. Markou, F. Rodrigues, and F. C. Pereira, “Is Travel Demand Actually Deep? An Application in Event
Areas Using Semantic Information,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 2, pp. 641652, 2020.
[199] T. Kurashima, T. Iwata, G. Irie, and K. Fujimura, “Travel Route Recommendation using Geotagged
Photos,” Knowl. Inf. Syst., vol. 37, no. 1, pp. 3760, 2013.
[200] Z. Xu, L. Chen, and G. Chen, “Topic-based Context-aware Travel Recommendation Method Exploiting
Geotagged Photos,” Neurocomputing, vol. 155, pp. 99107, 2015.
[201] S. Hasan and S. V. Ukkusuri, “Urban Activity Pattern Classification Using Topic Models from Online
Geolocation Data,” Transp. Res. Part C Emerg. Technol., vol. 44, pp. 363381, 2014.
[202] A. S. Pietsch and S. Lessmann, “Topic Modeling for Analyzing Open-ended Survey Responses,” J. Bus.
Anal., vol. 1, no. 2, pp. 93116, 2018.
[203] E. Tvinnereim and K. Fløttum, “Explaining Topic Prevalence in Answers to Open-ended Survey
Questions about Climate Change,” Nat. Clim. Chang., vol. 5, no. 8, pp. 744747, 2015.
[204] S. Mitsui, T. Kubo, and Y. Shoji, “Understanding Residents’ Perceptions of Nature and Local Economic
Activities Using an Open-ended Question Before Protected Area Designation in Amami Islands, Japan,”
J. Nat. Conserv., vol. 56, no. May 2019, p. 125857, 2020.
[205] V. Baburajan, J. de Abreu e Silva, and F. C. Pereira, “Open-Ended Versus Closed-Ended Responses: A
Comparison Study Using Topic Modeling and Factor Analysis,” IEEE Trans. Intell. Transp. Syst., pp. 1
10, 2020.
[206] Qualtrics, “Predicted Duration,” Qualtrics, 2020. [Online]. Available:
https://www.qualtrics.com/support/survey-platform/survey-module/survey-checker/survey-
methodology-compliance-best-practices/#PredictedDuration. [Accessed: 16-Oct-2020].
[207] H. B. Mann and D. R. Whitney, “On a Test of Whether One of Two Random Variables is Stochastically
Larger than the Other,” Ann. Math. Stat., vol. 18, no. 1, pp. 5060, 1947.
[208] T. Stoiber, I. Schubert, R. Hoerler, and P. Burger, “Will Consumers Prefer Shared and Pooled-use
Autonomous Vehicles? A Stated Choice Experiment with Swiss Households,” Transp. Res. Part D
Transp. Environ., vol. 71, no. December 2018, pp. 265282, 2019.
[209] V. Baburajan, J. de Abreu e Silva, and F. C. Pereira, “Opening Up the Conversation: Topic Modeling for
Automated Text Analysis in Travel Surveys,” in 2018 21st International Conference on Intelligent
Transportation Systems (ITSC), 2018, pp. 36573661.
[210] W. H. Greene and D. A. Hensher, Modelling Ordered Choices: A Primer, 1st ed. New York: Cambridge
University Press, 2010.
[211] J. de Abreu e Silva, C. Papaix, and G. Chen, “The Influence of Information-based Transport Demand
Management Measures on Commuting Mode Choice. Comparing Web vs. Face-to-face Surveys,” Transp.
Res. Procedia, vol. 32, pp. 363373, 2018.
[212] B. Muthén and T. Asparouhov, “Bayesian Structural Equation Modeling: A More Flexible Representation
of Substantive Theory,” Psychol. Methods, vol. 17, no. 3, pp. 313335, 2012.
[213] A. E. Gelfand, “Gibbs Sampling,” J. Am. Stat. Assoc., vol. 95, no. 452, pp. 13001304, 2000.
[214] P. Henman, “Improving Public Services Using Artificial Intelligence: Possibilities, Pitfalls, Governance,”
Asia Pacific J. Public Adm., vol. 42, no. 4, pp. 209221, 2020.
P a g e | 107
[215] F. D. Davis, R. P. Bagozzi, and P. R. Warshaw, “User Acceptance of Computer Technology: A
Comparison of Two Theoretical Models,” Manage. Sci., vol. 35, no. 8, pp. 9821003, 1989.
[216] J. Zmud, I. N. Sener, and J. Wagner, “Consumer Acceptance and Travel Behavior Impacts of Automated
Vehicles,” 2016.
P a g e | 108
P a g e | I
APPENDIX A
6 QUESTIONNAIRE: INTENTION TO USE SHARED AVS (INDIA)
This is a collaborative research program undertaken jointly by researchers from the Technical University of
Denmark and Instituto Superior Técnico, University of Lisbon, Portugal to understand the attitudes towards the
use of Autonomous Mobility Services. It is financially supported by the European Cooperation in Science &
Technology COST Action TU1305.
This questionnaire is completely anonymous, and the data collected here will be used exclusively for research
purposes and will not be given to third parties, other than the researchers involved in this project.
We appreciate and thank you for your participation in this research. Your participation is voluntary, but your
answers are very valuable to us. Please keep in mind that there are no right and wrong answers.
The research team:
Vishnu Baburajan
PhD Student- Instituto Superior Técnico, University of Lisbon, Lisboa, Portugal
Visiting PhD Student- Technical University of Denmark, Copenhagen, Denmark
Prof. João de Abreu e Silva, PhD
Instituto Superior Técnico, University of Lisbon, Lisboa, Portugal
Prof. Francisco Camara Pereira, PhD
Technical University of Denmark, Copenhagen, Denmark
Three randomly selected respondents completing the survey will be given coupons for “Middag for 2”. If you
want to participate in this lottery, please provide us with your email address so that we can deliver the coupons to you.
1. I am a person who likes to
Response scale: Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree
To have the latest gadgets
Test new mobile applications
Use phone frequently for online reservations, payments, etc.
Follow news about Autonomous Vehicles (Google Car, Tesla, etc.)
2. I use my Smartphone for travel-related needs
Response scale: Never | Rarely | Sometimes | Often | Frequently | I do not have a Smartphone
Consulting maps, to know my location and get information about routes and transport modes
Public transportation/taxi apps
Bike-sharing systems
3. I think Transport is one of the major causes of environmental problems as it …
Response scale: Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree
Is a major source of pollution
Depends too much on fossil fuels
Requires major infrastructure (roads, rail tracks, tunnels)
4. I think it is possible to reduce environmental footprint due to transportation, by increasing the use of
Response scale: Extremely Unlikely | Unlikely | Neutral | Likely | Highly Likely
Carpool
Public Transport
Bike
5. I think, environmental problems due to transportation can be solved by technological advancements in
Response scale: Extremely Unlikely | Unlikely | Neutral | Likely | Highly Likely
Alternative fuel (biodiesel, natural gas, hydrogen)
Electric Vehicles
Electric Bikes
Bikes and Car sharing
6. For travel related needs, I use my smartphone for (Mention at least 2 important uses that you find
relevant)
7. I think Transport is a major cause of the environmental problem because... (Mention at least 2 major
causes that you find relevant)
8. Autonomous vehicles will impact society by
Response scale: Extremely Unlikely | Unlikely | Neutral | Likely | Extremely Likely
Making travel more environmentally friendly
Reducing traffic congestion in cities
Reducing transportation induced pollution
Making travel safer, by reducing accidents
Making travel easier to people who cannot otherwise drive
Reducing gender equity issues in travel
Reducing the need for parking spaces
Causing unemployment of existing drivers
Creating new jobs for skilled workers
9. The society will benefit from the use of Autonomous Vehicles as it will... (Mention at least 2 benefits
that you find relevant)
10. Autonomous vehicles are likely to impact the society negatively as... (Mention at least 2 negative
impacts that you find relevant)
Response scale: Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree
11. These questions are related to your use of Autonomous Vehicles (AVs)
I think it will be cool to use AVs
I can engage in other activities during travel (interact with friends, read books, browse the internet)
I will be relieved from the stress of driving
I can eliminate parking-related issues (charges, cruising time, etc.)
I find it undesirable to share the vehicles with others
I may have to plan my travel, as I may not have access to vehicles
I think AVs might kill the pleasure of driving
12. I think my friends and family
Will use AVs
Believe AVs will reduce congestion
Believe AVs will reduce pollution
Believe AVs will make travel safer, by reducing accidents
Encourage the use of public transport
Will be positive about me using AVs
13. Perceptions about the use of AVs
I may take time learning to use it
I am confident the system will be protected against hacking and failures
I am confident that the interaction with other vehicles will be safe
I am worried about the liabilities after an accident
I have concerns regarding the payment for the services
I believe this will make my travel more efficient (saves time and cost)
I believe AVs will make travel more environmentally friendly
I think AVs will not be affordable to me
Autonomous Vehicles (AVs) are often described as a sustainable solution to transportation. AVs are believed to
address issues such as congestion, pollution and increasing vehicle ownership.
These driverless buses will be operating along the same bus routes and are expected to improve the efficiency
of public transportation systems.
Individual Characteristics
14. I will use Autonomous Mobility Services for my daily travel needs
Response scale: Highly Unlikely | Unlikely | Neutral | Likely | Highly Likely
15. In the last week, I have commuted by (tick all the modes you have used)
Walk only
Bike
Public Transport
Car
Motorbike
Intermediate Public Transport (taxis, autorickshaw, Uber, etc.)
16. How many minutes do you take to travel between home and university/school/workplace?
17. I have shared rides with others in the past
Daily
2-3 times a week
2-3 times a month
Rarely
Never
18. What is your gender?
Female
Male
Prefer not to answer
19. What is your age?
20. What is your occupation?
Student
Postgraduate student
Faculty
Manager
Professional
A technician or associate professional
Clerical support worker
Service and sales worker
Skilled agricultural, forestry and fishery worker
Craft and related trades worker
Plant and machine operator and assembler
Elementary occupation
Armed forces occupation
Without an occupation (e.g. retired, unemployed)
Prefer not to answer
21. What is your household income in INR per month?
0-9,999
10,000-24,999
25,000-49,999
50,000-74,999
75,000-89,999
More than 90,000
I prefer not to answer
22. In the last 2 years, have you been involved in a road accident?
No
Yes
23. If you are available to be contacted in the future to answer surveys related to scientific
research on transportation, please give us your email id.
APPENDIX B
7 QUESTIONNAIRE: INTENTION TO USE AVS FOR COMMUTE TRIPS (USA)
To understand the attitudes towards the use of Autonomous Vehicles, this collaborative research is undertaken
by researchers from
Instituto Superior Técnico, Lisbon, Portugal
Technical University of Denmark, Copenhagen, Denmark
This questionnaire is completely anonymous and the data collected here will be used exclusively for research
purposes. It will not be shared with third parties, other than with the researchers involved in this project.
We appreciate and thank you for your participation in this research. Your participation is voluntary, but your
answers are very valuable to us. Please keep in mind that there are no right and wrong answers.
The research team:
Vishnu Baburajan
PhD student- Instituto Superior Técnico, Lisbon, Portugal
Guest PhD student- Technical University of Denmark, Copenhagen, Denmark
Prof. João de Abreu e Silva, PhD
Associate Professor, Instituto Superior Técnico, Lisbon, Portugal
Prof. Francisco Camara Pereira, PhD
Professor, Technical University of Denmark, Copenhagen, Denmark
A bit about yourself (Socio-economic characteristics)
1. Gender
Female
Male
Prefer not to answer
2. My age is…
3. Total annual household income before taxes
Less than $10,000
$10,000 to $14,999
$15,000 to $24,999
$25,000 to $34,999
$35,000 to $49,999
$50,000 to $74,999
$75,000 to $99,999
$100,000 to $124,999
$125,000 to $149,999
$150,000 to $199,999
$200,000 or more
I don’t know
I prefer not to answer
4. Educational Qualification
Less than high school graduate
High school graduate or GED
Some college or associate degree
Bachelors degree
Graduate degree or professional degree (Masters or PhD)
I don’t know
I prefer not to answer
5. Race
White
Black or African American
Asian
American Indian or Alaska Native
Native Hawaiian or other Pacific Islander
Some other race
I don’t know
I prefer not to answer
6. Employment status
Full-time
Part-time
Student
7. Including yourself, how many adults (18 years and older) live in your household?
1 (you)
2
3
4
5+
8. How many children aged between 8 and 17 live in your household?
0
1
2
3
4
5
6+
9. How many children under age 7 live in your household?
0
1
2
3
4
5+
10. In which state do you currently reside?
Travel Characteristics
12. Average miles travelled per year (all modes)
Less than 3,000 miles
3,000-6,000 miles
6,000-9,000 miles
9,000-13,000 miles
13,000-15,000 miles
15,000-18,500 miles
Over 18,500 miles
I don’t know or I prefer not to answer
13. How often do you take each of these modes of transportation?
Every day
Several times a week
Several times a month
Several times a year
Never
Walk
Bicycle
Motorcycle/Scooter
Car/SUV/Van
Car Sharing (Zipcar/Car2Go, etc.)
Taxi/Ride-hailing (Uber/Lyft, etc.)
Public Transport (Bus/Train/Subway/Ferry/Light Rail)
I work from home (home-based telecommute)
14. How long is your distance (one-way) to work/school (in miles)? Please round to the nearest number
Distance
I do not know
15. How many minutes is your average morning commute? Please round to the nearest number
Travel time
I work from home (home-based telecommute)
16. How often do you make stops or run errands on your way to work or home or in the middle of the
day?
Daily
3-4 times a week
1-2 times a week
Less than once a week
Never
The following question is shown only to respondents using the car every day or several times a week
17. How much do you spend per day (in $) in parking while you are at work/school? (Please calculate
the cost per day, even if you have a monthly or yearly membership. Also, round to the nearest number)
Parking charges
I have access to a free parking facility
The following question is shown only to respondents not using the car every day or several times a week
18. What is the average parking charge per day (in$) near your work/school? Please round to the nearest
number
Parking Cost
I do not know
The following questions are shown only to respondents using the car every day or several times a week
19. On a typical day in the last week, how many individuals are in the car on your commute to
work/school? (including yourself)
1 (you)
2
3
4+
20. How important to you is the ability to leave items in your car?
Very important
Somewhat important
Not important
Shown to everybody
21. The average cost of a new car today is 35,250 USD. Assuming that it is now time to purchase a new
car, how much are you willing to spend on your next vehicle purchase? Please indicate a value- only
numbers
Vehicle Cost
I do not have plans to buy a car
Familiarity with Autonomous Vehicles
22. Some modern cars are equipped with Adaptive Cruise Control (ACC)- a system that can
automatically follow another car.
How often did you use Adaptive Cruise Control when driving in the last 12 months?
Very frequently
Frequently
Occasionally
Rarely
Very rarely
Never
I don’t know about ACC
I don’t drive a car with ACC
23. Have you heard of Autonomous Vehicle (Google, Tesla, etc.)?
Yes
No
24. Ever ridden in a Fully Autonomous Vehicle?
Yes
No
Shown only to respondents answering Ver_LKOE
25. What are your general opinions about Autonomous Vehicles?
26. Do you believe that Autonomous Vehicles are useful?
Yes
No
Explain why.
Perceptions of Autonomous Vehicles (Shown only to Ver_LK and Ver_LKOE)
You will now be presented with some statements about Autonomous Vehicles. Please indicate how
much you agree with each of these statements.
Strongly Disagree
Disagree
Neutral
Agree
Strongly Agree
27. Learning to use Autonomous Vehicles will be easy for me
28. I will find it easy to get Autonomous Vehicles to do what I want them to do
29. It will be easy for me to become skilful at using Autonomous Vehicles
30. I will find Autonomous Vehicles easy to use
31. Using Autonomous Vehicles will be useful in meeting my travel needs
32. Autonomous Vehicles will let me do other tasks such as eating, watching a movie, be on a cell phone during my trip
33. Using Autonomous Vehicles will decrease my accident risk
34. Using Autonomous Vehicles will relieve my stress of driving
35. I find Autonomous Vehicles to be useful when I’m impaired (e.g. sleepy, under the influence of alcohol or a controlled substance)
36. I’m worried about the general safety of such technology
37. I’m worried that the failure or malfunction of Autonomous Vehicles may cause accidents
38. I’m concerned that Autonomous Vehicles will collect too much personal information from me
39. I’m concerned that Autonomous Vehicles will use my personal information for other purposes without my authorisation
40. I’m concerned that Autonomous Vehicles will share my personal information with other entities without my authorisation
41. Autonomous Vehicles are dependable
42. Autonomous Vehicles are reliable
43. Overall, I can trust Autonomous Vehicles
44. Using Autonomous Vehicles is a good idea
45. Using Autonomous Vehicles is a wise idea
46. Using Autonomous Vehicles is pleasant
Shown only to respondents answering Ver_OE of the Questionnaire
47. Do you think that it will be easy to use Autonomous Vehicles?
Yes
No
Explain why.
48. Do you believe that Autonomous Vehicles are useful?
Yes
No
Explain why.
49. Do you have safety concerns regarding the use of Autonomous Vehicles?
Yes
No
Explain why.
50. Do you have concerns related to privacy associated with the use of Autonomous Vehicles?
Yes
No
Explain why
51. Would you as a user trust an Autonomous Vehicle?
Yes
No
Explain why
52. What are your general opinions about Autonomous Vehicles?
SP Introduction
In the next section, you will be presented with 3 car based alternatives for your commute trips.
Regular car- This will belong to your household.
Private Autonomous Vehicle- This option will be similar to a regular car but will have the ability to
self-drive. This will belong to your household.
Shared Autonomous Vehicle- This option involves a subscription to a fleet of shared autonomous
vehicles, to which you may have access on-demand. This car will pick you up and drop you off at your
destination without requiring you to look for parking.
This study aims to understand the shift between a regular car and autonomous vehicles. Please choose the
alternative of your preference based on the attributes presented to you. Even if you happen to be a user of non-
motorized modes of transport (bike, walk, etc.) or public transport (bus, train, light rail, ferry, etc.), we ask you
to consider these alternatives and choose the one you prefer.
You will be presented with six different scenarios. For each scenario, please pick the option you would choose.
| Given the following characteristics | Regular Car | Private Autonomous Vehicle | Shared Autonomous Vehicle |
|---|---|---|---|
| Purchase Cost ($) | | | |
| Yearly Membership Cost ($) | | | |
| Trip cost (per direction of commute) ($) | | | |
| Daily parking cost ($) | | | |

Keep in mind that the time you spend in each vehicle would be the same. Your current vehicle requires
additional time to look for parking and time to walk to the parking, which the autonomous vehicle no longer
requires.
Which option would you choose to use for this commute trip?
Regular Car
Private Autonomous Vehicle
Shared Autonomous Vehicle
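The display rules stated in this questionnaire (blocks shown only to a given version, and questions shown only to frequent car users) can be sketched as follows. This is a minimal illustration, not the survey platform's actual logic: the block and question names are hypothetical labels, and uniform random assignment to versions is an assumption.

```python
import random

# Answers to the car row of the mode-frequency question that count as
# frequent car use ("every day or several times a week").
FREQUENT_CAR_USE = {"Every day", "Several times a week"}

# Attitude blocks shown to each version (block names are hypothetical labels).
ATTITUDE_BLOCKS = {
    "Ver_LK": ["likert_statements"],
    "Ver_LKOE": ["general_open_ended", "likert_statements"],
    "Ver_OE": ["construct_open_ended"],
}


def assign_version(rng):
    """Pick one version at random (uniform assignment is an assumption)."""
    return rng.choice(sorted(ATTITUDE_BLOCKS))


def parking_question(car_use):
    """Frequent car users report what they pay; others report the local charge."""
    if car_use in FREQUENT_CAR_USE:
        return "parking_spend_per_day"
    return "parking_charge_near_work"


def car_only_questions(car_use):
    """Occupancy and item-storage questions are shown to frequent car users only."""
    if car_use in FREQUENT_CAR_USE:
        return ["car_occupancy", "importance_of_leaving_items"]
    return []


version = assign_version(random.Random(2024))
blocks = ATTITUDE_BLOCKS[version]
```

Keeping the rules in plain dictionaries and small functions makes it easy to check that every respondent sees exactly one attitude format.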
APPENDIX C
8 EXPERIMENTAL DESIGN FOR THE INTENTION TO USE AVS FOR COMMUTE TRIPS
Table 8.1 Statements to Measure the Constructs in the Proposed Model and Their Sources
| Constructs | Items | Contents | Sources |
|---|---|---|---|
| Perceived ease of use (PEoU) | PEOU1 | Learning to use autonomous vehicles will be easy for me | Davis, Bagozzi and Warshaw [215] |
| | PEOU2 | I will find it easy to get autonomous vehicles to do what I want them to do | |
| | PEOU3 | It will be easy for me to become skilful at using autonomous vehicles | |
| | PEOU4 | I will find autonomous vehicles easy to use | |
| Perceived usefulness (PU) | PU1 | Using autonomous vehicles will be useful in meeting my driving needs | Davis, Bagozzi and Warshaw [215] |
| | PU2 | Autonomous vehicles will let me do other tasks, such as eating, watching a movie, be on a cell phone on my trip | |
| | PU3 | Using autonomous vehicles will decrease my accident risk | |
| | PU4 | Using autonomous vehicles will relieve my stress of driving | |
| | PU5 | I find autonomous vehicles to be useful when I’m impaired (e.g. drowsy, drunk, drugs) | |
| Perceived safety risk (PSR) | PSR1 | I’m worried about the general safety of such technology | Zmud, Sener and Wagner [216] |
| | PSR2 | I’m worried that the failure or malfunctions of autonomous vehicles may cause accidents | |
| Perceived privacy risk (PPR) | PPR1 | I am concerned that autonomous vehicles will collect too much personal information from me | Kyriakidis, Happee and de Winter [39] |
| | PPR2 | I am concerned that autonomous vehicles will use my personal information for other purposes without my authorisation | |
| | PPR3 | I am concerned that autonomous vehicles will share my personal information with other entities without my authorisation | |
| Trust | Trust1 | Autonomous vehicles are dependable | Choi and Ji [167] |
| | Trust2 | Autonomous vehicles are reliable | |
| | Trust3 | Overall, I can trust autonomous vehicles | |
| Attitude (ATT) | ATT1 | Using autonomous vehicles is a good idea | Davis, Bagozzi and Warshaw [215] |
| | ATT2 | Using autonomous vehicles is a wise idea | |
| | ATT3 | Using autonomous vehicles is pleasant | |
Table 8.2 Orthogonal Scenarios (Source: Haboucha, Ishaq and Shiftan[40])
| Scenario | Variable 1 (Purchase price PAV) [in %] | Variable 2 (Subscription cost SAV) [in $] | Variable 3 (Trip Cost PAV) [in %] | Variable 4 (Trip Cost SAV) [in %] | Variable 5 (Parking Cost PAV) [in %] |
|---|---|---|---|---|---|
| 1 | 100 | $2000 | 85 | 0 | 0 |
| 2 | 100 | $0 | 120 | 300 | 100 |
| 3 | 100 | $150 | 70 | 210 | 30 |
| 4 | 100 | $300 | 100 | 150 | 60 |
| 5 | 115 | $2000 | 120 | 210 | 60 |
| 6 | 115 | $0 | 85 | 150 | 30 |
| 7 | 115 | $150 | 100 | 0 | 100 |
| 8 | 115 | $300 | 70 | 300 | 0 |
| 9 | 80 | $2000 | 70 | 150 | 100 |
| 10 | 80 | $0 | 100 | 210 | 0 |
| 11 | 80 | $150 | 85 | 300 | 60 |
| 12 | 80 | $300 | 120 | 0 | 30 |
| 13 | 130 | $2000 | 100 | 300 | 30 |
| 14 | 130 | $0 | 70 | 0 | 60 |
| 15 | 130 | $150 | 120 | 150 | 0 |
| 16 | 130 | $150 | 85 | 210 | 100 |
| 17# | 130 | $150 | 85 | 210 | 30 |
| 18# | 100 | $300 | 70 | 210 | 0 |
| 19# | 80 | $2000 | 100 | 150 | 0 |

* yearly cost of membership
# additional scenarios generated
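To make the levels in Table 8.2 concrete, the sketch below turns one scenario row into the attribute values a respondent would see. It rests on an assumption not stated in this appendix: that each percentage level scales the corresponding regular-car value. The field names, the trip cost of $5 and the parking cost of $10 are hypothetical; the car price of $35,250 is the average quoted in the questionnaire.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One row of Table 8.2 (field names are illustrative)."""
    pav_purchase_pct: float    # Variable 1
    sav_membership_usd: float  # Variable 2
    pav_trip_pct: float        # Variable 3
    sav_trip_pct: float        # Variable 4
    pav_parking_pct: float     # Variable 5


def scenario_card(s, car_price, trip_cost, parking_cost):
    """Assumed rule: each percentage scales the matching regular-car value."""
    return {
        "Regular Car": {"purchase": car_price, "membership": 0.0,
                        "trip": trip_cost, "parking": parking_cost},
        "Private AV": {"purchase": car_price * s.pav_purchase_pct / 100,
                       "membership": 0.0,
                       "trip": trip_cost * s.pav_trip_pct / 100,
                       "parking": parking_cost * s.pav_parking_pct / 100},
        # Assumed: no purchase or parking cost for the shared option.
        "Shared AV": {"purchase": 0.0, "membership": s.sav_membership_usd,
                      "trip": trip_cost * s.sav_trip_pct / 100, "parking": 0.0},
    }


scenario_3 = Scenario(100, 150, 70, 210, 30)  # scenario 3 of Table 8.2
card = scenario_card(scenario_3, car_price=35250.0, trip_cost=5.0, parking_cost=10.0)
```

Under these assumptions, scenario 3 offers a Private AV at the same purchase price as the regular car with cheaper trips and parking, and a Shared AV with a $150 yearly membership and more expensive trips.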
APPENDIX D
9 RESULTS OF TOPIC MODEL (USA)
In this research, six open-ended questions were presented to respondents answering Ver_OE
of the questionnaire. Along with the first five open-ended questions, we presented respondents
with an option to agree/disagree with the statements. In the estimation of sLDA, we used
responses to the agree/disagree statement as the response variable (results in Table 9.1).
We extracted four topics from OE1. The first topic (To_S11) was primarily about the ease of
getting AVs to work, learning to use them and gaining trust. The second topic (To_S12)
discussed the lack of control that makes AVs unsafe and difficult to trust. Finally, To_S13
covers the ease of operation, and To_S14 covers additional benefits from self-navigation and
ease of use.
Using sLDA, we extracted six broad themes from the responses on the perceived usefulness of
AVs. First, respondents believed that AVs might save travel time and make travel more
environmentally friendly (To_S21). Second, participants shared contrasting views on the
ability to work during travel: some believed that AVs might facilitate working during travel
(To_S22), while others felt that AVs may demand additional attention, which may negatively
affect their work (To_S26). Third, AVs might make travelling safer, mitigate congestion
(To_S23) and ensure mobility for the disabled (To_S24). Finally, many participants emphasised
the need for human control while using AVs (To_S25).
The next open-ended question evaluated the safety concerns associated with the use of AVs.
The lack of control is a significant safety concern (To_S31). Many argue that accidents could
result from the lack of control (To_S33), from malfunctions (To_S34) or from sensor failures
(To_S35). Furthermore, as humans are error-prone, many believe that there could be flaws in
the software programs (To_S36) and emphasise the need for thorough testing of AVs before
their widespread deployment (To_S32).
We then evaluated whether individuals had privacy concerns related to the use of AVs. Many
expressed no privacy concerns, considering them unnecessary as long as there is transparency
(To_S41). Another argument was that the information is already in the public domain through
various platforms (To_S44). Some said that they do have concerns but do not consider them
something to be bothered about (To_S43). Furthermore, privacy would not be a concern if users
are informed about data collection and storage (To_S46) and the information is handled
securely (To_S45). The concerns that were raised related mostly to hacking (To_S42).
Table 9.1 Top 5 Words for Each Topic for Open-ended Questions
| Topic | Word_1 | Word_2 | Word_3 | Word_4 | Word_5 |
|---|---|---|---|---|---|
| OE1- Do you think that it will be easy to use Autonomous Vehicles? | | | | | |
| To_S11 | use | easi | technolog | make | learn |
| To_S12 | control | time | feel | hard | trust |
| To_S13 | drive | driver | get | easier | need |
| To_S14 | everyth | assum | go | comput | destin |
| OE2- Do you believe that Autonomous Vehicles are useful? | | | | | |
| To_S21 | use | time | environ | better | save |
| To_S22 | drive | driver | get | work | commut |
| To_S23 | accid | driver | traffic | reduc | help |
| To_S24 | drive | use | help | need | abl |
| To_S25 | accid | use | control | human | issu |
| To_S26 | make | abl | attent | go | pay |
| OE3- Do you have safety concerns regarding the use of Autonomous Vehicles? | | | | | |
| To_S31 | drive | concern | safeti | control | driver |
| To_S32 | technolog | safe | use | work | trust |
| To_S33 | comput | happen | malfunct | system | alway |
| To_S34 | accid | caus | malfunct | worri | road |
| To_S35 | stop | road | abl | need | sensor |
| To_S36 | human | driver | drive | error | make |
| OE4- Do you have concerns related to privacy associated with the use of Autonomous Vehicles? | | | | | |
| To_S41 | make | question | noth | abl | safe |
| To_S42 | hack | someon | technolog | system | hacker |
| To_S43 | privaci² | concern | issu | sure | use |
| To_S44 | alreadi | track | use | differ | everyth |
| To_S45 | inform | need | info | go | secur |
| To_S46 | inform | data | person | compani | collect |
| OE5- Would you as a user trust an Autonomous Vehicle? | | | | | |
| To_S51 | trust | make | abl | safe | sure |
| To_S52 | trust | technolog | use | time | ye |
| To_S53 | safeti | comput | trust | concern | malfunct |
| To_S54 | drive | driver | human | better | thing |
| To_S55 | control | drive | feel | technolog | enough |
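The top five words reported for each topic can be read off a topic-word weight matrix by sorting each topic's weights. The sketch below illustrates this step only. The weights are hypothetical placeholders, not the estimated sLDA values, and the word "car" is added purely to show that lower-weighted words are dropped.

```python
def top_words(topic_word_weights, n=5):
    """Return the n highest-weighted words per topic (ties broken alphabetically)."""
    return {
        topic: [word for word, _ in
                sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for topic, weights in topic_word_weights.items()
    }


# Hypothetical weights for one topic; the stems are those of To_S42.
weights = {
    "To_S42": {
        "hack": 0.090, "someon": 0.050, "technolog": 0.040,
        "system": 0.035, "hacker": 0.030, "car": 0.010,
    }
}

summary = top_words(weights)
```

The words appear as stems (for example "technolog" and "privaci") because the responses were stemmed before the topics were estimated.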
In the fifth open-ended question (OE5), we asked respondents if they would trust AVs. A
significant proportion of respondents were not yet ready to trust AVs. These trust issues could
be related to the need for further testing (To_S51) and the potential safety concerns due to
malfunctions (To_S53). Sceptics argued that humans could drive better (To_S54) and that
computers cannot be trusted (To_S55). Over time, more users might start trusting the system
(To_S52).
² privacy and concern can be combined; however, they did not appear in the same sequence in
a sentence and hence were not combined
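The footnote's point, that two stems are merged into a single token only when they occur next to each other, can be illustrated with a simple adjacent-pair count. This is a sketch of the general idea and not the preprocessing code used in the thesis. The example responses and the merged token "person_inform" are hypothetical.

```python
from collections import Counter


def adjacent_pairs(docs):
    """Count how often each ordered pair of tokens appears side by side."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def merge_pair(tokens, pair):
    """Replace each adjacent occurrence of `pair` with one joined token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append("_".join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical stemmed responses: "privaci" and "concern" co-occur but never adjacently.
docs = [
    ["privaci", "big", "concern"],
    ["concern", "my", "privaci"],
    ["person", "inform"],
    ["person", "inform", "collect"],
]

pairs = adjacent_pairs(docs)
merged = merge_pair(docs[3], ("person", "inform"))
```

In this toy data the pair ("privaci", "concern") has a count of zero, so the two stems stay separate, mirroring the situation described in the footnote.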
APPENDIX E
10 ESTIMATION RESULTS FOR INTENTION TO USE AVS FOR COMMUTE TRIPS
For the estimation, we consider “Regular Car” as the base alternative and present the
estimated coefficients for “Private AVs” (PAV) and “Shared AVs” (SAV). In the subsequent
paragraphs, we discuss the effects that different variables have on the choices. For each
finding, we provide in square brackets the names of the models that align with it.
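With “Regular Car” as the base alternative, its utility is fixed at zero and the coefficients for PAV and SAV are read relative to it. The sketch below shows how such coefficients translate into choice probabilities, assuming a multinomial logit form (an assumption made for illustration; the full model also contains the travel, cost and attitudinal terms). It uses only the constant and the “Male” coefficient of the Proposed model from Table 10.1, so the result is not a prediction for any real respondent.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit shares from systematic utilities (softmax)."""
    highest = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {alt: math.exp(u - highest) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: value / total for alt, value in exp_u.items()}


# Proposed model, Table 10.1: constant + Male coefficient; all other terms ignored.
utilities = {
    "Regular Car": 0.0,      # base alternative
    "PAV": 0.183 + 0.035,
    "SAV": -0.237 + 0.042,
}

probs = choice_probabilities(utilities)
```

A positive coefficient therefore raises the probability of that alternative relative to the regular car, and a negative one lowers it, which is how the signs are read in the discussion that follows.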
Socio-demographic characteristics- In this study, we explored the influence of socio-
demographic characteristics on the choice of mode for commute trips (coefficients in Table
10.1). Male respondents are more likely to use AVs [Prop, Ver_LKOE, Ver_OE], as observed by
Payre, Cestac and Delhomme [38] (opposite effect for Shared AV for Ver_LK). Younger
individuals are more likely to use AVs [Prop, Ver_LKOE, Ver_OE] (similar to Nielsen and
Haustein [42], but with the opposite effect for Shared AV for Ver_LK). Higher-income
respondents are more likely to use Private AVs [Prop, Ver_LKOE, Ver_OE] (similar observations
were made by Bansal and Kockelman [16]), and lower-income respondents are more likely to use
Shared AVs [Prop, Ver_LKOE] [38] (opposite effect for Ver_LK). Individuals with higher
educational qualifications are likely to use AVs [Prop, Ver_LK, Ver_OE] (similar observations
were made by Haboucha, Ishaq and Shiftan [40], but with the opposite effect for Private AV for
Ver_LKOE). White Americans are less likely to use Private AVs [Prop, Ver_LKOE, Ver_OE], and
African Americans are less likely to use Private and Shared AVs [Prop, Ver_LKOE, Ver_OE]
(opposite effect for Ver_LK). Compared to employed individuals, students are more likely to
use both variants of AVs [Prop, Ver_LKOE, Ver_OE] (opposite effect for Shared AV for Ver_LK).
Individuals from families with more adults are less likely to use AVs [Prop, Ver_LKOE]
(opposite effect for Ver_LK), as are individuals from families with children aged less than 8
[Prop, Ver_LK, Ver_OE] (opposite effect for Private AV for Ver_LKOE), whereas those from
families with more children aged between 8 and 17 are more likely to use AVs [Prop, Ver_LK,
Ver_OE] (opposite effect for Ver_LKOE).
Table 10.1 Estimated Coefficients for Socio-Demographic Characteristics
| Variables | Proposed PAV | Proposed SAV | Ver_LK PAV | Ver_LK SAV | Ver_LKOE PAV | Ver_LKOE SAV | Ver_OE PAV | Ver_OE SAV |
|---|---|---|---|---|---|---|---|---|
| Constant | 0.183 | -0.237 | -0.829 | -0.488 | 0.456 | -0.522 | 0.495 | -0.175 |
| Male | 0.035 | 0.042 | 0.013 | -0.293 | 0.037 | 0.031 | 0.084 | 0.023 |
| Age | | | | | | | | |
| 16 and 25 | 0.566 | 0.327 | 0.23 | -0.326 | 0.441 | 0.51 | 0.556 | 0.304 |
| 26 and 35 | 0.463 | 0.179 | 0.547 | -0.189 | 0.279 | 0.572 | 0.499 | 0.039 |
| 36 and 45 | 0.343 | 0.059 | 0.223 | -0.215 | 0.379 | 0.543 | 0.313 | -0.111 |
| 46 and 55 | 0.292 | 0.198 | 0.242 | -0.114 | 0.559 | 0.983 | 0.201 | -0.03 |
| Household Income | | | | | | | | |
| 0 - $24,999 | -0.047 | 0.228 | -0.045 | -0.102 | -0.041 | 0.362 | -0.065 | 0.524 |
| $25,000 and $49,999 | -0.013 | 0.098 | -0.129 | -0.295 | 0.261 | 0.699 | -0.152 | -0.161 |
| $50,000 and $74,999 | 0.084 | 0.147 | -0.032 | -0.462 | 0.477 | 0.836 | -0.158 | -0.053 |
| $75,000 and $99,999 | -0.082 | 0.223 | 0.093 | 0.131 | 0.468 | 0.944 | -0.69 | -0.332 |
| Less than high school graduate | -0.806 | -0.865 | -0.558 | -0.545 | 0.325 | -0.105 | -1.126 | -0.992 |
| High school graduate or GED | -0.359 | -0.513 | -0.234 | -0.384 | 0.115 | 0.091 | -0.351 | -0.721 |
| Some college or associate degree | -0.298 | -0.48 | 0.276 | -0.436 | -0.346 | -0.508 | -0.369 | -0.243 |
| Bachelors' degree | -0.151 | -0.033 | 0.145 | 0.028 | 0.093 | 0.319 | -0.365 | -0.104 |
| White | -0.129 | 0.01 | 0.273 | 0.169 | -0.325 | -0.275 | -0.286 | 0.076 |
| Black or African American | -0.096 | -0.267 | 0.41 | 0.031 | -0.472 | -0.474 | -0.242 | -0.327 |
| American Indian or Alaska Native | 0.264 | -0.801 | 0.181 | -1.159 | -0.148 | -0.232 | 0.398 | -0.906 |
| Asian | -0.154 | -0.026 | -0.329 | 0.297 | -0.339 | -1.135 | -0.091 | 0.033 |
| Native Hawaiian or other Pacific Islander | -0.293 | -0.755 | 0.195 | 0.385 | 0.004 | -0.523 | -0.6 | -1.306 |
| Full-time | -0.285 | -0.215 | -0.294 | 0.165 | -0.181 | -0.46 | -0.088 | -0.088 |
| Part-time | -0.358 | -0.292 | -0.043 | 0.199 | -0.352 | -0.21 | -0.457 | -0.677 |
| Number of adults | -0.122 | -0.024 | 0.049 | 0.12 | -0.198 | -0.148 | -0.17 | 0.014 |
| Number of children aged between 8 and 17 | 0.022 | 0.092 | 0.055 | 0.099 | -0.153 | -0.097 | 0.101 | 0.178 |
| Number of children aged less than 8 | -0.11 | -0.184 | -0.165 | -0.254 | 0.079 | -0.005 | -0.158 | -0.28 |
Coefficients at 99% confidence level
Coefficients at 95% confidence level
Coefficients at 90% confidence level
Travel characteristics- The estimated coefficients are presented in Table 10.2. Those with
higher vehicle miles are more likely to choose AVs [Prop, Ver_LKOE, Ver_OE] (similar
results were observed by Shabanpour et al. [68], but the opposite effect for Ver_LKOE).
The mode currently used for commuting also influences the choice of mode for commute
trips. Those walking/biking are more likely to choose Shared AVs [Prop, Ver_LK,
Ver_LKOE, Ver_OE], while those using motorbikes/mopeds are more likely to choose
Private AVs [Prop, Ver_LKOE, Ver_OE] (opposite effect for Private AV for Ver_LK and
Shared AV for Ver_LKOE). Respondents using a car to commute are less likely to use AVs
and more likely to stick to conventional cars, probably out of habit [Prop, Ver_LK,
Ver_LKOE] (opposite effect for Ver_OE). Interestingly, those using car-sharing options are
more likely to use Private AVs [Prop, Ver_LK, Ver_OE] (opposite effect for Private AV for
Ver_LKOE), while those using taxi/ride-hailing services are more likely to use AVs (both
shared and private) [Prop, Ver_LK, Ver_OE]; we observed opposite effects for Ver_LKOE.
Individuals not owning a car are less likely to use Private AVs and Shared AVs [Prop,
Ver_LKOE] (opposite effects for Shared AV for Ver_LK and Ver_OE). Also, respondents
travelling alone are less likely to use Shared AVs [Prop, Ver_LKOE] (opposite effect for
Private AV for Ver_LK), and those travelling with others are likely to use AVs in general
[Prop, Ver_LK, Ver_LKOE, Ver_OE]. Interestingly, even people who consider leaving items in
the car to be important are willing to use Shared AVs [Prop, Ver_LK, Ver_LKOE, Ver_OE].
Regarding familiarity with AV systems, individuals familiar with AVs are optimistic about
using AVs (opposite effects for Shared AV for Ver_LKOE and Private AV for Ver_OE); however,
individuals who have ridden AVs in the past prefer Private AVs [Prop, Ver_LK, Ver_LKOE,
Ver_OE] (Zmud and Sener [208] observed similar results).
Table 10.2 Estimated Coefficients for Travel, Familiarity with AVs and SP
| Variables | Proposed PAV | Proposed SAV | Ver_LK PAV | Ver_LK SAV | Ver_LKOE PAV | Ver_LKOE SAV | Ver_OE PAV | Ver_OE SAV |
|---|---|---|---|---|---|---|---|---|
| Total miles travelled is less than 35 miles | 0.016 | -0.2 | 0.183 | -0.405 | -0.038 | -0.359 | -0.034 | 0.064 |
| Total miles travelled is between 35 and 70 miles | -0.102 | -0.22 | 0.014 | -0.268 | -0.058 | -0.143 | -0.201 | -0.167 |
| Mode used for commute trip | | | | | | | | |
| Walk | -0.04 | 0.161 | 0.02 | 0.149 | 0.001 | 0.544 | -0.023 | 0.015 |
| Bike | 0.036 | 0.697 | -0.334 | 0.775 | -0.024 | 0.191 | 0.09 | 0.876 |
| Motorcycle/moped | 0.101 | -0.077 | -0.794 | -1.458 | 0.133 | 0.015 | 0.223 | -0.698 |
| Car/SUV/Van/Pickup | -0.271 | -0.117 | -0.264 | -0.602 | -0.416 | -0.288 | 0.147 | 0.129 |
| Car-sharing | 0.181 | -0.318 | 0.217 | -0.036 | -0.331 | -0.778 | 0.494 | -0.045 |
| Taxi/ride-hailing | 0.145 | 0.185 | 0.391 | 0.07 | -0.296 | -0.035 | 0.106 | 0.447 |
| Public transport | 0.189 | 0.27 | 0.053 | 0.454 | -0.251 | -0.088 | 0.37 | 0.47 |
| Does not own a car | -0.275 | 0.039 | -0.697 | -0.283 | -0.159 | 0.453 | -0.114 | -0.449 |
| Travels alone in the car | 0.132 | -0.214 | -0.417 | -0.572 | 0.069 | -0.063 | 0.238 | -0.112 |
| Travels with others in the car | 0.396 | 0.24 | 0.003 | 0.089 | 0.117 | 0.44 | 0.311 | 0.019 |
| Leaving items in the car is important | 0.044 | 0.274 | 0.093 | 0.229 | 0.419 | 0.486 | -0.041 | 0.262 |
| Familiarity with AV | 0.021 | 0.218 | 0.087 | 0.602 | 0.057 | -0.134 | -0.144 | 0.239 |
| Has ridden AV | 0.347 | -0.081 | 0.216 | -0.702 | 0.642 | 0.412 | 0.204 | -0.059 |
| Purchase Cost (Regular Car) | 0.848 | 0.17 | 0.751 | -0.087 | 1.022 | 0.374 | 0.611 | 0.023 |
| Purchase Cost (Private AV) | -1.126 | -0.34 | -0.93 | -0.132 | -1.264 | -0.37 | -0.955 | -0.259 |
| Membership Cost (Shared AV) | 0.067 | -0.26 | 0.08 | -0.218 | 0.096 | -0.293 | 0.072 | -0.265 |
| Travel Cost (Regular Car) | 0.372 | 0.254 | 0.445 | 0.411 | 0.326 | 0.334 | 0.333 | -0.067 |
| Travel Cost (Private AV) | -0.361 | -0.106 | -0.206 | -0.18 | -0.428 | -0.12 | -0.403 | 0.06 |
| Travel Cost (Shared AV) | 0.012 | -0.447 | -0.035 | -0.385 | 0.071 | -0.461 | 0 | -0.43 |
| Parking Charge (Regular Car) | 0.205 | 0.046 | 0.227 | -0.12 | 0.189 | -0.025 | 0.203 | 0.097 |
| Parking Charge (Private AV) | -0.227 | -0.066 | -0.165 | -0.021 | -0.255 | -0.062 | -0.234 | -0.016 |
| Attitudinal_1 | 0.045 | -1.065 | -0.102 | 1.32 | 0.402 | -0.871 | 0.785 | -0.385 |
| Attitudinal_2 | 2.124 | 1.306 | -2.103 | -1.1 | -2.382 | -2.002 | 0.827 | 1.11 |
Coefficients at 99% confidence level
Coefficients at 95% confidence level
Coefficients at 90% confidence level
The second experimental design consists of stated-preference choice scenarios developed
using the values of the average cost of a car, travel time for commute and parking costs
reported by respondents. As discussed previously, we adopted the stated-preference design
from Haboucha, Ishaq and Shiftan [40], and the estimated coefficients for the variables
related to the experiment are consistent with their findings. However, unlike their approach,
we compared the absolute values of the purchase cost for “Regular Car” and PAV. As was
observed by Haboucha, Ishaq and Shiftan [40], an increase in purchase cost for regular cars
[Prop, Ver_LKOE, Ver_OE] and private AVs [Prop, Ver_LK, Ver_LKOE, Ver_OE]
negatively affects the preference for the corresponding types, and a similar trend is observed
for the subscription cost for Shared AVs [Prop, Ver_LK, Ver_LKOE, Ver_OE]. An increase
in the travel cost for regular cars increases the preference for AVs (both private and shared)
[Prop, Ver_LK, Ver_LKOE]; an increase in the travel cost for Private AVs decreases the
likelihood of choosing either version of AVs [Prop, Ver_LK, Ver_LKOE]; and an increase in
the travel cost for Shared AVs decreases the preference for AVs [Prop, Ver_LK, Ver_LKOE,
Ver_OE]. Similarly, an increase in the parking charges for regular cars has a positive
correlation with the use of AVs [Prop, Ver_OE], and an increase in parking charges for AVs
has the opposite effect [Prop, Ver_LK, Ver_LKOE, Ver_OE]. The direction of the estimated
coefficients for travel time and parking charges also aligns with the findings of Haboucha,
Ishaq and Shiftan [40].
Attitudinal characteristics- In this research, attitudes were estimated as a latent variable,
which was later included as an explanatory variable in the utility equation for mode choice.
Hence, while assessing the effects of attitudes in Table 10.3 and Table 10.4, we consider the
signs and magnitudes of the variables “Attitudinal_1” and “Attitudinal_2” from Table 10.2.
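One way to read the tables together is sketched below: each attitude dimension is built from a constant plus the coefficients of the selected response levels, and then enters the utility of an alternative through that alternative's “Attitudinal_1” and “Attitudinal_2” coefficients. The linear form and both function names are assumptions made for illustration, and every number in the example is a hypothetical placeholder rather than an estimate from Tables 10.2 to 10.4.

```python
import math


def attitude_scores(constants, selected_level_coefficients):
    """Assumed form: each dimension = constant + sum of selected-level coefficients."""
    return tuple(
        constant + sum(coef[k] for coef in selected_level_coefficients)
        for k, constant in enumerate(constants)
    )


def attitude_utility(scores, attitudinal_coefficients):
    """Contribution of the latent attitudes to one alternative's utility."""
    return sum(score * beta for score, beta in zip(scores, attitudinal_coefficients))


# Hypothetical values: two attitude dimensions, two answered statements.
constants = (0.1, -0.2)
selected = [(0.3, 0.1), (-0.1, 0.4)]   # (Coe_1, Coe_2) of the chosen levels
betas_for_alternative = (0.5, -1.0)    # (Attitudinal_1, Attitudinal_2)

scores = attitude_scores(constants, selected)
contribution = attitude_utility(scores, betas_for_alternative)
```

Under this reading, the effect of a response on the choice depends on the product of two signs: the coefficient of the response on an attitude dimension and the coefficient of that dimension in the utility, which is why both tables have to be consulted.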
The notion that it is easy to learn to use AVs is positively associated with the use of AVs
(private and shared), as is the perception that AVs are easy to use. Positive emotions towards
“it will be easy to get AVs to do what I want them to do” and “easy to become skilful at using
AVs” are, however, positively associated mainly with the use of Shared AVs. For “Perceived
Usefulness”, we find a positive correlation between positive attitudes and the use of Private
AVs for all the questions. For Shared AVs, however, only the usefulness in meeting travel
needs, in performing other tasks during travel and when impaired were found to have an
influence. The coefficients for the various Likert scale responses are presented below in
Table 10.3.
Table 10.3 Estimated Coefficients for Likert Scale Responses
| Variables | Ver_LK Prop Coe_1 | Ver_LK Prop Coe_2 | Ver_LK Ind Coe_1 | Ver_LK Ind Coe_2 | Ver_LKOE Prop Coe_1 | Ver_LKOE Prop Coe_2 | Ver_LKOE Ind Coe_1 | Ver_LKOE Ind Coe_2 |
|---|---|---|---|---|---|---|---|---|
| Constant_1 | 0.423 | -0.389 | 0.045 | 0.542 | 0.682 | -0.771 | 0.212 | -0.232 |
| Learning to use Autonomous Vehicles will be easy for me (PEoU_1) | | | | | | | | |
| PEoU_11 | 0.259 | -0.026 | -0.31 | 0.499 | -0.022 | 0.218 | 0.126 | 0.498 |
| PEoU_12 | 0.483 | 0.115 | -0.306 | 0.052 | 0.191 | 0.134 | 0.189 | 0.068 |
| PEoU_13 | -0.027 | 0.218 | 0.118 | -0.053 | 0.044 | 0.087 | 0.026 | 0.05 |
| PEoU_14 | 0.105 | 0.311 | 0.073 | -0.165 | -0.037 | 0.312 | -0.031 | -0.127 |
| PEoU_15 | 0.038 | 0.245 | 0.104 | -0.05 | -0.281 | 0.39 | -0.115 | -0.35 |
| I will find it easy to get Autonomous Vehicles to do what I want them to do (PEoU_2) | | | | | | | | |
| PEoU_21 | 0.072 | -0.013 | -0.497 | 0.059 | -0.118 | 0.05 | 0.046 | 0.25 |
| PEoU_22 | -0.192 | -0.276 | 0.148 | 0.049 | -0.434 | -0.212 | -0.372 | 0.264 |
| PEoU_23 | -0.087 | -0.286 | 0.146 | 0.11 | -0.073 | -0.151 | 0.004 | 0.293 |
| PEoU_24 | -0.319 | -0.034 | 0.382 | -0.167 | -0.294 | -0.021 | -0.135 | 0.078 |
| PEoU_25 | -0.144 | -0.668 | 0.246 | 0.502 | 0.233 | -0.148 | 0.259 | 0.35 |
| It will be easy for me to become skilful at using Autonomous Vehicles (PEoU_3) | | | | | | | | |
| PEoU_31 | -0.51 | -0.365 | 0.171 | 0.212 | -0.291 | 0.252 | -0.42 | -0.024 |
| PEoU_32 | -0.798 | -0.551 | 0.614 | 0.015 | 0.287 | -0.389 | 0.096 | 0.179 |
| PEoU_33 | -0.038 | -0.413 | -0.14 | -0.162 | 0.39 | -0.282 | 0.166 | 0.034 |
| PEoU_34 | -0.059 | -0.414 | -0.139 | -0.122 | 0.378 | -0.341 | 0.169 | 0.128 |
| PEoU_35 | 0.205 | -0.128 | -0.443 | -0.432 | -0.114 | -0.371 | -0.135 | 0.003 |
| I will find Autonomous Vehicles easy to use (PEoU_4) | | | | | | | | |
| PEoU_41 | -0.223 | 0.544 | 0.67 | 0.176 | 0.013 | -0.084 | -0.141 | 0.377 |
| PEoU_42 | 0.789 | 0.044 | -0.531 | 0.098 | -0.164 | -0.06 | -0.001 | -0.082 |
| PEoU_43 | 0.24 | -0.109 | -0.079 | 0.228 | -0.268 | -0.083 | -0.186 | -0.041 |
| PEoU_44 | 0.089 | 0.115 | 0.079 | 0.017 | 0.037 | -0.243 | 0.027 | 0.184 |
| PEoU_45 | 0.511 | 0.413 | -0.27 | -0.418 | 0.95 | -0.02 | 0.772 | 0.296 |
| Using Autonomous Vehicles will be useful in meeting my travel needs (PU_1) | | | | | | | | |
| PU_11 | -0.023 | -0.54 | 0.111 | 1.031 | 0.182 | -0.248 | 0.072 | 1.298 |
| PU_12 | 0.433 | -0.133 | -0.304 | 0.457 | 0.361 | -0.099 | 0.311 | 1.027 |
| PU_13 | 0.182 | 0.295 | -0.075 | -0.114 | -0.294 | 0.67 | -0.178 | -0.085 |
| PU_14 | 0.311 | 0.757 | -0.161 | -0.62 | -0.384 | 1.27 | -0.063 | -0.66 |
| PU_15 | 0.296 | 0.804 | -0.161 | -0.65 | -0.713 | 1.227 | -0.325 | -0.701 |
| Autonomous Vehicles will let us do other tasks such as eating, watching a movie, be on a cell phone during my trip (PU_2) | | | | | | | | |
| PU_21 | -0.4 | 0.085 | 0.307 | -0.188 | 0.052 | 0.265 | 0.326 | 0.009 |
| PU_22 | -0.23 | -0.307 | 0.096 | 0.168 | -0.374 | 0.249 | -0.231 | -0.314 |
| PU_23 | 0.217 | -0.214 | -0.405 | 0.009 | -0.186 | 0.136 | 0.021 | -0.059 |
| PU_24 | 0.078 | -0.158 | -0.231 | -0.014 | -0.246 | 0.148 | -0.04 | -0.129 |
| PU_25 | -0.042 | -0.293 | -0.035 | 0.084 | -0.338 | 0.191 | -0.146 | -0.131 |
| Using Autonomous Vehicles will decrease my accident risk (PU_3) | | | | | | | | |
| PU_31 | 0.432 | 0.142 | -0.717 | 0.274 | -0.047 | -0.174 | -0.098 | 0.685 |
| PU_32 | -0.196 | 0.345 | 0.033 | 0.032 | -0.203 | 0.085 | -0.28 | 0.486 |
| PU_33 | 0.079 | 0.479 | -0.197 | -0.132 | 0.059 | 0.363 | 0.052 | 0.248 |
| PU_34 | -0.313 | 0.623 | 0.265 | -0.302 | -0.326 | 0.891 | -0.174 | -0.451 |
| PU_35 | -0.287 | 0.751 | 0.233 | -0.386 | -0.084 | 0.755 | 0.153 | -0.216 |
| Using Autonomous Vehicles will relieve my stress of driving (PU_4) | | | | | | | | |
| PU_41 | 0.255 | -0.213 | -0.347 | 0.29 | 0.163 | -0.311 | -0.041 | 0.563 |
| PU_42 | -0.232 | -0.363 | 0.007 | 0.528 | 0.103 | -0.166 | -0.038 | 0.459 |
| PU_43 | -0.216 | 0.063 | 0.081 | 0.059 | -0.086 | 0.155 | 0.048 | 0.037 |
| PU_44 | -0.515 | 0.386 | 0.337 | -0.254 | -0.08 | 0.007 | -0.022 | 0.177 |
| PU_45 | -0.706 | 0.266 | 0.658 | -0.235 | -0.136 | 0.429 | 0.103 | -0.447 |
| I find Autonomous Vehicles to be useful when I’m impaired (PU_5) | | | | | | | | |
| PU_51 | -0.015 | -0.056 | -0.185 | -0.022 | -0.085 | -0.124 | 0.094 | 0.388 |
| PU_52 | 0.173 | -0.081 | -0.451 | -0.099 | 0.106 | -0.047 | 0.239 | 0.021 |
| PU_53 | -0.112 | -0.212 | -0.015 | 0.104 | -0.091 | 0.089 | 0.152 | -0.119 |
| PU_54 | -0.321 | 0.168 | 0.118 | -0.358 | -0.209 | 0.155 | 0.031 | -0.302 |
| PU_55 | -0.673 | 0.352 | 0.483 | -0.512 | -0.6 | 0.24 | -0.318 | -0.455 |
Coefficients at 99% confidence level
Coefficients at 95% confidence level
Coefficients at 90% confidence level
Attitudes were measured using open-ended responses collected from the respondents
answering Ver_OE of the questionnaire (refer to Table 10.4 for the estimated coefficients).
As discussed previously in Section 4.5.2.2, we extracted four topics for the open-ended
question related to the “Perceived Ease of Use of AVs” and seven topics for the “Perceived
Usefulness of AVs.” As expected, the ease of use of AVs (To_L11) is likely to influence
the choice of Private AVs. Topics that discussed the ease of operation (To_L13),
navigation (To_L14) and the need for human presence (To_L12) are related more to the use
of Shared AVs. The ability to multi-task (work) (To_L22) is associated with the use of Private
AVs; however, the need for additional attention, which may distract from work (To_L26),
influences the choice of Shared AVs negatively but has mixed effects for Private AVs
(attitudes have two dimensions). The perceptions that AVs might save time and make travel
more environmentally friendly (To_L21) and that they make travel safer and mitigate
congestion (To_L23) are positively related to the use of both types of AVs. AVs ensuring
mobility for the disabled (To_L24) is linked positively with the use of Shared AVs. Other
aspects of AVs' usefulness discussed by the respondents include making parking easier
(To_L27) and the need for human control (To_L25), and individuals discussing these are more
likely to prefer Shared AVs.
Table 10.4 Estimated Coefficients for the Topics
                     Ind                  Prop
              Coeff_1   Coeff_2    Coeff_1   Coeff_2
Constant_1      0.005     0.145      0.048    -0.052
To_L11          0.038     0.055      0.086     0.063
To_L12         -0.071    -0.083     -0.198    -0.087
To_L13         -0.02      0.127      0.12      0.28
To_L14         -0.018    -0.059     -0.049    -0.06
To_L21         -0.07      0.047     -0.019     0.148
To_L22          0.052     0.263      0.351     0.318
To_L23         -0.075     0.155      0.081     0.321
To_L24         -0.005    -0.046     -0.059    -0.059
To_L25         -0.133     0.027     -0.103     0.15
To_L26          0.036    -0.18      -0.155    -0.342
To_L27         -0.039     0.028     -0.002     0.059
Coefficients at 99% confidence level
Coefficients at 95% confidence level
Coefficients at 90% confidence level
APPENDIX F
11 RESULTS FOR THE PROPOSED FRAMEWORK
We present the results of the mapping (average of the generated values) between the Likert
scale responses and open-ended responses in Table 11.1. Referring to rows 4-14 and rows 17-
27, one can obtain the results for Ver_LK and Ver_LKOE of the questionnaire, respectively.
We use abbreviations for the various levels of the Likert scale responses (SD- Strongly
Disagree, Disag- Disagree, Neut- Neutral, Agree- Agree and SA- Strongly Agree). Table 11.2
presents the mapping between the observed topic proportions for open-ended questions and the
generated averages for Likert scale responses for Ver_LK and Ver_LKOE.
For the Likert scale responses, one could directly map the observed and the generated averages
for Ver_LK and Ver_LKOE of the questionnaire. However, for the topic proportions of the
open-ended questions, readers should not expect a direct correspondence between Likert scale
questions and topics in the same row (for instance, there is no direct correspondence between
AV_LeaEa and L11 or between AV_WrkEa and L12).
Table 11.1 Mapping of the Likert Scale Responses for Ver_LK and Ver_LKOE
Item           SD  Disag   Neut  Agree     SA |      SD  Disag   Neut  Agree     SA | Top # Top Prop (%)  Word_1    Word_2    Word_3    Word_4    Word_5
Attitudes estimated for Ver_LK using the Proposed Model
          Ver_LK (Observed Averages)          | Ver_LKOE (Generated Averages)       | Ver_OE (Generated Topic Proportions)
AV_LeaEa     5.79  10.04  35.68  35.67  12.82 |   10.63  23.51   7.71  15.00  26.13 | L11          13.95  use       easi      technolog work      get
AV_WrkEa     5.27  13.07  43.59  30.30   7.77 |   10.94  16.62  16.90  10.85  27.66 | L12          27.47  drive     road      human     mani      accid
AV_SklEa     4.60  10.07  32.08  41.59  11.66 |    9.28  14.13  12.91  12.16  34.50 | L13          49.98  drive     control   driver    make      easier
AV_UseEa     4.40  10.32  36.27  39.22   9.79 |   12.49   8.51  11.68  36.37  13.92 | L14           9.91  oper      go        everyth   assum     user
AV_TrNed     7.00  12.04  33.63  34.23  13.09 |    6.69  29.73   6.38  22.89  17.28 | L21          12.44  time      better    environ   make      save
AV_OthAc    11.57  15.57  26.08  34.87  11.91 |   42.92  13.30   7.44   8.14  11.18 | L22          21.35  drive     driver    thing     work      make
AV_DeAcc    12.42  22.34  37.29  21.56   6.39 |    9.93  21.87  16.37  21.41  13.39 | L23          17.63  accid     human     traffic   reduc     help
AV_ReStr    12.74  22.66  27.70  27.48   9.42 |   13.69  12.13   8.30  27.55  21.31 | L24           3.53  drive     use       get       help      disabl
AV_UsImp    11.34  12.87  26.56  35.82  13.41 |   37.79   2.51  12.51  17.15  13.02 | L25          22.78  use       drive     need      technolog situat
                                              |                                     | L26          15.38  take      go        attent    pay       use
                                              |                                     | L27           6.89  driver    help      transport safeti    safer
Attitudes estimated for Ver_LKOE using the Proposed Model
          Ver_LKOE (Observed Averages)        | Ver_LK (Generated Averages)         | Ver_OE (Generated Topic Proportions)
AV_LeaEa     6.16  10.01  29.63  37.16  17.03 |   12.69  16.38  16.10  18.80  16.04 | L11          12.34  use       easi      technolog work      get
AV_WrkEa     6.44  12.91  40.49  31.52   8.64 |    8.30  10.17   7.44  32.63  21.46 | L12          24.90  drive     road      human     mani      accid
AV_SklEa     5.56   8.24  27.23  42.34  16.62 |    7.40  12.14  22.05  21.06  17.34 | L13          52.40  drive     control   driver    make      easier
AV_UseEa     5.74  10.58  29.83  39.70  14.15 |   31.25  16.03   9.56   5.17  17.99 | L14          10.36  oper      go        everyth   assum     user
AV_TrNed    10.20  11.77  25.67  33.87  18.49 |   16.78  36.30   5.99   9.92  11.00 | L21          11.89  time      better    environ   make      save
AV_OthAc    11.12  13.64  21.60  33.72  19.91 |   19.46  17.57  16.94  10.18  15.85 | L22          21.50  drive     driver    thing     work      make
AV_DeAcc    14.47  17.13  29.96  25.54  12.90 |   24.32   8.91  14.99  15.07  16.71 | L23          18.40  accid     human     traffic   reduc     help
AV_ReStr    14.53  18.34  22.15  31.02  13.95 |   10.32  32.76   4.55  18.10  14.27 | L24           3.63  drive     use       get       help      disabl
AV_UsImp    12.13  12.58  19.50  32.96  22.83 |    9.84   3.84  41.61   7.99  16.72 | L25          21.12  use       drive     need      technolog situat
                                              |                                     | L26          16.84  take      go        attent    pay       use
                                              |                                     | L27           6.62  driver    help      transport safeti    safer
Table 11.2 Mapping of the Likert Scale Responses for the Extracted Topics from Open-ended Responses
Top # Top Prop (%)  Word_1    Word_2    Word_3    Word_4    Word_5    | Item           SD  Disag   Neut  Agree     SA |      SD  Disag   Neut  Agree     SA
Attitudes estimated for Ver_OE using the Proposed Model
Ver_OE (Observed Topic Proportions)                                   | Ver_LK (Generated Averages)                   | Ver_LKOE (Generated Averages)
L11           9.68  use       easi      technolog work      get       | AV_LeaEa    13.28  18.58  16.01  19.05  16.05 |   10.27  23.93   7.74  14.52  26.52
L12          41.98  drive     road      human     mani      accid     | AV_WrkEa     8.69  12.00   7.27  34.52  20.49 |   10.94  16.66  16.58  11.37  27.42
L13          37.60  drive     control   driver    make      easier    | AV_SklEa     9.15  14.84  21.40  20.44  17.15 |    9.20  13.61  13.22  11.96  35.00
L14          10.74  oper      go        everyth   assum     user      | AV_UseEa    30.69  17.76   9.52   5.39  19.62 |   12.44   8.51  11.70  36.17  14.17
L21           3.81  time      better    environ   make      save      | AV_TrNed    16.49  37.79   6.79  10.63  11.28 |    6.85  29.86   6.20  22.07  18.01
L22          15.70  drive     driver    thing     work      make      | AV_OthAc    21.36  19.38  16.80   9.80  15.65 |   40.71  13.89   7.60   8.72  12.06
L23          21.59  accid     human     traffic   reduc     help      | AV_DeAcc    26.95   8.85  16.28  14.74  16.15 |    9.94  22.12  16.34  21.79  12.79
L24           8.57  drive     use       get       help      disabl    | AV_ReStr     9.84  34.06   5.36  17.39  16.32 |   13.60  12.28   8.33  28.00  20.78
L25          12.50  use       drive     need      technolog situat    | AV_UsImp    10.10   4.35  42.96   8.26  17.31 |   36.60   2.75  12.13  16.79  14.70
L26           9.31  take      go        attent    pay       use       |                                               |
L27          28.52  driver    help      transport safeti    safer     |                                               |
APPENDIX G
12 PYTHON CODE FOR TOPIC MODELS
Topic Model Analysis
The code extracts topics from text data. It supports the following analyses:
1. Latent Dirichlet Allocation (LDA)
2. Supervised Latent Dirichlet Allocation (sLDA)
Latent Dirichlet Allocation can be performed using Gensim or Tomotopy. Supervised LDA can be performed
using Tomotopy. The dependent variable can be linear or binary. The results of the Topic Models can be
evaluated visually with pyLDAvis, which helps in understanding the topics and the inter-topic distance.
Importing the Libraries
### ************************** Importing Packages ************************ ###
from __future__ import division
import re # regular expressions
import numpy as np # scientific computing
import pandas as pd # datastructures and computing
import pprint as pprint # better printing
import os
import os.path
# Gensim
import gensim
import gensim.corpora as corpora
from gensim.utils import simple_preprocess
from gensim.models import CoherenceModel
# Lemmatization
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize
# Plotting tools
import pyLDAvis # interactive Topic Model visualisation
import pyLDAvis.gensim
import matplotlib.pyplot as plt
# Libraries for Topic Models
import sys
import tomotopy as tp
# fix random generator seed (for reproducibility of results)
np.random.seed(42)
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
Creating the list of Stop Words
# NLTK Stop words
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
stop_words.extend(['also', 'back', 'cant', 'come', 'could', 'done', 'dont', 'due', 'els', 'etc', 'hope', 'howev','know',
'let', 'like', 'may', 'mayb', 'much', 'must', 'new', 'non', 'one', 'other', 'plu', 'pretti', 'said',
'say', 'see', 'sinc', 'someon', 'someoth', 'therefor', 'today', 'want', 'well', 'would', 'ye', 'car',
'cars', 'think', 'autonomous', 'vehicle', 'vehicles','people', 'seem', 'seems', 'really', 'still',
'however', 'believe', 'right', 'truly', 'automatic', 'sound', 'sounds', 'general', 'become', 'total',
'totally', 'tell', 'something', 'anything', 'person', 'phone'])
Importing the Dataset
Dataset preparation matters here. Plain LDA needs only the text data, whereas supervised LDA also
needs the response variable (dependent variable) for each document.
### ************************** Importing Datasets ************************ ###
directo = "<folder_path>"
df = pd.read_excel(directo + "\\<filename>")
df.head()
# convert the content field in dataset into a list
data = df.AVO_EUsT.values.tolist()
resp = df.Resp.values.tolist()
data[:5]
resp[:5]
Cleaning the Dataset
In this section of the code, the following data cleaning techniques are used:-
1. E-mail id and Newline characters
2. Remove ''StopWords'' from the dataset
3. Forming bigrams and trigrams
4. Stemming
### ************************** Datasets Cleaning ************************* ###
E-mail id and New line characters
# Remove Emails
data = [re.sub(r'\S*@\S*\s?', '', sent) for sent in data]  # raw string keeps the regex escapes intact
# Remove new line characters
data = [re.sub('\n', ' ', sent) for sent in data]
pprint.pprint(data[:5])
Replacing Phrases with Meaningful Words
data_1 = []
# Raw strings keep the regex escapes (\s) intact
p1 = re.compile(r"(do\s*not\s*trust|don't\s*trust|don't\s*fully\s*trust|would\s*not\s*trust|never\s*trust)")
p2 = re.compile(r"unsafe|not\s*safe|not\s*feel\s*safe|not\s*be\s*safe|doesn't\s*seem\s*very\s*safe|"
                r"don't\s*feel\s*it's\s*safe")
p3 = re.compile(r"not\s*feel")
p4 = re.compile(r"don't\s*think")
for item in data:
    data_1a = p1.sub("no_trust", item)
    data_1b = p2.sub("unsafe", data_1a)
    data_1c = p3.sub("not_feel", data_1b)
    data_1d = p4.sub("dont_think", data_1c)  # p4 (not p2) handles "don't think"
    data_1.append(data_1d)
data = data_1
print(data[:5])
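The phrase replacement above can be tried on a few toy sentences. The following is a minimal sketch only: the rule list `NEGATION_RULES`, the helper `normalise_phrases` and the example sentences are hypothetical and shorter than the patterns used in the thesis code.

```python
import re

# Hypothetical, shortened rules in the spirit of p1, p2 and p4 above
NEGATION_RULES = [
    (re.compile(r"do\s*not\s*trust|don't\s*trust|never\s*trust"), "no_trust"),
    (re.compile(r"not\s*(?:feel\s*|be\s*)?safe"), "unsafe"),
    (re.compile(r"don't\s*think"), "dont_think"),
]

def normalise_phrases(text):
    """Apply each rule in order so that multi-word negations become single tokens."""
    for pattern, token in NEGATION_RULES:
        text = pattern.sub(token, text)
    return text

examples = [
    "i do not trust the software",
    "it would not be safe at night",
    "i don't think so",
]
normalised = [normalise_phrases(sentence) for sentence in examples]
print(normalised)
```

Collapsing a negated phrase into one token before stop-word removal keeps the negation from being dropped together with words such as "not".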
Removing StopWords
Function to remove StopWords
def remove_stopwords(texts):
    """
    objective:
        function to remove stopwords from the paragraph/sentence
        uses simple_preprocess to tokenise each document
    input:
        paragraph/sentences
    output:
        wordlist after the stopwords are removed
    """
    return [[word for word in simple_preprocess(str(doc)) if word not in stop_words]
            for doc in texts]
data_words_nostops = remove_stopwords(data)
data_words_nostops[:5]
Forming Bigrams and Trigrams
def make_bigrams(texts):
    """
    objective:
        takes the processed text- after preprocessing and stop word removal
    input:
        preprocessed text
    output:
        text with bigrams
    """
    return [bigram_mod[text] for text in texts]
def make_trigrams(texts):
    """
    objective:
        generate trigrams for the text
    input:
        text with bigrams
    output:
        text with trigrams
    """
    return [trigram_mod[bigram_mod[text]] for text in texts]
# Build the bigram and trigram models- calibration dataset
# Phrases expects tokenised documents (lists of words), so the stop-word-free tokens are passed
bigram = gensim.models.phrases.Phrases(data_words_nostops, min_count=5, threshold=100)
trigram = gensim.models.phrases.Phrases(bigram[data_words_nostops], threshold=100)
# Passing the parameters to the bigram/trigram- calibration dataset
bigram_mod = gensim.models.phrases.Phraser(bigram)
trigram_mod = gensim.models.phrases.Phraser(trigram)
data_words_bigrams = make_bigrams(data_words_nostops)
data_words_bigrams[:5]
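To see what `min_count` and `threshold` act on, the sketch below scores adjacent word pairs in pure Python. It assumes the collocation score (count(a, b) - min_count) * vocabulary size / (count(a) * count(b)), which, to the best of our understanding, corresponds to the default scorer in Gensim; the function `score_bigrams` and the toy documents are hypothetical and not part of the thesis code.

```python
from collections import Counter

def score_bigrams(docs, min_count=1):
    """Score each adjacent word pair; pairs scoring above a threshold would be merged."""
    unigrams = Counter(token for doc in docs for token in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    vocab_size = len(unigrams)
    scores = {}
    for (first, second), pair_count in bigrams.items():
        scores[(first, second)] = ((pair_count - min_count) * vocab_size
                                   / (unigrams[first] * unigrams[second]))
    return scores

# Toy tokenised documents (assumed data)
docs = [["save", "time", "daily"], ["save", "time", "money"], ["time", "flies"]]
scores = score_bigrams(docs, min_count=1)
print(scores[("save", "time")])
```

With a small corpus, a threshold as high as 100 may leave very few pairs merged, so it can be worth inspecting the scores before fixing the threshold.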
Stemming
ps = PorterStemmer()
data_lemmatized = []  # variable name kept as in the original code; the tokens are stemmed (Porter), not lemmatised
for texts in data_words_bigrams:
    data_lemmatized.append([ps.stem(doc) for doc in texts])
data_lemmatized[:5]
Writing the Files to the dataset
df['Cleaned_Data'] = data_lemmatized
df.head()
df.to_csv(directo + "\\Output_Q1_Words.csv")
LDA Model
# Defining the LDA Function
def lda_model(input_list, save_path):
    """
    desc:
        the function estimates the LDA model and outputs the estimated topics
    input:
        list with documents as responses
    output:
        prints the topics
        words and their corresponding proportions
    """
    mdl = tp.LDAModel(tw=tp.TermWeight.ONE,  # Term weighting
                      min_cf=3,  # Minimum frequency of words
                      rm_top=0,  # Number of top frequency words to be removed
                      k=4,  # Number of topics
                      seed=42)
    for n, line in enumerate(input_list):
        ch = " ".join(line)
        docu = ch.strip().split()
        mdl.add_doc(docu)
    mdl.burn_in = 10000
    mdl.train(10000)
    print('Num docs: ', len(mdl.docs), 'Vocab size: ', mdl.num_vocabs, 'Num words: ', mdl.num_words)
    print('Removed words: ', mdl.removed_top_words)
    print('Training...', file=sys.stderr, flush=True)
    for i in range(0, 50000, 10):
        mdl.train(100)
        print('Iteration: {}\tLog-likelihood: {}'.format(i, mdl.ll_per_word))
    print('Saving...', file=sys.stderr, flush=True)
    mdl.save(save_path, True)
    for k in range(mdl.k):
        print('Topic #{}'.format(k))
        for word, prob in mdl.get_topic_words(k):
            print('\t', word, prob, sep='\t')
    return mdl
Estimating the Topic Model
print('Running LDA')
lda_model = lda_model(data_lemmatized, 'test.lda_4_T.bin')
Supervised LDA
def slda_model(documents, dep_var, save_path):
    """
    desc:
        the function estimates the sLDA model and outputs the estimated topics
    input:
        list with documents as responses
        dependent variable
    output:
        prints the topics
        words and their corresponding proportions
    """
    smdl = tp.SLDAModel(tw=tp.TermWeight.ONE,  # Term weighting
                        min_cf=3,  # Minimum frequency of words
                        rm_top=0,  # Number of top frequency words to be removed
                        k=4,  # Number of topics
                        vars=['b'],  # One entry per dependent variable: 'b' binary, 'l' linear
                        seed=42)
    for row, pred in zip(documents, dep_var):
        pred_1 = []
        pred_1.append(pred)
        ch = " ".join(row)
        docu = ch.strip().split()
        smdl.add_doc(words=docu, y=pred_1)
    smdl.burn_in = 10000
    smdl.train(10000)
    # Printing the output statistics
    print('Num docs: ', len(smdl.docs), 'Vocab size: ', smdl.num_vocabs, 'Num words: ', smdl.num_words)
    print('Removed top words: ', smdl.removed_top_words)
    print('Training...', file=sys.stderr, flush=True)
    for i in range(0, 50000, 10):
        smdl.train(100)
        print('Iteration: {}\tLog-likelihood: {}'.format(i, smdl.ll_per_word))
    print('Saving...', file=sys.stderr, flush=True)
    smdl.save(save_path, True)
    for k in range(smdl.k):
        print('Topic #{}'.format(k))
        for word, prob in smdl.get_topic_words(k):
            print('\t', word, prob, sep='\t')
    return smdl
print('Running Supervised LDA')
slda_model = slda_model(data_lemmatized, resp, 'test.slda_4_T.bin')
Visualising the Results of LDA
pyLDAvis does not have a module that allows Topic Models estimated using Tomotopy to be used directly for
plotting the graphs. It does, however, allow plotting once the following parameters are computed for each of
the Topic Models:
1. phi
a. probabilities of each word (W) for a given topic (K) under consideration
b. is a K x W matrix
2. theta
a. probability mass function over "K" topics for all the documents in the corpus (D)
b. is a D x K matrix
3. n(d)
a. number of tokens for each document
4. vocab
a. vector of terms in the vocabulary
b. presented in the same order as in "phi"
5. M(w)
a. frequency of term "w" across the entire corpus
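Before handing these five inputs to pyLDAvis, their shapes and normalisation can be checked. The sketch below is illustrative only: the helper `check_vis_inputs` and the toy values are hypothetical, and it assumes that each row of phi and of theta should sum to one.

```python
def check_vis_inputs(phi, theta, doc_lengths, vocab, term_frequency, tol=1e-6):
    """Return a list of problems found; an empty list means the inputs look consistent."""
    problems = []
    n_topics, n_terms, n_docs = len(phi), len(vocab), len(theta)
    if any(len(row) != n_terms for row in phi):
        problems.append("phi must be K x W")
    if any(len(row) != n_topics for row in theta):
        problems.append("theta must be D x K")
    if len(doc_lengths) != n_docs:
        problems.append("doc_lengths must have one entry per document")
    if len(term_frequency) != n_terms:
        problems.append("term_frequency must follow the vocab order")
    if any(abs(sum(row) - 1.0) > tol for row in phi):
        problems.append("each row of phi must sum to 1")
    if any(abs(sum(row) - 1.0) > tol for row in theta):
        problems.append("each row of theta must sum to 1")
    return problems

# Toy example with K = 2 topics, W = 3 terms and D = 2 documents (assumed values)
vocab = ["drive", "safe", "time"]
phi = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
theta = [[0.9, 0.1], [0.4, 0.6]]
doc_lengths = [4, 7]
term_frequency = [5, 3, 3]
print(check_vis_inputs(phi, theta, doc_lengths, vocab, term_frequency))
```

A check of this kind helps to locate which input is misaligned when the plot fails, for instance when the vocabulary order differs between phi and the term frequencies.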
Computing the value of "Phi" for the Model
def compute_phi(model):
    """
    desc:
        this function computes the value of phi for visualising the results of Topic Model
        probabilities of each word for a given topic
    input:
        the Topic Model
    output:
        K x W matrix
        K = number of topics
        W = number of words
    """
    mat_phi1 = []
    for i in range(model.k):
        mat_phi1.append(model.get_topic_words(i, model.num_vocabs))
    list_words = []
    for text in mat_phi1[0]:
        list_words.append(text[0])
    list_words.sort()
    # Row 0 holds the sorted vocabulary; rows 1..K hold the word probabilities per topic
    mat_phi2 = [[i * j for j in range(model.num_vocabs)] for i in range(model.k+1)]
    for i in range(model.num_vocabs):
        mat_phi2[0][i] = list_words[i]
    j1 = []
    k1 = []
    m = 0
    while m < model.k:
        j1.append(m)
        m += 1
    n = 1
    while n <= model.k:
        k1.append(n)
        n += 1
    for j, k in zip(j1, k1):
        for index, word in enumerate(mat_phi2[0]):
            #print(word)
            for item in mat_phi1[j]:
                #print(item)
                if word == item[0]:
                    mat_phi2[k][index] = item[1]
    # If the LDA file already exists, this call is taken to be for the sLDA model
    if os.path.isfile(directo + '\\topic_word_prob_lda_4_T.csv'):
        with open(directo + '\\topic_word_prob_slda_4_T.csv', 'w') as f:
            for item in mat_phi2:
                for items in item:
                    f.writelines("%s, " % items)
                f.writelines("\n")
    else:
        with open(directo + '\\topic_word_prob_lda_4_T.csv', 'w') as f:
            for item in mat_phi2:
                for items in item:
                    f.writelines("%s, " % items)
                f.writelines("\n")
    return mat_phi2[0], mat_phi2[1:]
Computing the value of "Theta" for the Model
For LDA Model
def compute_theta_lda(model, data):
    """
    desc:
        this function computes the value of theta for visualising the results of Topic Model
        probabilities mass function over "K" topics for all documents (D) in the corpus
    input:
        the Topic Model
        dataset
    output:
        D x K matrix
        D = number of documents
        K = number of topics
    """
    mat_theta = []
    for n, line in enumerate(data):
        ch = " ".join(line)
        docu = ch.strip().split()
        theta_val = model.infer(doc=model.make_doc(docu),
                                iter=100,
                                workers=0,
                                together=False)
        mat_theta.append(theta_val[0])
    with open(directo + '\\topic_probabilities_lda_4_T.csv', 'w') as f:
        for item in mat_theta:
            for items in item:
                f.writelines("%s, " % items)
            f.writelines("\n")
    return mat_theta
For sLDA Model
def compute_theta_slda(model, data, dep_var):
    """
    desc:
        this function computes the value of theta for visualising the results of Topic Model
        probabilities mass function over "K" topics for all documents (D) in the corpus
    input:
        the Topic Model
        dataset
        dependent variable
    output:
        D x K matrix
        D = number of documents
        K = number of topics
    """
    mat_theta = []
    for line, dep in zip(data, dep_var):
        pred_1 = []
        pred_1.append(dep)
        ch = " ".join(line)
        docu = ch.strip().split()
        theta_val = model.infer(doc=model.make_doc(words=docu, y=pred_1),
                                iter=100,
                                workers=0,
                                together=False)
        mat_theta.append(theta_val[0])
    with open(directo + '\\topic_probabilities_slda_4_T.csv', 'w') as f:
        for item in mat_theta:
            for items in item:
                f.writelines("%s, " % items)
            f.writelines("\n")
    return mat_theta
Number of Tokens per document
def num_token(data):
    """
    desc:
        this function computes number of tokens per document for the entire corpus
    input:
        dataset
    output:
        D x 1 vector with the number of tokens in each document
        D = number of documents
    """
    numb_tok = []
    for text in data:
        numb_tok.append(len(text))
    return numb_tok
Frequency of Words in the Corpus
def freq_words(vocabs, data):
    """
    desc:
        this function computes the frequency of words in the entire corpus
    input:
        list of words
        dataset
    output:
        W x 1 vector with the frequency of each word in the corpus
        W = number of words in the vocabulary
    """
    fre_words = []
    for words in vocabs:
        words_freq = 0
        for line in data:
            for ind_words in line:
                if words == ind_words:
                    words_freq += 1
        fre_words.append(words_freq)
    return fre_words
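The nested loops in freq_words scan the whole corpus once per vocabulary term. A possible alternative, shown here only as a sketch, counts every token once with `collections.Counter` and then reads the counts in vocabulary order; the name `freq_words_counter` and the toy corpus are hypothetical.

```python
from collections import Counter

def freq_words_counter(vocabs, data):
    """Frequency of each vocabulary term across the corpus, in the order given by vocabs."""
    counts = Counter(token for document in data for token in document)
    return [counts[word] for word in vocabs]  # terms that never occur get a count of 0

# Toy corpus of two tokenised documents (assumed data)
toy_corpus = [["drive", "safe"], ["drive", "time", "drive"]]
toy_vocab = ["drive", "safe", "time", "park"]
toy_freq = freq_words_counter(toy_vocab, toy_corpus)
print(toy_freq)
```

The output is intended to match that of freq_words for the same inputs, while making a single pass over the corpus.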
Visualising the Results of LDA Model
Computing the Parameters for Visualising LDA Model
# Loading the LDA model
lda_model = tp.LDAModel.load('test.lda_4_T.bin')
#lda_model.get_topic_word_dist(2)
lvocab, lphi_val = compute_phi(lda_model)
ltheta_val = compute_theta_lda(lda_model, data_lemmatized)
lnum_token = num_token(data_lemmatized)
lfreq_terms = freq_words(lvocab, data_lemmatized)
Plotting in pyLDAvis (LDA)
# Visualising the Results
pyLDAvis.enable_notebook()
data_lda = {'topic_term_dists': lphi_val,
'doc_topic_dists' : ltheta_val,
'doc_lengths' : lnum_token,
'vocab' : lvocab,
'term_frequency' : lfreq_terms}
print('Topic-Term shape: %s' % str(np.array(data_lda['topic_term_dists']).shape))
print('Doc-Topic shape: %s' % str(np.array(data_lda['doc_topic_dists']).shape))
vis_lda = pyLDAvis.prepare(**data_lda)
pyLDAvis.display(vis_lda)
Visualising the Results of Supervised LDA Model
Computing the Parameters for Visualising Supervised LDA Model
# Loading the sLDA model
slda_model = tp.SLDAModel.load('test.slda_4_T.bin')
svocab, sphi_val = compute_phi(slda_model)
stheta_val = compute_theta_slda(slda_model, data_lemmatized, resp)
snum_token = num_token(data_lemmatized)
sfreq_terms = freq_words(svocab, data_lemmatized)
Plotting in pyLDAvis (sLDA)
# Visualising the Results
pyLDAvis.enable_notebook()
data_slda = {'topic_term_dists': sphi_val,
'doc_topic_dists' : stheta_val,
'doc_lengths' : snum_token,
'vocab' : svocab,
'term_frequency' : sfreq_terms}
print('Topic-Term shape: %s' % str(np.array(data_slda['topic_term_dists']).shape))
print('Doc-Topic shape: %s' % str(np.array(data_slda['doc_topic_dists']).shape))
vis_slda = pyLDAvis.prepare(**data_slda)
pyLDAvis.display(vis_slda)
Computing Scores for use in Estimation
In this portion of the code, values are computed for each document in the corpus. The values are computed based
on the words used in each of the documents in the corpus. Scores will be computed for each topic. This will be
based on the probability values in each of the topics.
def compute_scores(list_dataset, list_word_prob):
    """
    desc:
        this function will take the cleaned dataset and list of word probabilities per topic and compute the scores
    input:
        cleaned dataset as a list
        word probabilities as a dataframe
    output:
        scores for each document in the corpus
    """
    n = len(list_dataset)
    prob_list = [[0 for i in range(5)] for i in range(n)]
    for index, document in enumerate(list_dataset):
        # remember to change the number of variables based on the number of topics
        probab_1 = 0
        probab_2 = 0
        probab_3 = 0
        probab_4 = 0
        for word in document:
            for index1, row in list_word_prob.iterrows():
                item = row['Word']
                prob1 = row['Prob_1']
                prob2 = row['Prob_2']
                prob3 = row['Prob_3']
                prob4 = row['Prob_4']
                if word == item:
                    probab_1 += prob1
                    probab_2 += prob2
                    probab_3 += prob3
                    probab_4 += prob4
        prob_list[index][0] = probab_1
        prob_list[index][1] = probab_2
        prob_list[index][2] = probab_3
        prob_list[index][3] = probab_4
        prob_list[index][4] = probab_1 + probab_2 + probab_3 + probab_4
    return prob_list
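The scoring rule in compute_scores (sum, for each topic, the probabilities of the words used in a document) can also be written for any number of topics with a dictionary lookup. This is a sketch under assumptions: `compute_scores_lookup`, the dictionary layout and the probabilities are hypothetical and only illustrate the arithmetic.

```python
def compute_scores_lookup(documents, word_probs, n_topics):
    """Per-document topic scores plus their total; word_probs maps word -> per-topic probabilities."""
    scores = []
    for document in documents:
        totals = [0.0] * n_topics
        for word in document:
            # Words missing from the vocabulary contribute nothing
            for topic, prob in enumerate(word_probs.get(word, [0.0] * n_topics)):
                totals[topic] += prob
        scores.append(totals + [sum(totals)])
    return scores

# Toy word probabilities for two topics (assumed values)
toy_probs = {"drive": [0.25, 0.5], "safe": [0.5, 0.125]}
toy_docs = [["drive", "safe", "drive"], ["unknown"]]
toy_scores = compute_scores_lookup(toy_docs, toy_probs, n_topics=2)
print(toy_scores)
```

For the first toy document the topic scores are 0.25 + 0.5 + 0.25 = 1.0 and 0.5 + 0.125 + 0.5 = 1.125, and the last entry is their total.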
Computing the Scores for LDA
lda_dist = pd.read_csv(directo + "\\topic_word_prob_lda_4_T.csv", header=None)
lda_distT = lda_dist.T
lda_distT.columns = ['Word', 'Prob_1', 'Prob_2', 'Prob_3', 'Prob_4']
lda_distT['Word'] = lda_distT['Word'].str.strip()
lda_distT['Prob_1'] = pd.to_numeric(lda_distT.Prob_1, errors='coerce')
lda_distT['Prob_2'] = pd.to_numeric(lda_distT.Prob_2, errors='coerce')
lda_distT['Prob_3'] = pd.to_numeric(lda_distT.Prob_3, errors='coerce')
lda_distT['Prob_4'] = pd.to_numeric(lda_distT.Prob_4, errors='coerce')
probab_lda = compute_scores(data_lemmatized, lda_distT)
df['probab_lda'] = probab_lda
Computing the Scores for sLDA
slda_dist = pd.read_csv(directo + "\\topic_word_prob_slda_4_T.csv", header=None)
slda_distT = slda_dist.T
slda_distT.columns = ['Word', 'Prob_1', 'Prob_2', 'Prob_3', 'Prob_4']
slda_distT['Word'] = slda_distT['Word'].str.strip()
slda_distT['Prob_1'] = pd.to_numeric(slda_distT.Prob_1, errors='coerce')
slda_distT['Prob_2'] = pd.to_numeric(slda_distT.Prob_2, errors='coerce')
slda_distT['Prob_3'] = pd.to_numeric(slda_distT.Prob_3, errors='coerce')
slda_distT['Prob_4'] = pd.to_numeric(slda_distT.Prob_4, errors='coerce')
probab_slda = compute_scores(data_lemmatized, slda_distT)
df['probab_slda'] = probab_slda
df.to_csv(directo + "\\Open_Ended_Q1_Scores_4_Topic.csv")
APPENDIX H
13 PYTHON CODE FOR THE FRAMEWORK TO MEASURE ATTITUDES
Contents
1. Problem Description
2. Data Preparation
3. Probabilistic Graphical Model and the Generative Process
4. Proposed Model
Problem Description
We use the data on the mode choice for commute trips by students and workers from the USA. A stated-
preference (SP) survey was used. The questionnaire collected information on:-
1. Socio-demographic characteristics
2. Travel characteristics
3. Familiarity with Autonomous Vehicles
4. Attitudes
5. SP attributes
The attitudes were measured using 5-point Likert scales. For some attitudes, open-ended questions were also
presented to the respondents.
Importing the Libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import torch
torch.set_default_tensor_type("torch.cuda.FloatTensor")
import pyro
import pyro.distributions as dist
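# In Pyro 1.0 and later the autoguides are imported from pyro.infer.autoguide (formerly pyro.contrib.autoguide)
from pyro.infer.autoguide import AutoDiagonalNormal, AutoMultivariateNormal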
from pyro.infer import MCMC, NUTS, HMC, SVI, Trace_ELBO
from pyro.optim import Adam, ClippedAdam
# Cuda GPU resources
torch.cuda.set_device(0)
torch.cuda.requires_grad = True
# fix random generator seed (for reproducibility of results)
np.random.seed(42)
# matplotlib style options
plt.style.use('ggplot')
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)
# Reading files from the local drive
dfv1 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_1_Training_LDA.csv')
dfv2 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_2_Training_LDA.csv')
dfv3 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_3_Training_LDA.csv')
The mode is encoded as an integer from 0 to 2, corresponding to:
0 - Regular Car
1 - Private Autonomous Vehicle
2 - Shared Autonomous Vehicle
Frequency Distribution of Mode Choice
def desc_stats(data, title):
    print("Dataset size: ", len(data))
    data['Choice'].hist();
    plt.title(title)
    plt.xlabel('Mode ID (0-regular car, 1 - Private Autonomous Vehicle, 2- Shared Autonomous Vehicle)')
    plt.ylabel('Frequency')
    plt.xticks([0,1,2]);
    return
desc_stats(dfv1, "Mode Choice (Version_1)");
dfv1.describe()
desc_stats(dfv2, "Mode Choice (Version_2)");
dfv2.describe()
desc_stats(dfv3, "Mode Choice (Version_3)");
dfv3.describe()
# Concatenate Dataset
df = pd.concat([dfv1, dfv2, dfv3], ignore_index=True)
desc_stats(df, "Mode Choice (Combined)");
df.describe()
Data Preparation
For the Likert scale questions, we created dummy variables, and for the open-ended questions, we extracted
variables using the Latent Dirichlet Allocation method (Appendix ).
def data_processing(data):
    """
    inp:
        dataframe to be processed
    desc:
        create dummy variables for the discrete variables
        standardise continuous variables
    out:
        processed dataframe
    """
    # standardize input features
    X_mean1 = data.iloc[:, [41, 42]].mean(axis=0)
    X_std1 = data.iloc[:, [41, 42]].std(axis=0)
    data.iloc[:, [41, 42]] = (data.iloc[:, [41, 42]] - X_mean1) / X_std1
    X_mean2 = data.iloc[:, 78: 120].mean(axis=0)
    X_std2 = data.iloc[:, 78: 120].std(axis=0)
    data.iloc[:, 78: 120] = (data.iloc[:, 78: 120] - X_mean2) / X_std2
    # Converting the ordered Attitudinal Responses into Dummy Variables
    peu_names = ["AV_LeaEa", "AV_WrkEa", "AV_SklEa", "AV_UseEa"]
    pu_names = ["AV_TrNed", "AV_OthAc", "AV_DeAcc", "AV_ReStr", "AV_UsImp"]
    psr_names = ["AV_WoSaf", "AV_MaFal"]
    ppr_names = ["AV_PeInf", "AV_UsInf", "AV_ShInf"]
    tr_names = ["AV_Depe", "AV_Reli", "AV_Trus"]
    at_names = ["AV_GIde", "AV_WIde", "AV_Plea"]
    X_PEU = np.concatenate([pd.get_dummies(data[x]) for x in peu_names], axis=1).astype("float32")
    X_PU = np.concatenate([pd.get_dummies(data[x]) for x in pu_names], axis=1).astype("float32")
    X_PSR = np.concatenate([pd.get_dummies(data[x]) for x in psr_names], axis=1).astype("float32")
    X_PPR = np.concatenate([pd.get_dummies(data[x]) for x in ppr_names], axis=1).astype("float32")
    X_TR = np.concatenate([pd.get_dummies(data[x]) for x in tr_names], axis=1).astype("float32")
    X_AT = np.concatenate([pd.get_dummies(data[x]) for x in at_names], axis=1).astype("float32")
    # Grouping the Independent Variables into Different Sets
    mat = data.values
    # Set of Socio-demographic Variables
    X_SD = mat[:, [2,3,4,5,6,8,9,10,11,13,14,15,16,18,19,20,21,22,24,25,27,28,29]].astype("float32")
    # Set of Travel Characteristics
    X_TC = mat[:, [30,31,33,34,35,36,37,38,39,46,47,48,49]].astype("float32")
    X_FAV = mat[:, [56,57]].astype("float32")  # Familiarity with Autonomous Vehicles
    X_SP = mat[:, 112:120].astype("float32")  # SP Attribute Variables
    X_TPEU = mat[:, 78:82].astype("float32")  # Topics for Perceived Ease of Use
    X_TPU = mat[:, 82:89].astype("float32")  # Topics for Usefulness
    X_TPSR = mat[:, 89:95].astype("float32")  # Topics for Perceived Safety Risk
    X_TPPR = mat[:, 95:101].astype("float32")  # Topics for Perceived Privacy Risk
    X_TTR = mat[:, 101:106].astype("float32")  # Topics for Trust
    X_TAT = mat[:, 106:112].astype("float32")  # Topics for Attitudes
    bern = mat[:, -2].astype("int")  # Question_type
    y = mat[:,-1].astype("int")
    # Concatenating the variables
    X_lk = np.concatenate([X_PEU, X_PU], axis=1)
    X_oe = np.concatenate([X_TPEU, X_TPU], axis=1)
    X_sd = np.concatenate([X_SD, X_TC, X_FAV, X_SP], axis=1)
    return y, X_lk, X_oe, X_sd, bern
y, X_lk, X_oe, X_sd, bern = data_processing(df) # Processed data
bern1 = 1*(np.array([bern, bern]).transpose() == 1)
bern2 = 1*(np.array([bern, bern]).transpose() == 2)
bern3 = 1*(np.array([bern, bern]).transpose() == 3)
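The two encoding steps above (dummy variables for the Likert items and 0/1 masks for the questionnaire version) can be illustrated without pandas. The sketch below is hypothetical: it assumes the Likert responses are coded 1 to 5, and the helpers `one_hot_likert` and `version_mask` are not part of the thesis code. Fixing the list of levels in advance keeps the number of columns constant even when a level is not observed in a given dataset, which may not hold when the columns are derived from the observed values only.

```python
LIKERT_LEVELS = (1, 2, 3, 4, 5)  # assumed coding: 1 = Strongly Disagree ... 5 = Strongly Agree

def one_hot_likert(responses, levels=LIKERT_LEVELS):
    """Encode each response as a 0/1 row with one column per level, observed or not."""
    return [[1.0 if response == level else 0.0 for level in levels] for response in responses]

def version_mask(question_type, version, width=2):
    """Repeat the 0/1 indicator of one questionnaire version across `width` attitude dimensions."""
    return [[1 if q == version else 0] * width for q in question_type]

# Toy data: three respondents (assumed values)
toy_dummies = one_hot_likert([2, 5, 2])
toy_mask = version_mask([1, 3, 2], version=3)
print(toy_dummies)
print(toy_mask)
```

The mask has the same layout as bern1, bern2 and bern3: one row per respondent and one column per attitude dimension.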
Proposed Model
Pyro Model for the Combined Version
def model(X_ls, X_oe, X_sd, n_cat, bern1, bern2, bern3, K, obs=None):
    # Coefficients for the LS model (Version_1)
    alpha_ls1 = pyro.sample("alpha_ls1", dist.Normal(torch.zeros(K), torch.ones(K)))
    gamma_ls1 = pyro.sample("gamma_ls1", dist.Normal(torch.zeros(X_ls.shape[1], K),
                                                     torch.ones(X_ls.shape[1], K)))
    # Coefficients for the LS model (Version_2)
    alpha_ls2 = pyro.sample("alpha_ls2", dist.Normal(torch.zeros(K), torch.ones(K)))
    gamma_ls2 = pyro.sample("gamma_ls2", dist.Normal(torch.zeros(X_ls.shape[1], K),
                                                     torch.ones(X_ls.shape[1], K)))
    # Coefficients for the OE model
    alpha_oe = pyro.sample("alpha_oe", dist.Normal(torch.zeros(K), torch.ones(K)))
    gamma_oe = pyro.sample("gamma_oe", dist.Normal(torch.zeros(X_oe.shape[1], K),
                                                   torch.ones(X_oe.shape[1], K)))
    with pyro.plate("data", X_ls.shape[0], use_cuda=True):
        y_att_ls1 = pyro.sample("y_att_ls1", dist.MultivariateNormal(alpha_ls1 + torch.matmul(X_ls, gamma_ls1),
                                                                     torch.eye(K)), obs=None)
        y_att_ls2 = pyro.sample("y_att_ls2", dist.MultivariateNormal(alpha_ls2 + torch.matmul(X_ls, gamma_ls2),
                                                                     torch.eye(K)), obs=None)
        y_att_oe = pyro.sample("y_att_oe", dist.MultivariateNormal(alpha_oe + torch.matmul(X_oe, gamma_oe),
                                                                   torch.eye(K)), obs=None)
    # Select the attitude of the questionnaire version each respondent answered, then standardise
    y_att = bern1 * y_att_ls1 + bern2 * y_att_ls2 + bern3 * y_att_oe
    y_att = (y_att - torch.mean(y_att, dim=0))/torch.std(y_att, dim=0)
    X_Data = torch.zeros(X_sd.shape[0], X_sd.shape[1] + 2)
    X_Data[:, 0:-2] = X_sd
    X_Data[:, -2] = y_att[:, 0]
    X_Data[:, -1] = y_att[:, 1]
    # Coefficients for the Classification model
    alpha = torch.zeros(1, n_cat)
    alpha_1 = pyro.sample("alpha_1", dist.Normal(torch.zeros(n_cat-1), torch.ones(n_cat-1)))
    alpha[:, 1:] = alpha_1
    beta = torch.zeros(X_Data.shape[1], n_cat)
    beta_1 = pyro.sample("beta_1", dist.Normal(torch.zeros(X_Data.shape[1], n_cat-1),
                                               torch.ones(X_Data.shape[1], n_cat-1)))
    beta[:, 1:] = beta_1
    with pyro.plate("data_final", X_Data.shape[0], use_cuda=True):
        y = pyro.sample("y", dist.Categorical(logits= alpha + torch.matmul(X_Data, beta)), obs=obs)
    return y
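In the classification part of the model, the first column of alpha and beta stays at zero, so the first alternative (mode 0) acts as the reference and only n_cat - 1 sets of coefficients are estimated. A minimal pure-Python sketch of the resulting choice probabilities is given below; the function `choice_probabilities` and the numbers are hypothetical and only illustrate the softmax over the utilities.

```python
import math

def choice_probabilities(x, alpha_free, beta_free):
    """Softmax over utilities, with the first alternative fixed at zero as the reference.

    x          : list of feature values for one respondent
    alpha_free : n_cat - 1 intercepts (non-reference alternatives)
    beta_free  : one row per feature, each with n_cat - 1 slopes
    """
    utilities = [0.0]  # reference alternative
    for c, intercept in enumerate(alpha_free):
        utilities.append(intercept + sum(xi * row[c] for xi, row in zip(x, beta_free)))
    shift = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - shift) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# One feature whose slope doubles the odds of the second alternative (assumed values)
toy_probs = choice_probabilities([1.0], [0.0, 0.0], [[math.log(2.0), 0.0]])
print(toy_probs)
```

Fixing one alternative at zero identifies the model, because adding the same constant to all utilities leaves the probabilities unchanged.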
Preparing the tensors for the model (Proposed_Model)
n_cat = 3
K = 2
X_lk = torch.from_numpy(X_lk).float().cuda()
X_oe = torch.from_numpy(X_oe).float().cuda()
X_sd = torch.from_numpy(X_sd).float().cuda()
bern1 = torch.from_numpy(bern1).float().cuda()
bern2 = torch.from_numpy(bern2).float().cuda()
bern3 = torch.from_numpy(bern3).float().cuda()
y = torch.from_numpy(y).float().cuda()
Inference using SVI
%%time
# Define guide function
guide = AutoDiagonalNormal(model)
# Reset parameter values
pyro.clear_param_store()
# Define the number of optimization steps
n_steps = 4000
# Setup the optimizer
adam_params = {"lr": 0.01}
optimizer = ClippedAdam(adam_params)
# Setup the inference algorithm
elbo = Trace_ELBO(num_particles=3)
svi = SVI(model, guide, optimizer, loss=elbo)
# Do gradient steps
for step in range(n_steps):
    # svi.step returns the loss, i.e. an estimate of the negative ELBO
    loss = svi.step(X_lk, X_oe, X_sd, n_cat, bern1, bern2, bern3, K, y)
    if step % 500 == 0:
        print("[%d] ELBO loss: %.1f" % (step, loss))
Upon convergence, we can use the Predictive class to draw samples from the posterior:
from pyro.infer import Predictive
def summary(samples):
    site_stats = {}
    for k, v in samples.items():
        site_stats[k] = {
            "mean": torch.mean(v, 0),
            "std": torch.std(v, 0),
            "5%": v.kthvalue(int(len(v) * 0.05), dim=0)[0],
            "95%": v.kthvalue(int(len(v) * 0.95), dim=0)[0],
        }
    return site_stats
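The 5% and 95% entries in summary are order statistics: the k-th smallest draw, with k counted from 1. The pure-Python sketch below mirrors that rule; the helpers `kth_smallest` and `interval_90` and the draws are hypothetical. It also adds a guard so that k never drops below 1, which could otherwise happen when fewer than 20 draws are available.

```python
def kth_smallest(values, k):
    """Return the k-th smallest value, with k counted from 1."""
    return sorted(values)[k - 1]

def interval_90(draws):
    """Lower and upper bounds taken at the 5% and 95% order statistics."""
    n = len(draws)
    k_low = max(1, int(n * 0.05))   # guard: k must be at least 1
    k_high = max(1, int(n * 0.95))
    return kth_smallest(draws, k_low), kth_smallest(draws, k_high)

# 100 hypothetical posterior draws: 1.0, 2.0, ..., 100.0
draws = [float(i) for i in range(1, 101)]
low, high = interval_90(draws)
print(low, high)
```

With 2000 posterior samples, as drawn by the Predictive call in this appendix, the two indices are 100 and 1900, so the guard is not triggered.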
predictive = Predictive(model, guide=guide, num_samples=2000,
return_sites=("alpha_ls1", "gamma_ls1", "alpha_ls2", "gamma_ls2", "alpha_oe", "gamma_oe",
"y_att_ls1", "y_att_ls2", "y_att_oe", "y_att", "alpha_1", "beta_1"))
samples = predictive(X_lk, X_oe, X_sd, n_cat, bern1, bern2, bern3, K, y)
pred_summary = summary(samples)
pred_summary.items()
predictions = pd.DataFrame({
"alpha_ls1" : pred_summary["alpha_ls1"],
"gamma_ls1" : pred_summary["gamma_ls1"],
"alpha_ls2" : pred_summary["alpha_ls2"],
"gamma_ls2" : pred_summary["gamma_ls2"],
"alpha_oe" : pred_summary["alpha_oe"],
"gamma_oe" : pred_summary["gamma_oe"],
"y_att_ls1" : pred_summary["y_att_ls1"],
"y_att_ls2" : pred_summary["y_att_ls2"],
"y_att_oe" : pred_summary["y_att_oe"],
"alpha_1" : pred_summary["alpha_1"],
"beta_1" : pred_summary["beta_1"]
})
predictions.head()
predictions.to_csv('coeff.csv')
We can now use the inferred posteriors to make predictions for the test set and compute the corresponding
accuracy:
Prediction Accuracy
Reading the files
dft1 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_1_Test_LDA.csv')
dft2 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_2_Test_LDA.csv')
dft3 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_3_Test_LDA.csv')
y_v1, X_lk_v1, X_oe_v1, X_sd_v1, bern_v1 = data_processing(dft1)
y_v2, X_lk_v2, X_oe_v2, X_sd_v2, bern_v2 = data_processing(dft2)
y_v3, X_lk_v3, X_oe_v3, X_sd_v3, bern_v3 = data_processing(dft3)
# Coefficients for the LS model- Ver_LK
alpha_ls1 = samples["alpha_ls1"].cpu()
gamma_ls1 = samples["gamma_ls1"].cpu()
alpha_ls1_hat=np.array([np.mean(b, axis=0) for b in alpha_ls1.detach().numpy().T])
gamma_ls1_hat=np.array([np.mean(b, axis=1) for b in gamma_ls1.detach().numpy().T]).T
# Coefficients for the LS model- Ver_LKOE
alpha_ls2 = samples["alpha_ls2"].cpu()
gamma_ls2 = samples["gamma_ls2"].cpu()
alpha_ls2_hat=np.array([np.mean(b, axis=0) for b in alpha_ls2.detach().numpy().T])
gamma_ls2_hat=np.array([np.mean(b, axis=1) for b in gamma_ls2.detach().numpy().T]).T
# Coefficients for the OE model- Ver_OE
alpha_oe = samples["alpha_oe"].cpu()
gamma_oe = samples["gamma_oe"].cpu()
alpha_oe_hat=np.array([np.mean(b, axis=0) for b in alpha_oe.detach().numpy().T])
gamma_oe_hat=np.array([np.mean(b, axis=1) for b in gamma_oe.detach().numpy().T]).T
# Coefficients for the choice
alpha_1 = samples["alpha_1"].cpu()
beta_1 = samples["beta_1"].cpu()
alpha_1_hat=np.array([np.mean(b) for b in alpha_1.detach().numpy().T])
beta_1_hat=np.array([np.mean(b, axis=1) for b in beta_1.detach().numpy().T])
y_att_ls1 = samples["y_att_ls1"].cpu()
y_att_ls2 = samples["y_att_ls2"].cpu()
y_att_oe = samples["y_att_oe"].cpu()
y_att_ls1_hat=np.array([np.mean(b, axis=1) for b in y_att_ls1.detach().numpy().T]).T
y_att_ls2_hat=np.array([np.mean(b, axis=1) for b in y_att_ls2.detach().numpy().T]).T
y_att_oe_hat=np.array([np.mean(b, axis=1) for b in y_att_oe.detach().numpy().T]).T
bern1 = bern1.cpu()
bern2 = bern2.cpu()
bern3 = bern3.cpu()
y_att_hat = (bern1.detach().numpy() * y_att_ls1_hat
             + bern2.detach().numpy() * y_att_ls2_hat
             + bern3.detach().numpy() * y_att_oe_hat)
np.savetxt("y_att_pred.csv", y_att_hat, delimiter=',')
Accuracy for Version_1 (Test Set)
y_hat_v1 = [None]*len(y_v1)
X_Data = np.zeros((X_sd_v1.shape[0], X_sd_v1.shape[1] + 2))
X_Data[:, 0:-2] = X_sd_v1
alpha_hat = np.zeros(n_cat)
alpha_hat[1:] = alpha_1_hat
beta_hat = np.zeros((n_cat, X_Data.shape[1]))
beta_hat[1:, :] = beta_1_hat
y_att_v1 = [[None]*2]*len(y_v1)
for i in range(len(y_v1)):
    y_att_v1[i] = alpha_ls1_hat + np.dot(X_lk_v1[i], gamma_ls1_hat)
    X_Data[i, -2] = y_att_v1[i][0]
    X_Data[i, -1] = y_att_v1[i][1]
    y_hat_v1[i] = alpha_hat + np.dot(beta_hat, X_Data[i])
# open the csv file in 'w+' mode and write the utilities into it
with open('utilities_comb_lk.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat_v1)
y_hat_v1 = np.argmax(y_hat_v1, axis=1)
print("predictions:", y_hat_v1)
print("true values:", y_v1)
print(np.unique(y_hat_v1))
print(np.unique(y_v1))
# evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat_v1 == y_v1) / len(y_v1))
Accuracy for Version_2 (Test Set)
y_hat_v2 = [None]*len(y_v2)
X_Data = np.zeros((X_sd_v2.shape[0], X_sd_v2.shape[1] + 2))
X_Data[:, 0:-2] = X_sd_v2
y_att_v2 = [[None]*2]*len(y_v2)
for i in range(len(y_v2)):
    y_att_v2[i] = alpha_ls2_hat + np.dot(X_lk_v2[i], gamma_ls2_hat)
    X_Data[i, -2] = y_att_v2[i][0]
    X_Data[i, -1] = y_att_v2[i][1]
    y_hat_v2[i] = alpha_hat + np.dot(beta_hat, X_Data[i])
# open the csv file in 'w+' mode and write the utilities into it
with open('utilities_comb_lkoe.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat_v2)
y_hat_v2 = np.argmax(y_hat_v2, axis=1)
print("predictions:", y_hat_v2)
print("true values:", y_v2)
print(np.unique(y_hat_v2))
print(np.unique(y_v2))
# evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat_v2 == y_v2) / len(y_v2))
Accuracy for Version_3 (Test Set)
y_hat_v3 = [None]*len(y_v3)
X_Data = np.zeros((X_sd_v3.shape[0], X_sd_v3.shape[1] + 2))
X_Data[:, 0:-2] = X_sd_v3
y_att_v3 = [[None]*2]*len(y_v3)
for i in range(len(y_v3)):
    y_att_v3[i] = alpha_oe_hat + np.dot(X_oe_v3[i], gamma_oe_hat)
    X_Data[i, -2] = y_att_v3[i][0]
    X_Data[i, -1] = y_att_v3[i][1]
    y_hat_v3[i] = alpha_hat + np.dot(beta_hat, X_Data[i])
# open the csv file in 'w+' mode and write the utilities into it
with open('utilities_comb_oe.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat_v3)
y_hat_v3 = np.argmax(y_hat_v3, axis=1)
print("predictions:", y_hat_v3)
print("true values:", y_v3)
print(np.unique(y_hat_v3))
print(np.unique(y_v3))
# evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat_v3 == y_v3) / len(y_v3))
Accuracy for Version_1 (Training Set)
y, X_lk, X_oe, X_sd, bern = data_processing(dfv1)
y_hat = [None]*len(y)
X_tr_Data = np.zeros((X_sd.shape[0], X_sd.shape[1] + 2))
X_tr_Data[:, 0:-2] = X_sd
alpha_tr_hat = np.zeros(n_cat)
alpha_tr_hat[1:] = alpha_1_hat
beta_tr_hat = np.zeros((n_cat, X_tr_Data.shape[1]))
beta_tr_hat[1:, :] = beta_1_hat
y_att_tr = [[None]*2]*len(y)
for i in range(len(y)):
    y_att_tr[i] = alpha_ls1_hat + np.dot(X_lk[i], gamma_ls1_hat)
    X_tr_Data[i, -2] = y_att_tr[i][0]
    X_tr_Data[i, -1] = y_att_tr[i][1]
    y_hat[i] = alpha_tr_hat + np.dot(beta_tr_hat, X_tr_Data[i])
# open the csv file in 'w+' mode and write the utilities into it
with open('utilities_comb_lk_tr.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat)
y_hat = np.argmax(y_hat, axis=1)
print("predictions:", y_hat)
print("true values:", y)
print(np.unique(y_hat))
print(np.unique(y))
# evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat == y) / len(y))
Accuracy for Version_2 (Training Set)
y, X_lk, X_oe, X_sd, bern = data_processing(dfv2)
y_hat = [None]*len(y)
X_tr_Data = np.zeros((X_sd.shape[0], X_sd.shape[1] + 2))
X_tr_Data[:, 0:-2] = X_sd
alpha_tr_hat = np.zeros(n_cat)
alpha_tr_hat[1:] = alpha_1_hat
beta_tr_hat = np.zeros((n_cat, X_tr_Data.shape[1]))
beta_tr_hat[1:, :] = beta_1_hat
y_att_tr = [[None]*2]*len(y)
for i in range(len(y)):
    y_att_tr[i] = alpha_ls2_hat + np.dot(X_lk[i], gamma_ls2_hat)
    X_tr_Data[i, -2] = y_att_tr[i][0]
    X_tr_Data[i, -1] = y_att_tr[i][1]
    y_hat[i] = alpha_tr_hat + np.dot(beta_tr_hat, X_tr_Data[i])
# open the csv file in 'w+' mode and write the utilities into it
with open('utilities_comb_lkoe_tr.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat)
y_hat = np.argmax(y_hat, axis=1)
print("predictions:", y_hat)
print("true values:", y)
print(np.unique(y_hat))
print(np.unique(y))
# evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat == y) / len(y))
Accuracy for Version_3 (Training Set)
y, X_lk, X_oe, X_sd, bern = data_processing(dfv3)
y_hat = [None]*len(y)
X_tr_Data = np.zeros((X_sd.shape[0], X_sd.shape[1] + 2))
X_tr_Data[:, 0:-2] = X_sd
alpha_tr_hat = np.zeros(n_cat)
alpha_tr_hat[1:] = alpha_1_hat
beta_tr_hat = np.zeros((n_cat, X_tr_Data.shape[1]))
beta_tr_hat[1:, :] = beta_1_hat
y_att_tr = [[None]*2]*len(y)
for i in range(len(y)):
    y_att_tr[i] = alpha_oe_hat + np.dot(X_oe[i], gamma_oe_hat)
    X_tr_Data[i, -2] = y_att_tr[i][0]
    X_tr_Data[i, -1] = y_att_tr[i][1]
    y_hat[i] = alpha_tr_hat + np.dot(beta_tr_hat, X_tr_Data[i])
# open the csv file in 'w+' mode and write the utilities into it
with open('utilities_comb_oe_tr.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat)
y_hat = np.argmax(y_hat, axis=1)
print("predictions:", y_hat)
print("true values:", y)
print(np.unique(y_hat))
print(np.unique(y))
# evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat == y) / len(y))
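The six accuracy blocks above repeat the same steps: compute the two latent attitudes from the attitude coefficients, append them to the socio-demographic covariates, build the utilities with the first category fixed at zero, and take the argmax. A minimal vectorised sketch of those steps is given below. The function names `predict_choice` and `accuracy`, the argument names, and the synthetic inputs are assumptions made for illustration and are not part of the thesis code.

```python
import numpy as np

def predict_choice(X_sd, X_att_in, alpha_att, gamma_att, alpha_choice, beta_choice):
    """Return (utilities, predicted category) for every respondent.

    X_sd:         (N, P) socio-demographic covariates
    X_att_in:     (N, Q) Likert or topic features feeding the attitude model
    alpha_att:    (K,)   attitude intercepts
    gamma_att:    (Q, K) attitude coefficients
    alpha_choice: (C-1,) choice intercepts (first category is the reference)
    beta_choice:  (C-1, P + K) choice coefficients
    """
    y_att = alpha_att + X_att_in @ gamma_att        # (N, K) latent attitudes
    X_full = np.hstack([X_sd, y_att])               # (N, P + K)
    n_cat = alpha_choice.shape[0] + 1
    alpha = np.zeros(n_cat)
    alpha[1:] = alpha_choice                        # reference intercept stays 0
    beta = np.zeros((n_cat, X_full.shape[1]))
    beta[1:, :] = beta_choice                       # reference coefficients stay 0
    utilities = alpha + X_full @ beta.T             # (N, C)
    return utilities, utilities.argmax(axis=1)

def accuracy(y_true, y_pred):
    """Share of respondents whose predicted category matches the observed one."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Synthetic inputs, for illustration only
rng = np.random.default_rng(0)
N, P, Q, K, C = 6, 3, 4, 2, 3
X_sd = rng.normal(size=(N, P))
X_att_in = rng.normal(size=(N, Q))
alpha_att = rng.normal(size=K)
gamma_att = rng.normal(size=(Q, K))
alpha_choice = rng.normal(size=C - 1)
beta_choice = rng.normal(size=(C - 1, P + K))
y_true = rng.integers(0, C, size=N)

utilities, y_pred = predict_choice(X_sd, X_att_in, alpha_att, gamma_att,
                                   alpha_choice, beta_choice)
print("Accuracy:", accuracy(y_true, y_pred))
```

With such a helper, each version would differ only in the attitude inputs and coefficients passed in (for example `X_lk_v1` with `alpha_ls1_hat` and `gamma_ls1_hat` for Version_1).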
APPENDIX I
14 PYTHON CODE FOR THE MAPPING OF RESPONSES
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
np.random.seed(42)
Function for Data Pre-processing
def data_preprocessing(len_dataset, ques_size):
    """
    def:
        pre-processing of data
    inp:
        len_dataset- length of the dataset
        ques_size- array with number of topics/levels per question
    out:
        dataset with onehotencoded variables
    """
    ### Creating the dummy Dataset
    num_ques = len(ques_size)
    X_Act = np.zeros((len_dataset, num_ques))
    for i in range(num_ques):
        X_Act[:, i] = np.random.choice(a=ques_size[i], size=(len_dataset, 1), p=[1/ques_size[i]]*ques_size[i]).T
    var_name = ['X_' + str(i + 1) for i in range(num_ques)]
    X_df = pd.DataFrame(data=X_Act, columns=var_name)
    ### OneHotEncoding of the Likert scale responses
    enc = OneHotEncoder(handle_unknown='ignore')
    enc_name = [0]*num_ques
    for k in range(num_ques):
        enc_name[k] = pd.DataFrame(enc.fit_transform(X_df[[var_name[k]]]).toarray())
        enc_name[k].columns = ['X_'+str(k+1)+'_' + str(i+1) for i in range(ques_size[k])]
        if k > 0:
            enc_name[k] = enc_name[k-1].join(enc_name[k])
    return enc_name[num_ques-1]
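The function above draws a uniformly random level for each question and one-hot encodes it. A minimal NumPy-only sketch of the same idea follows; it indexes an identity matrix, so every question always yields exactly `ques_size[k]` columns, whereas an encoder fitted on the sampled column creates columns only for levels that actually occur. The helper name `random_one_hot` and the small sizes are assumptions for illustration, not the thesis code.

```python
import numpy as np

def random_one_hot(len_dataset, ques_size, rng):
    """Draw one uniformly random level per question and one-hot encode it."""
    blocks = []
    for levels in ques_size:
        codes = rng.integers(0, levels, size=len_dataset)  # random level per respondent
        blocks.append(np.eye(levels)[codes])               # (len_dataset, levels)
    return np.hstack(blocks)                               # (len_dataset, sum(ques_size))

rng = np.random.default_rng(42)
ques_size = [5, 3]                 # hypothetical: one 5-level and one 3-level question
X = random_one_hot(6, ques_size, rng)
print(X.shape)
```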
Function for Gibbs Sampling
def discrete_gibbs(Data, gamma_df, gamma_inv, alpha_df, ques_size, n_iter, n_warm):
    """
    def:
        function that performs Gibbs Sampling for discrete variables
    inp:
        Data- Dataset
        gamma_df- Dataset of coefficients
        gamma_inv- Dataset of the inverse coefficients
        alpha_df- Dataset of constants
        ques_size- array with number of topics/levels per question
        n_iter- number of iterations
        n_warm- number of warmup iterations to be discarded
    out:
        Dataset with sampled values
    """
    num_ques = np.sum(ques_size)
    y_arr = ['y_att_1', 'y_att_2']
    # Shape: (respondents, one-hot dimensions, n_iter + n_warm).
    # Only the first n_iter slices along the last axis are filled by the loop below.
    np_arr = np.zeros((len(Data), num_ques, n_iter + n_warm))
    for j in range(n_iter):
        print("Iterations: ", j)
        for index, row in Data.iterrows():
            n_dims = 0
            for k in range(len(ques_size)):
                V_na = ['X_'+str(k+1)+'_' + str(i+1) for i in range(ques_size[k])]
                V_nn = V_na + y_arr
                # Attitudes minus the constants and the contribution of all other questions
                fir_pa = (np.array([row['y_att_1'], row['y_att_2']]) - alpha_df
                          - np.matmul(np.array(gamma_df.drop(columns=V_na)),
                                      np.array(row.drop(V_nn, axis=0))))
                inv_mc = np.array(gamma_inv.loc[V_na, :].T)
                x_value = np.matmul(fir_pa, inv_mc)
                # Normalise the absolute values and use them as Dirichlet parameters
                sum_val = np.sum(np.abs(x_value))
                x_val = np.abs(x_value)/sum_val
                x_val = np.random.dirichlet(x_val)
                for m in range(ques_size[k]):
                    np_arr[index, n_dims + m, j] = x_val[m]
                n_dims += ques_size[k]
        if j == n_iter/4:
            print("25% iteration complete")
        elif j == n_iter/2:
            print("50% iteration complete")
        elif j == 3*n_iter/4:
            print("75% iteration complete")
        elif j == n_iter-1:
            print("Iteration complete")
    return np_arr
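The core draw inside `discrete_gibbs` takes the back-projected values for one question, normalises their absolute values to sum to one, and uses the result as the parameter vector of a Dirichlet draw. The sketch below isolates that step. The helper name `draw_level_shares` and the example vector are assumptions for illustration. Dirichlet concentration parameters are expected to be positive, so an entry that is exactly zero may need special handling.

```python
import numpy as np

def draw_level_shares(x_value, rng):
    """Normalise |x_value| to sum to one and draw level shares from a Dirichlet."""
    weights = np.abs(x_value)
    weights = weights / weights.sum()
    return rng.dirichlet(weights)

rng = np.random.default_rng(42)
# Hypothetical back-projected values for one 5-level question
x_value = np.array([0.8, -0.3, 0.1, 0.5, -0.2])
shares = draw_level_shares(x_value, rng)
print(shares, shares.sum())
```

Because the concentration parameters sum to one, individual draws tend to put most of the mass on one or two levels; the average over many draws approaches the normalised weights.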
Execution
def dis_sample(Dataset, gamma_df, gamma_inv, alpha_df, ques_size, num_iter, num_warm):
    """
    def:
        Perform discrete Gibbs sampling on a dataset
    inp:
        Dataset- dataset containing y_variables
        gamma_df- data frame with the coefficients
        gamma_inv- data frame with the inverse of the coefficients
        alpha_df- array with the constants
        ques_size- array with the number of topics/levels per question
        num_iter- total number of iterations
        num_warm- number of warm-ups
    out:
        Sampled dataset
    """
    len_dataset = len(Dataset)
    Exp_Var = pd.DataFrame(data_preprocessing(len_dataset, ques_size))
    Dataset = Dataset.join(Exp_Var)
    # Gibbs Sampling
    Dataset = discrete_gibbs(Dataset, gamma_df, gamma_inv, alpha_df, ques_size, num_iter, num_warm)
    # Discard the warm-up draws, then average over draws and over respondents
    Data_1 = Dataset[:, :, num_warm:]
    Data_2 = np.mean(Data_1, axis=2)
    print(Data_2.shape)
    X_Variables = np.mean(Data_2, axis=0)
    print(X_Variables)
    return Dataset
# For Likert scale questions, set values
num_questions = 9
ques_size = [5]*num_questions
Dataset = pd.read_csv("y_att_pred_v1.csv", header=None)
Dataset.columns = ['y_att_1', 'y_att_2']
print(len(Dataset))
gamma_v = pd.read_csv("Gamma_ls2.csv", header=None)
alpha_v = np.array([0.6820, -0.7710]) # Update values
col_names = []
for j in range(num_questions):
    col_names.extend(['X_'+str(j+1)+'_' + str(i+1) for i in range(ques_size[j])])
gamma_v.columns = col_names
for i in range(num_questions):
    V_na = ['X_'+str(i+1)+'_' + str(j+1) for j in range(ques_size[i])]
    gamma_v1 = gamma_v[V_na]
    # Right pseudo-inverse of the coefficient block for this question
    if i == 0:
        gamma_inv = pd.DataFrame((gamma_v1.T).dot(np.linalg.inv(gamma_v1.dot(gamma_v1.T))))
        gamma_inv.index = V_na
    else:
        gamma_inv1 = pd.DataFrame((gamma_v1.T).dot(np.linalg.inv(gamma_v1.dot(gamma_v1.T))))
        gamma_inv1.index = V_na
        # DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows in the same way
        gamma_inv = pd.concat([gamma_inv, gamma_inv1])
gamma_inv.columns = ['y_att_1', 'y_att_2']
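The expression used for each block, G^T (G G^T)^(-1), is the right pseudo-inverse of a matrix G with full row rank. The short check below illustrates this on a random matrix standing in for one block of coefficients (two attitudes by five levels); the matrix is synthetic and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical block of coefficients: 2 attitudes x 5 Likert levels
G = rng.normal(size=(2, 5))

# Same formula as in the loop above
G_right_inv = G.T @ np.linalg.inv(G @ G.T)

# G times its right pseudo-inverse recovers the 2 x 2 identity
print(np.allclose(G @ G_right_inv, np.eye(2)))
# For a full-row-rank matrix this matches the Moore-Penrose pseudo-inverse
print(np.allclose(G_right_inv, np.linalg.pinv(G)))
```

If a block were rank-deficient, `np.linalg.inv` would fail or become numerically unstable, in which case `np.linalg.pinv` might be the safer choice.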
Data = dis_sample(Dataset, gamma_v, gamma_inv, alpha_v, ques_size, 2000, 400)
Data_1 = Data[:, :, 50:]
print(Data_1.shape)
Data_2 = np.mean(Data_1, axis=2)
print(Data_2.shape)
X_Variables = np.mean(Data_2, axis=0)
print(X_Variables)
import csv
with open('Data_ls2_2000.csv', 'w', newline='') as csvfile:
writer = csv.writer(csvfile, delimiter=',')
writer.writerows(Data_2)
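The post-processing above slices the draw axis to drop the initial iterations, averages over the remaining draws for each respondent, and then averages over respondents. A minimal sketch of that reduction on a synthetic array with the same layout (respondents, one-hot dimensions, draws) is shown below; the sizes and the uniform Dirichlet draws are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_resp, n_dims, n_iter, n_warm = 4, 5, 100, 20

# Synthetic draws: each draw is a probability vector over n_dims levels
draws = rng.dirichlet(np.ones(n_dims), size=(n_resp, n_iter))  # (n_resp, n_iter, n_dims)
draws = np.transpose(draws, (0, 2, 1))                          # (n_resp, n_dims, n_iter)

kept = draws[:, :, n_warm:]             # discard the warm-up draws
per_respondent = kept.mean(axis=2)      # (n_resp, n_dims) mean over retained draws
overall = per_respondent.mean(axis=0)   # (n_dims,) mean over respondents
print(kept.shape, per_respondent.shape, overall.shape)
```

The number of discarded draws should match the warm-up length passed to the sampler, and only slices that were actually filled should enter the average.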