Automated Text Analysis on Open-Ended Response Surveys: Measuring Attitudes Regarding Autonomous Vehicles

June 2021

June 2021

Thesis for: PhD Degree
Advisors: João de Abreu e Silva, Francisco Câmara Pereira

Author: Vishnu Baburajan (Nissan Digital India LLP)


Inter-topic Distance for a. OE1, b. OE2, c. OE3, d. OE4

3.5.3 Comparison of Closed- and Open-ended Responses

…

presents the variation of the responses to the questions related to overall attitudes. In contrast to the respondents answering Ver_LK of the questionnaire, an additional 10% of respondents answering Ver_LKOE considered AVs a good and wise idea. When asked if the participants considered AVs to be pleasant, most of the responses were neutral. Thus, the overall attitudes towards the use of AVs are very positive among the public.

…

Probabilistic Graphical Model for the Proposed Model

…

Predictions Using the Proposed Framework

4.9 CONCLUSION

…

5 Estimation Results of the Model for the "Intention to Use Shared AVs"

…

UNIVERSIDADE DE LISBOA

INSTITUTO SUPERIOR TÉCNICO

Automated Text Analysis on Open-Ended Response Surveys:

Measuring Attitudes Regarding Autonomous Vehicles

Vishnu Baburajan

Supervisor: Doctor João Antonio de Abreu e Silva

Co-Supervisor: Doctor Francisco Colunas Pereira da Câmara Pereira

Thesis approved in public session to obtain the PhD Degree in

Transportation Systems

Jury final classification: Pass with Distinction and Honour

2021

Jury

Chairperson: Doctor Luís Guilherme de Picado Santos, Instituto Superior Técnico, Universidade de Lisboa

Members of the Committee:

Doctor Francisco Colunas Pereira da Câmara Pereira, Department of Technology, Management and Economics, Technical University of Denmark, Denmark

Doctor Luís Guilherme de Picado Santos, Instituto Superior Técnico, Universidade de Lisboa

Doctor Filipe Manuel Mercier Vilaça e Moura, Instituto Superior Técnico, Universidade de Lisboa

Doctor Catarina Helena Branco Simões da Silva, Faculdade de Ciências e Tecnologia, Universidade de Coimbra

Doctor Maria Teresa Galvão Dias, Faculdade de Engenharia, Universidade do Porto

Doctor Paulo Manuel da Fonseca Teixeira, Instituto Superior Técnico, Universidade de Lisboa

Funding Institutions: Fundação para a Ciência e a Tecnologia, Portugal; European Cooperation in Science and Technology, Cost Action TU-1305

2021

ABSTRACT

For practical reasons, surveys that aim for a large number of respondents tend to restrict themselves to closed-ended responses. Despite potentially bringing richer insights, open-ended questions pose significant challenges in extracting useful information while considerably increasing the analysis time. Nevertheless, automatic text analysis techniques could speed up the analysis of open-ended responses. Furthermore, when open-ended questions are asked in conjunction with closed-ended questions, they are likely to influence the closed-ended responses.

Considering this, we pursued the following four objectives in this thesis: a. to analyse whether the method of collecting qualitative data influences the survey responses; b. to develop an approach to extract open-ended responses from a survey and process the data; c. to compare the relative performance of open-ended and closed-ended responses in analysing qualitative data; and d. to develop a framework that measures attitudes while allowing respondents to choose their preferred type of question (closed- or open-ended).

This thesis analyses the suitability of Topic Modelling for extracting information from open-ended responses in order to measure attitudes. As a case study throughout the thesis, we used questionnaires that collect information on attitudes related to Autonomous Vehicles (AVs). In this case study, alternative versions of the questionnaire containing open- and/or closed-ended questions were presented randomly to respondents. Two datasets were thus collected: a. 364 responses from India on the intention to use Shared AVs and b. 3002 responses from the USA on the intention to use AVs for commute trips.
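To make the topic-modelling step concrete, the following is a minimal, dependency-free sketch of plain Latent Dirichlet Allocation fitted by collapsed Gibbs sampling. It shows how short free-text answers become per-respondent topic proportions (`theta`) that a downstream attitude model could use. This is not the thesis's own implementation (its Python code is in Appendix G and is not reproduced here), and the inference method used there is not stated in this excerpt. The function names `tokenize`, `lda_gibbs` and `top_words`, the stopword list, the hyperparameters (`alpha`, `beta`, `n_iter`) and the four example answers are all illustrative assumptions, not the thesis's data or settings. The supervised variant (sLDA) is not shown.

```python
import random

# Hypothetical stopword list for the toy example only; a real pipeline would
# use a fuller list and possibly lemmatisation.
STOPWORDS = {"a", "about", "and", "be", "could", "i", "is", "my", "on", "the", "with", "would"}


def tokenize(text, stopwords=STOPWORDS):
    """Lower-case, strip surrounding punctuation and drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in stopwords]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit plain LDA with collapsed Gibbs sampling.

    docs: list of token lists (one per respondent).
    Returns (vocab, theta, phi): theta[d][k] is the share of topic k in
    response d; phi[k][v] is the probability of word vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]            # topic counts per response
    nkw = [[0] * n_words for _ in topics]           # word counts per topic
    nk = [0] * n_topics                             # token counts per topic
    z = []                                          # topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other tokens).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_words * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + n_words * beta) for v in range(n_words)]
           for k in topics]
    return vocab, theta, phi


def top_words(vocab, phi, n=3):
    """Return the n most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=row.__getitem__, reverse=True)[:n]]
            for row in phi]


# Four invented answers to a single open-ended question (not thesis data).
answers = [
    "I worry about safety and accidents with driverless cars",
    "Safety is my main concern; the software could fail",
    "Cheaper shared rides would save money on my commute",
    "Sharing a ride saves money and reduces cost",
]
docs = [tokenize(a) for a in answers]
vocab, theta, phi = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words(vocab, phi)):
    print("topic", k, words)
for a, shares in zip(answers, theta):
    print([round(s, 2) for s in shares], a)
```

Each row of `theta` is the kind of per-respondent feature that could enter a statistical model alongside, or instead of, Likert items. With only four short answers the topics are unstable, and changing `seed` may change them.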

To quantify the relative benefits, we evaluated how well the alternative versions of the questionnaire measure attitudes by assessing the predictive capability of the statistical models estimated on each of these independent datasets. In addition, the responses to the attitudinal questions were evaluated to analyse whether the mode of asking questions influences the measured attitudes. Finally, having estimated the models, we developed a framework that measures attitudes while allowing respondents to choose their preferred type of question.

Our results indicate that asking open-ended questions before the set of Likert scale questions could alter the responses to the Likert scale questions. The consequence is a reduction in the number of neutral responses and an increase in positive attitudes among those answering the questionnaire with open-ended questions. We also evaluated the suitability of Topic Modelling techniques such as Latent Dirichlet Allocation (LDA) and supervised Latent Dirichlet Allocation (sLDA) and found them effective. However, we did not find significant improvements in performance with the supervised approach. When comparing the predictive capabilities of the models estimated using Likert scale responses collected with and without open-ended questions, the models performed better for the dataset in which open-ended questions preceded the Likert scale questions. However, we did not find it beneficial to fully replace Likert scale questions with open-ended questions. Using the dataset collected in the USA, we proposed a modelling framework that allows researchers/analysts to let respondents answer the questionnaire using the question type (closed- or open-ended) of their choice. The proposed model outperformed the individually estimated models, particularly on the test set.
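The abstract reports fewer neutral responses among respondents who saw open-ended questions first. This excerpt does not say which statistical test the thesis used. The sketch below therefore shows one assumed, commonly used option. It computes the share of neutral answers per questionnaire version and a Pearson chi-square test of independence between version and Likert level. The 1 to 5 coding (3 = neutral), the helper names `neutral_share`, `contingency` and `chi_square`, and the response lists are hypothetical. Only the version labels Ver_LK and Ver_LKOE follow the thesis.

```python
from collections import Counter

NEUTRAL = 3  # assumed coding: 1 = strongly disagree ... 5 = strongly agree


def neutral_share(responses, neutral=NEUTRAL):
    """Fraction of Likert responses equal to the neutral midpoint."""
    return sum(1 for r in responses if r == neutral) / len(responses)


def contingency(groups, levels=range(1, 6)):
    """Rows = questionnaire versions, columns = counts per Likert level."""
    rows = []
    for responses in groups:
        counts = Counter(responses)
        rows.append([counts[level] for level in levels])
    return rows


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Columns that are empty in every row are dropped, because they would give
    zero expected counts. Every row is assumed to be non-empty.
    """
    cols = [c for c in zip(*table) if sum(c) > 0]
    rows = [list(r) for r in zip(*cols)]
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in cols]
    n = sum(row_tot)
    stat = sum((obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
               for i, row in enumerate(rows) for j, obs in enumerate(row))
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


# Invented responses to one attitudinal item (not thesis data).
ver_lk = [3, 3, 3, 4, 4, 2, 3, 5, 4, 3]     # Likert items only
ver_lkoe = [4, 4, 5, 3, 4, 5, 4, 2, 4, 3]   # open-ended questions first
print(neutral_share(ver_lk), neutral_share(ver_lkoe))  # 0.5 0.2
stat, dof = chi_square(contingency([ver_lk, ver_lkoe]))
print(round(stat, 2), dof)
```

If SciPy is available, `scipy.stats.chi2_contingency` returns the same statistic together with a p-value for tables larger than 2 × 2. For 2 × 2 tables it applies Yates' continuity correction by default. With samples as small as these, many expected counts fall below 5 and the chi-square approximation is unreliable.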

Index Terms—Topic Modelling, Latent Dirichlet Allocation, Supervised Latent Dirichlet Allocation, Likert Scales, Open-ended Questions, Travel Behaviour Research, Model-based Machine Learning, Bayesian Estimation

RESUMO

For practical reasons, surveys aimed at a large number of participants tend to restrict themselves to closed-ended questions. Although they can potentially provide richer insights, open-ended questions pose major challenges in extracting useful information, while also increasing the analysis time. Nevertheless, certain automatic text analysis techniques can speed up the analysis of open-ended responses. Furthermore, the use of open-ended questions in conjunction with closed-ended questions is likely to influence the responses to the latter.

With this in mind, this thesis pursued the following four objectives: a. to analyse whether the method of collecting qualitative data influences survey responses; b. to develop an approach to extract open-ended responses from surveys and process the resulting data; c. to compare the relative performance of open- and closed-ended responses in the analysis of qualitative data; and, finally, d. to develop a modelling framework that measures attitudes while allowing participants to choose the type of question, open- or closed-ended, that best suits them.

This thesis analyses the applicability of Topic Modelling for extracting information from responses to open-ended questions that measure attitudes. As a case study used throughout the thesis, we used questionnaires that collect information on attitudes towards Autonomous Vehicles (AV). To conduct this study, alternative versions of the questionnaires, consisting of open- and/or closed-ended questions, were presented randomly to participants. In total, two datasets were collected: (1) 364 responses from India on the intention to use Shared AVs and (2) 3002 responses from the USA on the intention to use AVs for commute trips. To quantify the relative benefits, we evaluated the relative performance of the alternative versions of the questionnaires in measuring attitudes. In this respect, the statistical models estimated separately on each of these independent datasets are evaluated on their predictive capability. In addition, the responses to the attitudinal questions are evaluated to analyse whether the form of the questions influences the attitudes being measured. After estimating the models, this thesis additionally develops a modelling approach that measures attitudes while allowing respondents to choose their preferred question type.

Our results indicate that using open-ended questions before the set of Likert scale questions may affect the responses to the latter. The consequence is a reduction in the number of neutral responses and an increase in positive attitudes among the participants who answered the survey with open-ended questions. We also evaluated the applicability of Topic Modelling techniques such as Latent Dirichlet Allocation and its supervised version, both of which proved effective. However, we could not find significant performance improvements from using the supervised approach.

When comparing the predictive capabilities of the models estimated from questions that used Likert scale responses with and without open-ended questions, the models performed better for the dataset that presented open-ended questions before the Likert scale responses. However, we concluded that it is not beneficial to fully replace Likert scale questions with open-ended questions. Using the data from the USA, we proposed a modelling approach that allows researchers/analysts to offer participants the freedom to choose between the two types of questions (open- and closed-ended) according to their preferences. The performance of the proposed model proved superior to that of the individually estimated models, particularly for the test data.

Index Terms—Topic Modelling, Latent Dirichlet Allocation, Supervised Latent Dirichlet Allocation, Likert Scales, Open-ended Questions, Travel Behaviour Research, Model-based Machine Learning, Bayesian Estimation

ACKNOWLEDGEMENTS

I express my sincere gratitude to Prof. João de Abreu e Silva, my mentor, adviser and guide. I thank him for the insightful conversations we had during the development of this research and my thesis. His expertise in travel behaviour research, particularly in qualitative data analysis, has been of immense help in understanding the concepts and in developing the ideas.

I want to express my sincere gratitude to my co-supervisor, Professor Francisco Câmara Pereira of the Technical University of Denmark (DTU), for his continuous support in my research and coursework. His expertise in Machine Learning, specifically in Topic Modelling and the Model-based Approach to Machine Learning, has been very beneficial in developing this thesis.

This thesis was written as part of my PhD studies at the Instituto Superior Técnico (IST) and the Technical University of Denmark (DTU), and I acknowledge IST and DTU for hosting me.

The research was funded by the MIT-Portugal Program of the Fundação para a Ciência e a Tecnologia, Portugal. I sincerely thank the Portuguese Government for its kind support and funding. I want to express my sincere gratitude to the Director, Prof. Luís Guilherme de Picado Santos, and to Teresa Afonso, whose support helped shape this research. The European Cooperation in Science and Technology Cost Action TU-1305, Social Networks and Travel Behaviour, supported a case study in my research.

I want to thank my friends and colleagues from IST, particularly Mohammad Sadegh, Mariza, Suresh, Jayanath, Jayachandran, Joshin and Rahul. I would also like to thank all members of the Machine Learning for Smart Mobility (MLSM) group at DTU, particularly Francisco, Renming, Daniele, Inon, Sergio and Filipe Rodrigues.

I thank my parents (Baburajan and Sreekaladevi) for their constant love, support, and guidance. The values of competence, persistence, and honesty they instilled in me have helped me immensely. The love and support from my brothers Hariprasad, Narendraprasad, Ananthakrishnan, Nishanth and Unnikrishnan and my sisters Indulekha, Vandana, Sreelakshmi, Manjulakshmi and Arya have been phenomenal, as have the love and support from Unnikrishnan, Salu and Ravindran. I also take this opportunity to extend my sincere thanks to Nimisha.

My friends Arunbabu, Meghal, Naveen, Manu, Sajjad, Dani and Chelza supported me during my PhD. My stay in Denmark was made memorable by Ashik, Sushanth, Sherin and his family, Smobin and Maria.

There were many ups and downs during my PhD, and the lectures of Prof. Jordan B. Peterson and Sadhguru helped me remain calm and pursue my research.

Dedicated to

My Parents, Grandparents

and

My Little Bundle of Joy

TABLE OF CONTENTS
Abstract
Resumo
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Introduction and Background
1.2 Objectives and Scope
1.3 Overview of the Approach
1.4 Thesis Contribution and Corresponding Stakeholders
1.5 Scientific Outcomes from this Thesis
1.6 Organisation of the Thesis
2 Fundamentals of Qualitative Data and Its Measurement
2.1 Introduction
2.2 Measuring Attitudes in Transportation
2.3 Measurement of Qualitative Data
2.3.1 Closed-ended Responses
2.3.1.1 Types of Closed-ended Scales
2.3.1.2 Optimal Number of Points in a Scale
2.3.1.3 Neutrality and Its Implications
2.3.1.4 Satisficing
2.3.1.5 Other Issues
2.3.1.6 Analysis of Closed-ended Responses
2.3.2 Open-ended Responses
2.3.2.1 Coding of Open-ended Questions
2.3.2.2 Position of Open-ended Questions
2.3.2.3 Missing Values in Open-ended Surveys
2.3.2.4 Extraction of Information from Open-ended Responses
2.3.3 Concluding Remarks
2.4 Frameworks to Measure the Intention to Use
2.5 Modelling Approaches to Predict AV Use
2.6 Factors Influencing “Intention to Use/Pay” for Autonomous Vehicles
2.7 Natural Language Processing and Its Applications
2.8 Summary
3 Topic Modelling for Open-ended Responses – A Case Study on the Intention to Use Shared AVs
3.1 Introduction
3.2 Questionnaire Design
3.2.1 Experimental Design
3.2.2 Framework Design
3.3 Data Collection and Data Cleaning
3.4 Exploratory Analysis
3.4.1 Preliminary Analysis
3.4.1.1 Attitudes Towards AVs
3.4.1.2 Subjective Norms
3.4.1.3 Perceived Behavioural Control Variables
3.4.1.4 Intention to Use Shared AVs
3.4.2 Statistical Analysis
3.5 Extraction of Data
3.5.1 Treatment of Closed-ended Responses
3.5.2 Extraction of Information from Open-ended Responses
3.5.2.1 Exploratory Analysis
3.5.2.2 Results from the Topic Models
3.5.3 Comparison of Closed- and Open-ended Responses
3.6 Modelling Framework
3.7 Estimation Results and Discussion
3.8 Conclusion
4 Integrating and Comparing Open- and Closed-ended Responses: A Case Study on AVs for Commute Trips
4.1 Introduction
4.2 Questionnaire Design
4.2.1 Experimental Design
4.2.2 Framework Design
4.3 Data Collection and Data Cleaning
4.4 Exploratory Analysis
4.4.1 Preliminary Analysis on Socio-demographic Characteristics
4.4.1.1 Perceived Ease of Use
4.4.1.2 Perceived Usefulness
4.4.1.3 Perceived Safety Risk
4.4.1.4 Perceived Privacy Risk
4.4.1.5 Trust
4.4.1.6 Attitudes
4.4.1.7 Modal Share for Commute Trips
4.4.2 Statistical Analysis
4.4.3 In-Depth Analysis
4.5 Extraction of Data
4.5.1 Treatment of Closed-ended Responses
4.5.2 Extraction of Information from Open-ended Responses
4.5.2.1 Exploratory Analysis
4.5.2.2 Results from Topic Models
4.5.3 Comparison of Closed- and Open-ended Responses
4.6 Modelling Framework
4.7 Estimation Results
4.8 Proposed Framework to Model Attitudes Jointly
4.9 Conclusion
5 Conclusion
5.1 Introduction
5.2 Data Description
5.3 Salient Findings
5.3.1 Influence of Questionnaire Type on the Responses
5.3.2 Approach to Extract Information from Open-ended Responses
5.3.3 Evaluate the Relative Performance of Closed- and Open-ended Approaches
5.3.4 A Framework to Measure Attitudes that Allows Respondents to Choose Their Preferred Questionnaire Type
5.4 Limitations of the Current Study and Directions for Future Research
References
Appendix A
6 Questionnaire: Intention to Use Shared AVs (India)
Appendix B
7 Questionnaire: Intention to Use AVs for Commute Trips (USA)
Appendix C
8 Experimental Design for the Intention to Use AVs for Commute Trips
Appendix D
9 Results of Topic Model (USA)
Appendix E
10 Estimation Results for Intention to Use AVs for Commute Trips
Appendix F
11 Results for the Proposed Framework
Appendix G
12 Python Code for Topic Models
Appendix H
13 Python Code for the Framework to Measure Attitudes
Appendix I
14 Python Code for the Mapping of Responses

P a g e  | XIV 
LIST OF FIGURES 
Figure 2.1 The Framework for the Theory of Planned Behaviour Source- [27] ..................... 26 
Figure 2.2 The Framework for the Technology Acceptance Model [28] ................................ 27 
Figure 2.3 The Framework for the  Unified  Theory of Acceptance and  Use  of Technology 
(UTAUT) [29] .......................................................................................................................... 28 
Figure 2.4 Probabilistic Graphical Model for sLDA ............................................................... 34 
Figure 3.1 The Experimental Design for the Intention to Use Shared AVs ............................ 39 
Figure 3.2 Modified Framework of the Theory of Planned Behaviour ................................... 39 
Figure 3.3 Frequency Distribution for Attitudes towards Use of AVs .................................... 42 
Figure 3.4 Frequency Distribution for Subjective Norms ....................................................... 43 
Figure 3.5 Frequency Distribution for Perceived Behavioural Control Variables .................. 44 
Figure 3.6 Frequency Distribution for the Intention to Use Shared AVs ................................ 45 
Figure 3.7 Word Clouds for OE1, OE2, OE3, OE4 ................................................................. 49 
Figure 3.8 Inter-topic Distance for a. OE1, b. OE2, c. OE3, d. OE4 ....................................... 52 
Figure 4.1 Experimental Design for the Mode Choice for Commute Trips ............................ 61 
Figure 4.2 Proposed Framework for the Mode Choice for Commute Trips ............................ 62 
Figure 4.3 Frequency Distribution for Perceived Ease of Use ................................................ 67 
Figure 4.4 Frequency Distribution for Perceived Usefulness .................................................. 68 
Figure 4.5 Frequency Distribution for Perceived Safety Risk of AVs .................................... 69 
Figure 4.6 Frequency Distribution for Perceived Privacy Risk of AVs .................................. 69 
Figure 4.7 Frequency Distribution for Trust in AVs ............................................................... 70 
Figure 4.8 Frequency Distribution for Attitudes towards AVs................................................ 71 
Figure 4.9 Frequency Distribution for the Mode for Commute Trips ..................................... 71 

Figure 4.10 Inter-topic Distance for LDA (clockwise from top left) OE1, OE2, OE3, OE4, OE5, OE6 ........ 78
Figure 4.11 Probabilistic Graphical Model for Individual Model ........................................... 82
Figure 4.12 Probabilistic Graphical Model for the Proposed Model ....................................... 83
Figure 4.13 Probabilistic Graphical Model for the Modified Framework ............................... 87
Figure 4.14 Predictions Using the Proposed Framework ........................................................ 88

LIST OF TABLES 
Table 3.1 Socio-economic and Travel Characteristics ............................................................ 41 
Table 3.2 Results of Factor Analysis ....................................................................................... 47 
Table 3.3 Average Number of Words per Response ............................................................... 48 
Table 3.4 Top 5 Words for Each Topic for Open-ended Questions ........................................ 50 
Table 3.5 Estimation Results of the Model for the “Intention to Use Shared AVs” ............... 55 
Table 3.6 Comparison of the Goodness-of-fit Measures for the Test Sets .............................. 56 
Table 4.1 SP Experimental Design .......................................................................................... 61 
Table 4.2 Socio-Demographic Characteristics ........................................................................ 63 
Table 4.3 Travel Characteristics of the Individuals (I) ............................................................ 65 
Table 4.4 Travel Characteristics of an Individual (II) ............................................................. 66 
Table 4.5 Results of the Statistical Analysis on Whether Open-ended Questions Influence Responses to Likert Scale Questions ........ 72 
Table 4.6 Internal Reliability- Cronbach’s Alpha.................................................................... 73 
Table 4.7 Average Number of Words per Response ............................................................... 75 
Table 4.8 Top 5 Words for Each Topic for Open-ended Questions ........................................ 76 
Table 4.9 Mapping between Closed- and Open-ended Responses .......................................... 80 
Table 4.10 Goodness-of-fit Measures for Training and Test Set ............................................. 85 
Table 8.1 Statements to Measure the Constructs in the Proposed Model and Their Sources .... XV 
Table 8.2 Orthogonal Scenarios (Source: Haboucha, Ishaq and Shiftan [40]) ...................... XVI 
Table 9.1 Top 5 Words for Each Topic for Open-ended Questions .................................. XVIII 
Table 10.1 Estimated Coefficients for Socio-Demographic Characteristics ........................ XXI 
Table 10.2 Estimated Coefficients for Travel, Familiarity with AVs and SP ................... XXIII 
Table 10.3 Estimated Coefficients for Likert Scale Responses ......................................... XXIV 

Table 10.4 Estimated Coefficients for the Topics ............................................................. XXVI
Table 11.1 Mapping of the Likert Scale Responses for Ver_LK and Ver_LKOE ............. XXX
Table 11.2 Mapping of the Likert Scale Responses for the Extracted Topics from Open-ended Responses ........ XXXI

1 INTRODUCTION

1.1 INTRODUCTION AND BACKGROUND

Researchers and analysts use questionnaires extensively to collect information on people’s opinions. In travel behaviour research, questionnaires distributed to a sample of the population are used to evaluate travellers’ behaviour and to devise suitable strategies for policy implementation. However, many of these strategies involve subjective factors; therefore, researchers began measuring attitudes [1], [2]. Questionnaires have been used, for example, to measure the intention to use certain services before their implementation [3]–[5], attitudes [6] and consumer/traveller preferences [3], [4], [6]–[9].

Attitudes are defined as dispositions towards overt action or as verbal substitutes for overt action [10]. Researchers and practitioners measure attitudes in many fields, such as transportation engineering, medical sciences and psychology [3], [6], [10]–[16]. Since attitudes were first measured, researchers have made considerable efforts to develop procedures (closed- or open-ended questions) that measure them accurately. For example, in the context of AVs, we could use a closed-ended question such as “Your opinion on AVs on a rating scale of 1-5 is …” or an open-ended question such as “Tell us your opinion about AVs”.

There is an intense debate on whether closed- or open-ended questions are the more appropriate method to measure attitudes. Lazarsfeld [17] brought a temporary truce to this debate by concluding that each approach has merits and demerits and that the administrator should use whichever method he/she believes is best suited to measuring the attitudes under consideration. Later, Converse [18] summarised the discussions and the evolution of the two approaches. Most studies have used the closed-ended approach to measure attitudes, and relatively few have used the open-ended approach, primarily because closed-ended surveys are quick to administer, whereas open-ended surveys have longer completion and execution times [18], [19]. In the following paragraphs, we briefly discuss the two approaches; Chapter 2 discusses them in detail.

To measure an individual’s attitudes using the closed-ended approach, we present statements describing the attitude and a suitable scale on which respondents express their attitudes, for example, a bipolar scale. Scales reduce the burden on the analyst [20], which, along with other benefits (discussed later), has led to their widespread use in measuring attitudes. However, researchers often criticise the closed-ended approach for increasing the burden on respondents, who must identify their attitudes and relate them to a scale, and for eliciting responses on issues that might interest the researcher but not the respondent [20]–[23].

Open-ended questions encourage respondents to formulate and articulate their own opinions about the attitude of interest to the researcher. They are recommended when the researcher intends to measure respondents’ attitudes towards a relatively new and complicated problem. They can also measure attitudes towards problems that might otherwise garner very little attention, or towards issues that people might not have thought about extensively [13]. However, compared with closed-ended questions, this approach increases the burden on respondents, who must write or type their responses, and on enumerators [17]. Furthermore, before the analysis, it demands dimensionality reduction and the development of a coding scheme to reduce bias across analysts [24]; this adds to the burden and significantly increases the time required to analyse open-ended responses [18]. For these reasons, the large-scale use of this approach for measuring attitudes has been limited.

The literature reviewed above suggests that the difficulty of analysing open-ended responses is a significant factor that has discouraged the widespread use of this technique. However, recent advances in machine learning, specifically in Natural Language Processing techniques such as Topic Modelling, now make it possible to extract information from open-ended responses automatically. This research proposes using Topic Modelling to extract this information and quantify it numerically in order to measure attitudes.

1.2 OBJECTIVES AND SCOPE

This thesis pursues the following objectives:

1. To analyse if the method of collecting qualitative data influences the survey responses
2. To develop an approach to extract open-ended responses from a survey and process the data
3. To compare the relative performance of the open- and closed-ended responses in analysing qualitative data
4. To develop a framework that measures attitudes while allowing respondents to choose their preferred type of question (closed- or open-ended)

This thesis analyses the suitability of Topic Modelling for extracting information from open-ended responses to measure attitudes, using questionnaires that collect information on attitudes towards Autonomous Vehicles (AVs) as a case study. The literature indicates that open-ended questions may have other benefits, such as eliciting spontaneous and unbiased responses [18], [25]; in this research, however, we restrict the analysis to evaluating the suitability of open-ended questions for measuring attitudes, together with the methodologies to do so. To this end, alternative versions of the questionnaire, with open- and/or closed-ended questions, are presented randomly to respondents. To quantify the relative benefits, we compare how well the alternative versions of the questionnaire measure attitudes by evaluating the predictive capability of statistical models estimated on each of these independent datasets. In addition, we examine the responses to the attitudinal questions to analyse whether the way questions are asked influences the measured attitudes. Finally, using the estimated coefficients, we develop a framework that measures attitudes while allowing respondents to choose their preferred type of question.

1.3 OVERVIEW OF THE APPROACH

As mentioned in the previous section, one of the objectives of this research is to evaluate the use of open-ended questions to measure attitudes. To gain more insight into this problem, we implemented two sequential surveys/studies. Our first study analysed the intention to use Shared AVs using responses from India. We designed the questionnaire using the Theory of Planned Behaviour (TPB) and used two versions of it: a. the first version (Ver_LK) contained only Likert scale questions; b. the second version (Ver_LKOE) contained open-ended questions followed by the same Likert scale questions as Ver_LK. However, the sample size and representativeness of this study were a concern; to address these concerns, we carried out a second study using a larger and more representative sample from the USA. We designed its questionnaire using an extended version of the Technology Acceptance Model (TAM). Unlike the first study, we used three versions of the questionnaire: the two versions used in India plus a third version (Ver_OE) that contained only open-ended questions.

To pursue the first objective, we compare the responses to the Likert scale questions common to both versions of the questionnaire, namely the twenty-one Likert scale questions in the first study and the twenty Likert scale questions in Ver_LK and Ver_LKOE in the second study. We compare the frequency distributions qualitatively and using non-parametric tests of whether the distributions are similar for both versions of the questionnaire. Furthermore, we estimate ordered models for each Likert scale question to identify the influence of the questionnaire type.
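The comparison of Likert frequency distributions between Ver_LK and Ver_LKOE can be illustrated with a short sketch. The thesis does not name the test at this point, so using a chi-square test of homogeneity on the 2 × k table of response counts is an assumption, as are the helper names (`chi2_sf`, `likert_homogeneity_test`, `neutral_and_extreme_shares`) and the response vectors, which are hypothetical. The p-value comes from the closed-form chi-square survival function for integer degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h**i / i!
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h**(i - 0.5) / Gamma(i + 0.5)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def likert_homogeneity_test(version_a, version_b, points=5):
    """Chi-square test that two questionnaire versions share one response distribution."""
    samples = (version_a, version_b)
    counts = [[s.count(k) for k in range(1, points + 1)] for s in samples]
    column_totals = [counts[0][k] + counts[1][k] for k in range(points)]
    used = [k for k in range(points) if column_totals[k] > 0]  # drop categories nobody chose
    n = [len(s) for s in samples]
    grand_total = sum(n)
    statistic = 0.0
    for i in range(2):
        for k in used:
            expected = n[i] * column_totals[k] / grand_total
            statistic += (counts[i][k] - expected) ** 2 / expected
    df = len(used) - 1  # (2 rows - 1) * (used categories - 1)
    return statistic, df, chi2_sf(statistic, df)


def neutral_and_extreme_shares(responses, points=5):
    """Share of midpoint answers and of answers at either end of an odd-point scale."""
    midpoint = (points + 1) // 2
    neutral = sum(r == midpoint for r in responses) / len(responses)
    extreme = sum(r in (1, points) for r in responses) / len(responses)
    return neutral, extreme


# Hypothetical answers to one Likert item in each questionnaire version
ver_lk = [3, 3, 4, 3, 2, 3, 4, 3, 5, 3, 2, 4]
ver_lkoe = [5, 1, 4, 5, 2, 3, 5, 4, 1, 5, 4, 2]
stat, df, p = likert_homogeneity_test(ver_lk, ver_lkoe)
print(f"chi2={stat:.2f}, df={df}, p={p:.3f}")
print(neutral_and_extreme_shares(ver_lk), neutral_and_extreme_shares(ver_lkoe))
```

The chi-square approximation is unreliable when expected cell counts are small (a common rule of thumb asks for at least five per cell), which the toy vectors above do not satisfy; real survey samples would be applied item by item.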

To extract information from open-ended responses (second objective), we used Topic Modelling approaches such as Latent Dirichlet Allocation (LDA) and supervised Latent Dirichlet Allocation (sLDA). Information in the form of topics (the different themes discussed in the open-ended responses) was extracted from the responses to each open-ended question, and the extracted topics were checked to ensure that they were distinct and meaningful. In addition, the topics were compared with the Likert scale questions to assess their coherence.
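To make the extraction step concrete, the sketch below fits plain LDA (not sLDA) by collapsed Gibbs sampling in pure Python. For each respondent, it returns a vector of topic proportions that can serve as numeric inputs to a behavioural model. It is an illustration under stated assumptions, not the code in Appendix G. The tokeniser, the stop-word list, the hyperparameters `alpha`, `beta` and `n_iter`, the function names and the example responses are all hypothetical, and a practical analysis would rely on an established library and a fuller preprocessing pipeline.

```python
import random

# Hypothetical stop-word list; a real analysis would use a fuller list
STOP_WORDS = {"i", "a", "the", "to", "of", "and", "is", "are", "it", "be",
              "would", "about", "in", "my", "they"}


def tokenise(text):
    """Lower-case, strip simple punctuation and drop stop words."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions and top words."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]       # topic counts per document
    n_kw = [[0] * V for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                        # total tokens per topic
    z = []                                      # topic assignment of every token
    for d, doc in enumerate(docs):              # random initial assignments
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                n_dk[d][k] -= 1                # remove the token's current assignment
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Posterior-mean estimate of each respondent's topic proportions
    theta = [[(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    top_words = [[vocab[v] for v in sorted(range(V), key=lambda v: -n_kw[t][v])[:5]]
                 for t in topics]
    return theta, top_words


# Hypothetical open-ended answers to "Tell us your opinion about AVs"
responses = [
    "I worry about safety and accidents",
    "Safety is my main concern, accidents could happen",
    "They would save time on my commute",
    "Less stress and more free time during the commute",
]
docs = [tokenise(r) for r in responses]
theta, top_words = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
print(top_words)
```

Each row of `theta` sums to one, so the K proportions are linearly dependent. When they enter a regression or choice model as explanatory variables, one topic is typically left out as the reference.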

The third objective involved analysing the relative performance of the closed- and open-ended responses. In the first case study, we estimated models separately for the two versions of the questionnaire and compared their relative performance. To predict the intention to use Shared AVs, the Ver_LK model used Likert scale responses, while the Ver_LKOE model used Likert scale responses together with the information extracted from the open-ended responses using Topic Models. In the second study, which analysed the intention to use AVs for commute trips, we compared the performance of the models estimated on each of the three datasets. For Ver_LK and Ver_LKOE, we estimated models using the responses to the Likert scale questions; for Ver_OE, we estimated models using the topics extracted from the open-ended responses. These models were estimated within a more sophisticated behavioural framework based on Probabilistic Graphical Models.

For the final objective, we used the estimated coefficients of the models for Ver_LK, Ver_LKOE and Ver_OE. Starting from the coefficients and attitudes for Ver_LK, we generate the corresponding Likert scale responses for Ver_LKOE and the topic proportions for Ver_OE using Gibbs Sampling. We then repeat this procedure starting from Ver_LKOE and from Ver_OE. This framework allows respondents to answer the survey using either of the two approaches (Likert scale or open-ended), and analysts can then use the data to estimate models using responses from any of the questionnaire versions.

1.4 THESIS CONTRIBUTION AND CORRESPONDING STAKEHOLDERS

As mentioned previously, researchers and practitioners differ in their opinions on the appropriate method for measuring attitudes. The open-ended approach is considered more behaviourally correct and better at capturing respondents’ actual attitudes, as it does not confine respondents to statements that might be important only to the analyst. However, the difficulties that open-ended questions pose to respondents and analysts have deterred the extensive use of this approach. Respondents must articulate an appropriate response, which can be time-consuming. The approach poses even more serious difficulties for analysts, who must spend considerable time extracting and coding the responses before making inferences.

By relying on advances in Machine Learning, we simplify the extraction of information from open-ended responses, as we demonstrate in this thesis. We also propose a framework that measures attitudes while allowing respondents to use the questionnaire type of their choice. In addition, the thesis explores the implications of the questionnaire type (open- or closed-ended) for the responses to the attitudinal questions. Identifying this potential impact helps analysts decide whether to use open-ended questions alone or in conjunction with closed-ended questions to measure attitudes more effectively.

To our knowledge, this is the first use of Topic Modelling approaches to extract information from open-ended responses in travel behaviour research. This thesis also investigates whether placing open-ended questions before a set of Likert scale questions influences the responses to those Likert scale questions. The results indicate a significant reduction in the share of respondents choosing neutral responses and a corresponding increase in the share choosing extreme points. In addition, we propose a framework that allows respondents to choose their preferred questionnaire type while still allowing analysts to use the models of their choice to predict behaviour.

The benefits of this thesis for the different stakeholders are as follows:

• Analysts - Firstly, they are relieved of the difficulty of coding open-ended responses and of the bias that coding introduces. Secondly, they save considerable time in extracting information from open-ended responses. Furthermore, analysts who prefer not to use Topic Models can still improve closed-ended responses by placing open-ended questions before the closed-ended ones, which reduces neutral responses. Finally, analysts who intend to give respondents the flexibility to choose their preferred questionnaire type could use our proposed framework to obtain coefficients for prediction using the models of their choice.

• Consultancies - Consultancies can reduce the human resources required to extract and process open-ended responses and reallocate them to other tasks, which also reduces the overall cost of a project.

• Policymakers - Because analysts no longer code responses manually, the extracted information is more systematic and less subjective, making the inferences drawn from the studies more reliable. Policymakers also benefit from a significant reduction in the time needed to gain insights.

• Respondents – The proposed framework allows respondents to answer surveys using

the questionnaire of their choice.

1.5 SCIENTIFIC OUTCOMES FROM THIS THESIS

Journal Publications

1. Open-Ended Versus Closed-Ended Responses: A Comparison Study Using Topic

Modelling and Factor Analysis, published in the IEEE Transactions on Intelligent

Transportation Systems (2021).

2. A Closer Look into How Land-Use, Social Networks, and ICT Influence Location

Choice of Social Activities, published in the AESOP Transactions (2019).

Conference Presentations

1. Opening Up the Conversation: Topic Modelling for Automated Text Analysis in

Travel Surveys, presented at the 21st International Conference on Intelligent

Transportation Systems (ITSC), 4th – 7th Nov 2018, Hawaii, USA.

2. Comparing Likert Scale and Open-ended Questions in the Application of TPB to

the Intention-to-Use of Autonomous Shuttle, presented at the 15th International

Conference on Travel Behaviour Research (IATBR), 15th – 20th Jul 2018, Santa Barbara,

California, USA.

3. Do Open-ended Questions Influence the Measurement of Attitudes? An

Investigation, accepted for presentation at the 12th International Conference on

Transport Survey Methods, 20th – 25th March 2022, Lisboa, Portugal.

4. A Closer Look into How Land-use, Social Networks, and ICT Influence Location

Choice of Social Activities, presented at the 2017 AESOP Annual Congress, 11th to

14th Jul 2017, Lisboa, Portugal.

5. An Investigation into the Influence of Land-use, Social Networks, and ICT on

Location Choice of Social Activities, presented at the redeMOV 2nd Annual

Conference – Urban Mobility Innovation for a Changing Society, 3rd to 9th May 2017,

Lisboa, Portugal.

1.6 ORGANISATION OF THE THESIS

The remaining chapters of this thesis are structured as follows:

• Chapter 2 - reviews the literature on the use of closed- and open-ended questions to measure attitudes. In addition, it discusses frameworks to measure attitudes, Natural Language Processing and its applications, and modelling techniques. To inform the design of the questionnaire, it reviews the literature on Autonomous Vehicles.

• Chapter 3 - presents the methodology and estimation results from the case study carried out in India, which modelled the intention to use Shared AVs, and outlines the limitations of that study.

• Chapter 4 - presents the second case study, carried out in the USA to address the limitations of the Indian study. It describes the methodology and estimation results of the model for the intention to use AVs for commute trips. It also presents a framework that allows respondents to choose their preferred question type when answering questions related to measured attitudes.

• Chapter 5 - presents the salient findings of this thesis, the limitations of the study and directions for future research.

2 FUNDAMENTALS QUALITATIVE DATA AND ITS MEASUREMENT

2.1 INTRODUCTION

This Chapter discusses advances in research on measuring qualitative data, specifically attitudes. Neuman [26] emphasises the importance of three factors for measuring qualitative data: a construct, a measure, and the ability to identify what the researcher intends to measure. This research focuses on improvements related to the second factor, the “measure”.

The first section (Section 2.2) discusses the need to measure attitudes in travel behaviour research. It presents literature on how travel behaviour research has evolved, with recent developments emphasising the importance of attitudes and the need to measure them. A short discussion of research that has analysed attitudes in travel behaviour follows. This review, although not exhaustive, highlights the importance of measuring attitudes.

Section 2.3.1 discusses the methods used to measure qualitative data, specifically closed-ended responses. We discuss the history of the closed-ended approach and its use in travel behaviour research. Closed-ended scales vary in length, from dichotomous to very long scales. We then present the rationale for scales, their benefits and their use in measuring attitudes, which gives a better perspective on the closed-ended approach. Central to the discussion of scale length is whether the researcher/analyst intends to capture neutral responses. Analysts interested in measuring neutral responses use scales with an odd number of points, while others rely on scales with an even number of points; this is a source of intense debate among analysts and policymakers. Including a neutral point allows individuals who are genuinely neutral to respond without being forced to choose a side. We also present various approaches to extract information from closed-ended responses and conclude with the common issues associated with the closed-ended approach.

Section 2.3.2 discusses the use of open-ended responses for measuring attitudes, with the objective of summarising the main aspects of the open-ended approach. We begin with the potential benefits of open-ended responses and the scenarios that warrant their use. The open-ended approach requires responses to be coded before the analysis. Researchers argue that the position of open-ended questions in the questionnaire might influence the measured attitudes. Respondents may also skip open-ended questions, which is a significant concern because it generally leads to a high number of missing responses; this issue is also discussed in this Chapter. Finally, we briefly discuss the different approaches to extract information from open-ended responses.

After deciding on the appropriate approach to measure attitudes, researchers/analysts

should identify a framework that could explain the psychological processes involved in

forming these attitudes. In the subsequent section (Section 2.4), we present an overview of the

different socio-psychological theories used to measure attitudes in travel behaviour. Among

these, researchers have widely used the Theory of Planned Behaviour proposed by Ajzen [27]

and its extended versions. Other commonly used theories include the Technology Acceptance

Model (TAM) by Davis, Jr [28] and its extensions. Finally, researchers have used the Unified

Theory of Acceptance and Use of Technology (UTAUT) proposed by Venkatesh et al. [29]

and its enhanced versions to model the willingness/intention to use AVs. This section covers

the underlying principles, their applications and the salient findings related to their use.

After collecting the data, the next challenge is estimating models to analyse travel behaviour. Researchers have used models such as multiple linear regression, discrete choice models and structural equation models. Section 2.5 discusses the different statistical approaches adopted for modelling the intention/willingness to use AVs, and Section 2.6 discusses the factors influencing the intention to use AVs. The final aspect of the discussion concerns the extraction of information from open-ended responses, for which this thesis uses Topic Modelling. Section 2.7 describes Topic Modelling and its applications in research, particularly in travel behaviour research.

2.2 MEASURING ATTITUDES IN TRANSPORTATION

Here, we provide a brief overview of the evolution of travel behaviour research that eventually led to the measurement of attitudes, defined as dispositions towards overt action or as verbal substitutes for overt action [10]. Early transport modelling focused on predicting vehicular trips to inform decisions on the necessary infrastructure. As the emphasis shifted towards more efficient systems, researchers began analysing the potential of other modes of transport. This led to the inclusion of mode choice in travel demand models and, eventually, to the development of a more behavioural framework, viz. the Activity-based approach. In parallel, the growing importance of sustainability encouraged researchers and planners to assess the potential of non-motorised modes of transport such as walking and biking. Shifting travellers from one mode to another, particularly towards sustainable modes, often involves physical, regulatory and pricing measures, and it also requires analysing subjective attitudes, which are qualitative. For example, the analysis of public transportation often involves investigating attitudes related to comfort, convenience, etc. [1], [2]. This paved the way for research measuring attitudes in areas of travel behaviour such as mode choice, the intention to use sustainable modes of transport, AVs and accident analysis. These studies highlight the importance of attitudes in travel behaviour and the need to improve their measurement.

We elaborate on this by referring to some of the research in each of the fields mentioned above. The constructs of the Theory of Planned Behaviour, desires, past behaviour, habitual car use and social stigma influence the shift towards public transportation [30]–[33]. Research on biking indicates that understanding attitudes towards biking is essential to create a conducive environment for it. Habits, familiarity with the systems, pro-bicycle attitudes, and perceived ease of use and convenience influence the intention to use bikes [3], [34]–[36]. In turn, an individual’s pro-bicycle attitudes are often predicated on his/her perceptions of the environment, health and the use of bikes [35], [37]. Another domain of interest to researchers is understanding attitudes towards AVs, as this understanding is essential to improving perceptions of AVs and, thereby, their potential use. Attitudes, contextual acceptability and enthusiasm for driving influence the intention to use AVs [38]. Individuals who are optimistic about driving, familiar with any form of automation and environmentally conscious are more likely to use AVs [39]–[41]. On the other hand, resistance to their use, anxiety and concerns about AVs affect this intention negatively [39]–[43]. Furthermore, in accident analysis, personality traits strongly linked to attitudes play a vital role. Analysing the driver behaviours that cause accidents through attitudes towards safety and risky driving could improve the prediction of traffic accidents [44]–[46].

To summarise, the short literature review presented above highlights the importance of attitudes in travel behaviour research. Making this information available to decision-makers enables them to make better policy-related decisions. Furthermore, by understanding the underlying factors that influence the intention to use a future technology, policymakers, manufacturers and decision-makers could address concerns about that technology. However, the qualitative nature of attitudes makes them challenging to measure. Researchers and analysts use closed- and open-ended questions to gain insights into these attitudes, and we discuss each of the two approaches in the following section.

2.3 MEASUREMENT OF QUALITATIVE DATA

Having established the importance of measuring attitudes in travel behaviour research, we now focus on how they can be measured. Should researchers and policymakers use closed- or open-ended questions? Closed-ended questions present respondents with statements such as “I am comfortable sharing a ride with strangers” or “My friends and family will be supportive of me using a bicycle”, and respondents then choose the point on a scale that best matches their attitude. The alternative, viz. the open-ended approach, requires respondents to articulate their response to a question in their own words. Interestingly, Rensis Likert, the proponent of the five-point Likert scale, eventually recommended using open-ended questions to measure attitudes [18]. The debate over which approach is better for measuring attitudes intensified over time, with preference shifting in favour of the closed-ended approach, and the advent of internet-based surveys has only intensified it further. This section describes in detail the evolution of the two approaches and their applications, particularly in travel behaviour research.

2.3.1 Closed-ended Responses

As discussed earlier, researchers using the closed-ended approach present statements describing the attitude, and respondents then choose the point on the scale that best describes their attitude. Scales can be unipolar or bipolar. Bipolar scales are based on the notion that attitudes are bipolar constructs ranging between two extremes (negative to positive) with a neutral midpoint, whereas unipolar scales measure the importance of an attitude to an individual, often without a precise midpoint [47]. When measuring attitudes, it is desirable to word some statements positively and others negatively. Moreover, it is advisable to use a balanced scale, with an equal number of points towards each of the two extremes. Closed-ended questions are simple and easy to administer and analyse [48].
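As an illustration of why a balanced scale makes mixed positive and negative wording workable, the sketch below reverse-codes negatively worded items on a k-point scale (a score s becomes k + 1 - s) so that a higher value always indicates a more favourable attitude before the items are averaged. The item names, the wording flags and the use of a simple mean are hypothetical choices for illustration, not part of the cited studies.

```python
def reverse_code(score, k):
    """Reverse a response on a balanced 1..k scale, e.g. 2 -> 4 when k = 5."""
    if not 1 <= score <= k:
        raise ValueError(f"score {score} is outside the 1..{k} scale")
    return k + 1 - score


def attitude_score(responses, negatively_worded, k=5):
    """Mean attitude after aligning every item in the favourable direction.

    responses: dict mapping a (hypothetical) item name to the chosen scale point
    negatively_worded: set of item names phrased negatively
    """
    aligned = [
        reverse_code(score, k) if item in negatively_worded else score
        for item, score in responses.items()
    ]
    return sum(aligned) / len(aligned)


# Hypothetical items answered on a five-point disagree/agree scale.
answers = {
    "comfortable_sharing_ride": 4,
    "worried_about_strangers": 2,   # negatively worded statement
    "would_recommend_service": 5,
}
print(attitude_score(answers, negatively_worded={"worried_about_strangers"}))
```

Here the negatively worded item's 2 becomes 4, so the aligned scores are 4, 4 and 5 and the mean is about 4.33. Without reverse-coding, that item would pull the average towards disagreement and mask a consistently favourable attitude.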

Plant (1922) [11] proposed one of the first rating scales to measure attitudes, which simplified data collection, reduced completion time and made analysis easy. In 1924, Symonds [50] recommended using a seven-point scale to measure attitudes. Later, in 1932, Likert [10] proposed a five-point scale ranging between the two extremes of an attitude, which he argued was less laborious and eliminated the need for judges while ensuring reliable results. Seven- and five-point scales were not the only scales in use; over time, numerous other researchers proposed scales of varying lengths. To date, the optimal number of points on a scale remains a matter of intense debate and one that requires careful consideration.

Researchers and analysts should design scales carefully and, before using a scale, decide on its length, whether to include a midpoint and how to label it (verbally or numerically). The wording of verbal labels and the numbers used in numeric labels also require careful choice [47]. Failing to provide an exhaustive list of categories for a closed-ended question might constrain respondents, who might then give undue importance to categories that would otherwise be insignificant [49]. Covering all aspects of the attitude and all of its levels is therefore essential, as appropriate coverage increases the accuracy of predictions [50]. The widespread use of rating scales can be attributed to their ease of use for both respondents and analysts, since answers can be quantified directly [18], [51]–[53]. Scales also allow a rapid turnaround, as little time elapses between administering the survey, obtaining the data, analysing it and evaluating the outputs [18]. In the following sub-section, we review the literature on scales of varying lengths and on the optimal number of points on a rating scale.

2.3.1.1 Types of Closed-ended Scales

Classified by the number of points, rating scales range from simple dichotomous scales to far more elaborate ones. Symonds [54] recommended the seven-point scale, whereas Likert [10] recommended the five-point scale. Compared with dichotomous scales, longer scales allow respondents to express their attitudes more effectively by capturing both the direction of agreement or disagreement and its intensity [47], [55], [56]. However, including too many points on a scale might reduce clarity, consistency and discriminatory power while increasing respondents’ burden and errors [47], [57], [58]. Conversely, shorter scales limit respondents’ expressive power, can be restrictive and diminish validity and discriminatory power [57], [59]. Researchers should decide the length of a scale after accounting for the number of cognitive categories of interest to them [47]. Preston and Colman [59] recommended five-, seven- and ten-point scales based on ease of use; however, there is still no consensus among researchers on the optimal number of points on a scale. Consequently, researchers use scales of varying sizes, which we discuss in the following paragraphs.

Two-Point Scales- Two-point scales are simple and effective for capturing the direction of a decision and are convenient to administer and score [24], [57]. They are the preferred choice for questions whose alternatives are homogeneous [13], [57]. They are also preferable for measuring attitudes among the general public or culturally diverse samples without formal education [57], [60]. Because two-point scales demand precise responses from participants, they might be restrictive, and they do not capture the extremity of agreement or disagreement [24], [48], [57]. Researchers have used two-point scales to measure social desirability scores, the willingness-to-pay for bus fare reform and the willingness-to-use automated cars [43], [61]–[64]. In transportation research, they have also been used to analyse whether individuals liked using AVs, their interest in AVs and their approval or disapproval of AVs [65], [66].

Three-Point Scales- Three-point scales extend the dichotomous scale by allowing truly neutral individuals to express their opinion without being forced to agree or disagree with the statement. Researchers have used three-point scales to measure involvement in accidents and attitudes towards risky driving behaviours [45], [67]. They have also been used to understand the perceived concerns about and benefits of AVs, the ability to multi-task while riding in AVs and the need for connectivity features in AVs [16], [66], [68].

Four-Point Scales- Four-point scales offer more flexibility, as they allow individuals to express the degree to which they agree or disagree. However, they do not allow individuals to voice neutral opinions and are preferred when there is no interest in measuring neutrality. Applications of four-point scales include measuring attitudes towards public transport, user satisfaction with bus services and the propensity to drive a car [41], [69], [70]. They have also been used to measure the perceived benefits of AVs, concerns about AVs, the intention to use AVs, and expectations and anxieties regarding AVs [65], [71]–[75].

Five-Point Scales- Proposed by Likert [10], the five-point scale, which extends the four-point scale with a neutral option, is widely used by researchers and analysts. Applications in transportation include assessing driving behaviour, driving safety, the driver skill inventory and risk-taking attitudes [45], [76]–[79]. Researchers have also used five-point scales to measure perceptions and service quality, attitudes towards safe driving, driver behaviour and attitudes towards cars and public transit [80]–[83]. Other applications include assessing the intention to use various services (public transport over private cars, bicycles and bike-sharing systems), dependence on the car and the frequency of use of different modes [30], [32], [84]. Five-point scales have also been used to measure attitudes towards AVs, the environment, public transport, safety and technology [4], [40], [85].

Six-Point Scales- Six-point scales give respondents more flexibility to express their attitude; however, they cannot capture neutral opinions. Compared with the four-point scale, the six-point scale offers two additional levels for respondents to express their attitudes, and researchers often argue that it provides higher discrimination and reliability [58], [86]. Komorita [87] used a six-point scale to compute the neutral region on a Likert scale without a neutral point. Six-point scales have also been used to analyse driver behaviour, the intention to use public transport and bikes and the frequency of use of different modes [32], [35], [77], [83]. Other applications include the analysis of attitudes towards AVs, anxiety, benefits and concerns, and mode choice frequency [41]–[43], [88].

Seven-Point Scales- The seven-point scale, a popular and widely used format, extends the six-point scale with a neutral option, and many researchers consider seven to be the optimal number of points on a scale [54], [89]–[94]. Claims in favour of seven-point scales include better reliability, equal utilisation of all categories, higher accuracy and ease of use [54], [90], [93]. Miller [89] argued that a seven-point (± 2) scale enables respondents to process information and make better judgements. Furthermore, having more than seven points increases respondents’ cognitive burden with almost no improvement in reliability [94]. Researchers have used seven-point scales to measure social desirability scores, risky behaviour, perceived risk, transport priorities, attitudes towards transport modes and the intention to use public transport, bicycles and AVs [31], [32], [34], [37], [38], [61], [64], [95]. They have also been used to measure attitudes towards AVs and constructs related to the TPB and TAM [43], [75], [96].

Longer Scales- Longer scales are predicated on the idea that they capture richer information than scales with fewer points. Bendig [97] recommended the nine-point scale for the additional information it captures. Researchers have used ten-point scales to measure the driver stress inventory, attitudes towards behaviour and customer satisfaction with public transport services [96], [98], [99], and eleven-point scales to measure driver stress and the willingness to pay for autonomous vehicles [39], [76]. Preston and Colman [59] analysed the reliability and validity of a 101-point scale for measuring service quality in restaurants and stores.


2.3.1.2 Optimal Number of Points in a Scale

“Too many cooks spoil the broth” aptly describes the current situation, as there is still no consensus among researchers on the appropriate scale length. Champney and Marshall [56] advocated using longer scales (18-24 points) unless it has been established that five- or seven-point scales are appropriate for measuring the attitude under consideration. They concluded that the number of points on a scale is a function of the measurement conditions and should be decided by taking reliability and validity into consideration.

Reliability, an indicator of a scale’s effectiveness, can be assessed by evaluating the consistency of its results. Longitudinal reliability is the consistency of responses when the same question is repeated to the same individual on multiple occasions. A low longitudinal reliability could indicate either a change in the individual’s attitude or an unreliable measure. Cross-sectional reliability, on the other hand, is the consistency of responses to a series of attitudinal questions asked of the same individual [47].
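The two kinds of reliability defined above can be estimated directly from response data. In the minimal sketch below, Cronbach's alpha serves as one common (but not the only) index of cross-sectional, internal-consistency reliability, and the Pearson correlation between two survey waves serves as a test-retest index of longitudinal reliability. The choice of statistics and the data are illustrative assumptions, not taken from the cited studies.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items table of scale scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def retest_correlation(wave1, wave2):
    """Pearson correlation between the same respondents' scores in two waves."""
    m1, m2 = mean(wave1), mean(wave2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(wave1, wave2))
    s1 = sum((a - m1) ** 2 for a in wave1) ** 0.5
    s2 = sum((b - m2) ** 2 for b in wave2) ** 0.5
    return cov / (s1 * s2)


# Hypothetical data: five respondents answering three related five-point items.
responses = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 3, 2], [1, 2, 1]]
print(round(cronbach_alpha(responses), 2))

# Hypothetical scores on one item, asked of the same five respondents twice.
print(round(retest_correlation([4, 2, 5, 3, 1], [4, 3, 5, 2, 1]), 2))
```

For these made-up responses, alpha is about 0.96 and the test-retest correlation is 0.9; a low value of the latter would not, on its own, distinguish attitude change from an unreliable measure.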

According to Bendig [97], [100], reliability was not affected by increasing the number of scale points (three-, five-, seven- and nine-point) but declined when the number of points increased to eleven [97]. Respondents’ reliability was constant for the longer scales (five-, seven- and nine-point), low for two-point scales and high for three-point scales [100]. Peabody [55] concluded that long scales primarily reflect the direction of the response and are limited in capturing the degree of extremeness and reliability. Furthermore, dichotomous scales are as reliable as longer scales when the items are homogeneous; for heterogeneous items, however, reliability increases with the length of the scale [57].

Another factor that positively influences reliability is verbal anchoring, particularly of the central category [97]. The information conveyed increases with the number of points on a scale (three-, five-, seven-, nine- and eleven-point) and with verbal anchoring [101]. Lissitz and Green [102] examined the relationship between the number of scale points and reliability using a Monte Carlo approach; for scales with more than five points, the increase in reliability was nominal, challenging the claim that seven is the optimal number of points [102]. In contrast, Preston and Colman [60] found that reliability was highest for seven- and ten-point scales. Komorita and Graham [57] emphasised the need to consider both validity and reliability when choosing the number of points on a scale.
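To show the kind of experiment a Monte Carlo approach involves, the sketch below simulates respondents with a continuous latent attitude, adds independent measurement error on two occasions, maps each observation onto a k-point scale and reports the test-retest correlation for several scale lengths. It is a simplified, hypothetical set-up (normally distributed attitudes, noise standard deviation 0.5, equal-width cut-points over [-2.5, 2.5]) and not a reproduction of Lissitz and Green's design.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def to_scale(value, k, low=-2.5, high=2.5):
    """Map a continuous response onto a k-point scale (1..k) with equal-width bins."""
    position = math.floor((value - low) / (high - low) * k)
    return min(max(position, 0), k - 1) + 1  # values outside [low, high] go to the end points


def simulated_reliability(k, n=2000, noise_sd=0.5, seed=1):
    """Test-retest correlation of a k-point item under a simple true-score model."""
    rng = random.Random(seed)
    wave1, wave2 = [], []
    for _ in range(n):
        attitude = rng.gauss(0, 1)  # respondent's latent (continuous) attitude
        wave1.append(to_scale(attitude + rng.gauss(0, noise_sd), k))
        wave2.append(to_scale(attitude + rng.gauss(0, noise_sd), k))
    return pearson(wave1, wave2)


for k in (2, 3, 5, 7, 9, 11):
    print(f"{k:>2}-point scale: r = {simulated_reliability(k):.3f}")
```

Under these assumptions, the correlation should rise steeply from two to about five points and then level off; other noise levels or cut-points will shift the numbers, which is one reason the optimal length remains contested.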


Validity is the accuracy with which a measure taps the construct of interest. Correlational validity is the degree to which a measure can predict other variables related to it, and discriminant validity is the degree to which constructs expected to be distinct are in fact distinct [47]. Preston and Colman [59] observed that validity increased with the number of points on the scale but remained unchanged for scales with more than five points.

However, Matell and Jacoby [103] concluded that reliability and validity are independent of the number of points on the scale. Moreover, reliability and validity are not the only factors that determine the optimal number of points [103]. Matell and Jacoby [104] also concluded that the length of the scale did not influence how the scale points were utilised. If researchers or analysts intend to reduce response time, fatigue, warm-up effects, boredom and similar problems, they should use shorter scales, whereas longer or even-numbered scales might help address the issues associated with neutral responses [104]. The length of the scale should be decided after considering reliability, validity, discriminating power and respondents’ preferences [59]. Thus, there is still no consensus on the appropriate length of a scale, and the choice often remains the researcher’s prerogative.

2.3.1.3 Neutrality and Its Implications

Respondents sometimes choose the middle option because they do not want to take a position [48]. Researchers and analysts circumvent this by using scales without a neutral option [48], [87], [105]. Komorita [87] demonstrated that a neutral point or region can be computed for a Likert scale with an even number of points, which aids the identification of genuinely neutral responses. Sometimes respondents choose points on the scale that they expect will please the interviewer or that are socially acceptable, a tendency referred to as acquiescence. Scales without a neutral point reduce social desirability bias and acquiescence [105]. However, critics argue that scales with an even number of points fail to capture the responses of truly neutral individuals, who are instead forced to agree or disagree with the statement [48], [104]. This issue remains unresolved, and the choice is mostly left to the researcher or analyst.

2.3.1.4 Satisficing

Individuals tend to provide typecast responses, choosing one of the extremes, positive or negative. Individuals are also more likely to endorse statements than to disagree with them (satisficing), and the lack of suitable alternatives in a rating scale may force them to choose inappropriate levels. If the researcher does not present a comprehensive questionnaire covering all aspects of the attitude, the attitudes captured may contain gaps [48].

2.3.1.5 Other Issues

In this sub-section, we discuss other concerns about closed-ended responses reported in the literature. The closed-ended approach uses statements about attitudes that were found to be important in studies of other samples or in different contexts. Therefore, closed-ended responses may measure aspects of the problem that are relevant to the analyst or researcher rather than the issues of actual relevance to the individual [25], [50], [106]. Consequently, individuals may respond to statements without perceiving them correctly [107]. Another issue with the closed-ended approach, particularly in online surveys, is that the order in which the response options are presented may influence responses. For instance, using eye tracking, Galesic et al. [108] found that respondents spend more time on the options at the top of a list and are more likely to choose them. Mavletova [110] observed that this primacy effect was stronger on PCs, whereas Buskirk and Andrus [110] found similar patterns among respondents answering on smartphones and on PCs.

2.3.1.6 Analysis of Closed-ended Responses

When researchers aim to measure the overarching relationships between measured variables (closed-ended responses), they use approaches such as Principal Component Analysis (PCA) and Exploratory Factor Analysis. PCA is used for exploratory data analysis and dimensionality reduction of closed-ended responses. The data are transformed into a new coordinate system in which the projection with the greatest variance lies along the first coordinate, the projection with the next greatest variance along the second coordinate, and so on. PCA has found wide application in research, for example in studying the intention to use bike-sharing for holiday cycling [3], perceptions of autonomous vehicles [95] and the importance of environmental considerations [109].
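The PCA procedure described above can be made concrete with a small, dependency-free sketch: it computes the sample covariance matrix of item responses and extracts the leading principal components by power iteration with deflation, so that each component is the direction of greatest remaining variance. The data are hypothetical; in practice, a library routine such as NumPy's `numpy.linalg.eigh` or scikit-learn's `PCA` would normally be used instead.

```python
import random


def covariance_matrix(data):
    """Sample covariance matrix; rows are respondents, columns are items."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    return [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
        for a in range(p)
    ]


def leading_components(cov, n_components=2, iterations=500, seed=0):
    """Largest eigenpairs of a covariance matrix via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(cov)
    matrix = [row[:] for row in cov]
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]  # arbitrary positive start vector
        for _ in range(iterations):
            w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        # The Rayleigh quotient v'Mv is the variance captured along v.
        mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = sum(v[i] * mv[i] for i in range(p))
        components.append((eigenvalue, v))
        # Remove this direction so the next pass finds the next-largest one.
        matrix = [
            [matrix[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
            for i in range(p)
        ]
    return components


# Hypothetical five-point responses: items 1-3 tap one attitude, item 4 is unrelated.
responses = [
    [5, 4, 5, 2], [4, 4, 4, 5], [2, 1, 2, 3], [1, 2, 1, 1],
    [3, 3, 4, 4], [5, 5, 4, 1], [2, 2, 1, 5], [4, 5, 5, 3],
]
cov = covariance_matrix(responses)
total_variance = sum(cov[i][i] for i in range(len(cov)))
for value, vector in leading_components(cov):
    print(f"{value / total_variance:.0%} of variance, loadings {[round(x, 2) for x in vector]}")
```

Each printed line gives a component's share of the total variance and its loadings on the four items; with these made-up data, the first component is expected to be dominated by the three related items.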

To perform exploratory analysis on closed-ended responses, with the aim of uncovering the underlying structure of a large set of variables, researchers often use Exploratory Factor Analysis (EFA). To assess school-wide cultural competence and analyse whether policies, programs and practices align, Nelson et al. [110] used EFA to identify the factors that influence the construct. EFA has also been applied in developing the Simulation Experience Scale for nurses [111], in studying Risk Management Practices [112] and in assessing students’ competency in current skills and their perception of the importance of future skills [113]. Other applications include identifying the underlying constructs of existence, relatedness and growth needs and of travel difficulties [114], the drivers that influence travel decisions made while using travel apps [5] and customers’ attitudes and intention to use AVs [115].

When researchers intend to test whether a construct is consistent with their understanding of it, they use Confirmatory Factor Analysis (CFA). Unlike EFA, which is exploratory, CFA tests whether the data fit a hypothesised measurement model, often derived from previous research or theory. For example, Dubey et al. [116] used CFA to explore the enablers of Six Sigma implementation and their contextual relationships. Other applications include understanding competency levels among dentists [113], the adoption of a computer-based model to monitor parking revenue inflow [117], the drivers that influence travel decisions made while using travel apps [5] and the development and validation of the Ego Identity Process questionnaire [118].

2.3.2 Open-ended Responses

This section discusses the theory, benefits, issues and uses associated with the open-ended approach. Open-ended questions allow respondents to express attitudes or opinions in their own words, and, notably, Rensis Likert [18] argued in favour of the open-ended approach to measure public attitudes. Likert [18] streamlined data collection by focusing on collecting, recording and processing information. Open-ended questions are advantageous for measuring operative attitudinal properties such as ambivalence, inconsistency and embeddedness [107]. Geer [15] argued that the closed-ended approach is preferred over open-ended questions primarily because of convenience, not because open-ended questions are unable to measure public attitudes.

Open-ended questions give researchers and policymakers insights into the responses that individuals give spontaneously, which mitigates the bias associated with closed-ended responses [18], [119], [120]. Also, open-ended questions are not restrictive and help identify the problem, thereby reducing the overall time for the analysis [106]. Compared with the closed-ended approach, the open-ended approach provides more detailed information [53]. Schuman, Ludwig and Krosnick [49] demonstrated that results from open-ended responses could match those from closed-ended responses if the questions are designed carefully.


Open-ended questions should be the preferred choice for measuring respondents’ knowledge about a topic or the frequency of their undesirable behaviour [48]. Open-ended questions also eliminate the potential for bias arising from the enumerator making suggestions to respondents [25]. They allow individuals to name and communicate the issues that are relevant to them, which helps identify their priorities more accurately [107], [121]. Furthermore, open-ended questions could even be used in large samples to identify the aspects and wording to use in closed-ended questionnaires [52].

Proponents of the open-ended approach consider the closed-ended approach incomplete, unnatural and rigid and likely to distort respondents’ attitudes [18]. The open-ended approach can be more expensive than the closed-ended approach; to cope with financial constraints, Likert [122] proposed using the open-ended approach with smaller samples, which, in his opinion, is preferable to using methods that are less accurate and biased. Esses and Maio [107] consider the open-ended approach convenient, as it relieves researchers of the burden of testing statements extensively before using them. In addition, open-ended questionnaires can easily be adapted across cultures and samples, whereas the statements used in the closed-ended approach might be sample- or culture-specific [107], [123].

The use of open-ended questions to measure attitudes does have its challenges. For example, Griffith et al. [124] argue that open-ended questions might slow individuals down and force them to spend additional time answering. Other issues include a. attracting responses that fall outside the frame of interest of the researcher/analyst [50], [52], [120], b. difficulties for interviewers in collecting the data and for analysts in analysing and coding it [18], c. an increased burden on respondents, enumerators and analysts [17], and d. lower reliability of the collected data [18].

Furthermore, responses to open-ended questions are likely to be influenced by external events and developments in the media [49]. Considering these difficulties and the cost involved, advocates of the closed-ended approach consider open-ended questions appropriate for pre-tests but not for practical use [18]. Some researchers argue that open-ended questions measure respondents’ ability to articulate a response rather than their attitude [125]. Critics of the open-ended approach also argue that the responses capture only superficial concerns; however, Geer [15] argued that scholars have overstated this claim. In online surveys, the size of the text box has been shown to influence responses, with longer text boxes producing more words and ideas per respondent [126]–[128]. Moreover, poorly stated instructions and respondents’ interest could affect response length and the quality of information [128]–[130]. Ambiguous responses, increased item non-response and survey break-offs are other issues associated with open-ended questions [19], [129], [131].

With the advent of the internet, researchers began conducting surveys over the web, where the size of the text box and the presence of motivating text influence open-ended responses [132] and where non-response to open-ended questions remained a significant issue [129]. The higher break-off rate among respondents answering web-based open-ended questionnaires was attributed to the absence of interviewers to motivate them. Furthermore, a lack of flexibility to complete web surveys at the respondents’ convenience also deters participation [133]. Given the increasing penetration of smartphones, it is essential to analyse their implications for responses to open-ended questions. Open-ended responses were relatively shorter among respondents answering surveys on mobile phones [134], [135]. Responses from smartphones had longer response times per character, were often less precise and contained more abbreviations [136]. Reacting to this, Revilla and Ochoa [136] expressed concern over the reliability of open-ended responses collected through surveys not optimised for smartphones. It would be interesting to revisit this issue now that surveys are increasingly optimised for smartphones and individuals are more accustomed to typing on them.

Researchers have used open-ended questions to collect suggestions for improving bus services and to identify opportunities and barriers for energy use and conservation [70], [137]. They have also been used to understand the willingness-to-pay for AVs and to collect feedback on AVs [38], [41], [85]. Hulse, Xie and Galea [95] coded attitudes towards autonomous vehicles, captured through open-ended responses, into five categories.

2.3.2.1 Coding of Open-ended Questions

Coding is a prerequisite for making statistical predictions with open-ended responses; it allows analysts to compute frequencies and percentages and to perform non-parametric tests and other statistical analyses [95], [119], [123], [137]. The data should be coded carefully into categories before being transformed to a nominal scale [138], and the diversity of responses might make coding difficult [119]. Coding also poses practical challenges, from hiring coders to coding the data itself [53]. Depending on the categories into which the responses fall, researchers apply weights and ensure the reliability, agreement and validity of the coded data [106], [139]. Coded data are reliable if different observers or analysts agree on a specific coding of categories, and a coding scheme is valid when it captures the truth about the measured attitude. Notably, reliable coding does not necessarily ensure valid coding: the coders may agree on a coding scheme that is nevertheless far from the truth [140].

Niedomysl and Malmberg [141] found inter-coder variability in coding open-ended responses to be negligible; any differences that do arise should be explored further to develop better coder instructions.
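Inter-coder reliability of the kind discussed above is often summarised with a chance-corrected agreement statistic. The sketch below implements Cohen's kappa for two coders who each assign one category per response; kappa is one common choice rather than necessarily the statistic used in the cited studies, and the theme labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of responses")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical themes assigned by two coders to the same six responses.
coder_a = ["safety", "cost", "safety", "trust", "cost", "safety"]
coder_b = ["safety", "cost", "trust", "trust", "cost", "safety"]
print(round(cohens_kappa(coder_a, coder_b), 2))
```

The coders agree on five of the six responses, and kappa is 0.75 once chance agreement is removed. As noted above, a high kappa shows only that coders agree, not that the coding scheme is valid.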

2.3.2.2 Position of Open-ended Questions

The position of open-ended questions can influence responses. Placing them at the end can alter the individual’s original ideas and cause greater attenuation [51]. Placing open-ended questions at the beginning of a questionnaire could help identify suitable categories for the closed-ended questions, whereas positioning them at the end of the survey helps capture more information (impressions, experiences and other aspects) than mere correlations [17]. A more suitable approach is to place them only at the beginning, or at both the beginning and the end [51]. Another study found that the sequence of data collection did not influence responses to a closed- and an open-ended question [142]. Opinions remain divided, and researchers should explore this issue further.

2.3.2.3 Missing Values in Open-ended Surveys

Response rates (the rate of return) are lower for open-ended questionnaires than for closed-ended ones [52], [127], [143]. Administering open-ended questions to individuals who cannot articulate their responses might invite non-response [120], [125]. Also, administering open-ended surveys to individuals with little or no formal education might increase non-response [123]. Consequently, proponents of the closed-ended approach often argue that open-ended questions measure the ability to articulate responses and not the attitudes [120], [125]. However, Geer [14] attributed non-response to a lack of interest in the topic, independent of the ability to articulate. In web-based surveys, the proportion of missing values is often high because no interviewer is present to encourage respondents. The size of the text boxes might also influence non-response: longer text boxes might reduce non-response but may trigger invalid entries [119]. The comparatively low non-response to closed-ended questions could partly be because individuals might respond to them without applying much thought [124].


2.3.2.4 Extraction of Information from Open-ended Responses

To extract information from open-ended responses, word-based analysis has been used extensively in qualitative analyses that either expect the data structure to emerge or aim to validate a thematic content analysis [144], [145]. The analysis involves counting the co-occurrence of word units, which eventually helps identify clusters of concepts and the strength and direction of their relationships. To understand the factors contributing to conflicts, Jehn [144] collected data through open-ended interviews and indexed each term in the responses alphabetically along with its frequency of occurrence; this helped identify similarities but was less helpful for drawing conclusions about the context of the responses [144]. To gain quantitative insights from qualitative data, Schmidt [146] used the text analysis software “CATPAC” to obtain word frequencies and associations between words, which were later used to perform multivariate analysis. Analysing open-ended responses requires several coders, which increases the cost and time of analysis; crowdsourcing, which uses the internet to divide work among several participants, could be used to address this. Crowdsourcing has massive potential for extracting information from large volumes of text data and brings a “human touch” to the analysis that is often absent in computerised analysis. Jacobson, Whyte and Azzam [147] and Benoit et al. [148] used crowdsourcing to code, evaluate and quantify open-ended responses, although cost might still be a concern.
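The word-based approach described above, indexing terms with their frequencies and counting which words occur together, can be sketched in a few lines of Python. This is not a re-implementation of CATPAC or of Jehn's procedure: the stop-word list is a small hypothetical one, and co-occurrence is counted once per pair of distinct words within each response.

```python
import re
from collections import Counter
from itertools import combinations

# Small, hypothetical stop-word list; real analyses use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "or", "i", "to", "of", "is", "it", "in", "would"}


def tokenize(text):
    """Lower-case, split into letter/apostrophe tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def word_statistics(responses):
    """Word frequencies and within-response co-occurrence counts."""
    frequencies = Counter()
    cooccurrences = Counter()
    for response in responses:
        tokens = tokenize(response)
        frequencies.update(tokens)
        # Count each unordered pair of distinct words once per response.
        for pair in combinations(sorted(set(tokens)), 2):
            cooccurrences[pair] += 1
    return frequencies, cooccurrences


# Hypothetical open-ended responses.
responses = [
    "I worry about safety and cost",
    "Safety is my main concern",
    "The cost would be too high",
]
frequencies, cooccurrences = word_statistics(responses)
print(frequencies.most_common(3))
print(cooccurrences[("cost", "safety")])
```

The co-occurrence counts are the raw material for the concept clusters mentioned above; like the alphabetical index, they say little about the context in which the words were used.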

Content analysis goes a step beyond exploring word frequencies by identifying themes or patterns [149]. Words, themes or concepts in open-ended responses should be coded into manageable categories using the frequency of the themes or patterns [150]. Jehn [144] performed a content analysis in which each word in the responses was associated with a keyword (or its synonyms), often derived from theory. Kawashima and Kawano [151] performed a six-step content analysis to extract information from open-ended responses, the steps being identifying overall trends, determining meaning units, identifying the dominant theme, grouping themes, naming the meanings of the content categories and benchmarking the results [151], [152]. Using this approach, Kawashima and Kawano [151] examined the reasons for suicide attempts by interviewing survivors, whereas Mamali et al. [152] investigated the experience of living and coping with a spouse who has experienced sensory loss. Mamali et al. [152] also performed sentiment analysis on some of the open-ended responses. Kelly, McKnight and Schubotz [153] used thematic content analysis to analyse perceptions among Irish youth of community relations in Northern Ireland. To analyse the physical and psychological needs of cancer survivors, Burg et al. [154] used directed content analysis to code themes from open-ended responses. To ensure consistency among the independently coded responses, the researchers evaluated their coding strategies in biweekly meetings [156], which indicates how intensive such analysis is. In more recent work, Savic et al. [155] used thematic analysis to assess how nurses cope with shift work, extracting both themes derived from previously established theories and themes that emerged from the data, thereby providing a more balanced approach [155].

Mossholder et al. [156] used the Dictionary of Affect in Language (DAL) to process open-ended responses from a survey of managers. The processing involved spell-checking, removing apostrophes and trailing “s”, and removing articles and conjunctions. The DAL does not take into account contextual meaning or the influence of negations and other modifying words. The analysts used the processed words to generate scores for evaluating the responses [156]. When using automated text analysis to identify the main sensory characteristics of mayonnaise, ten Kleij and Musters [157] found text pre-processing to be one of the main challenges. They highlighted the need to consider combinations of words to capture the respondents’ intentions. Automated text analysis also removes the need for analysts to define topics before analysing open-ended responses. The topics emerge from the data rather than being generated by the researchers or analysts.

As a result, the extracted topics reflect what is actually relevant to the respondents rather than the researchers’ expectations. Researchers can then re-evaluate their theoretical formulations and restructure future studies accordingly [158]. In the context of AVs, Lee and Kolodge [159] used structured Topic Models to assess trust in self-driving vehicles. Hynninen, Knutas and Hujala [160] applied sentiment analysis to open-ended feedback from students and highlighted the potential of automated text analysis.
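The dictionary-based scoring described for the DAL can be illustrated with a short sketch. It is a minimal illustration built on assumptions. The affect values in the hypothetical `AFFECT` dictionary and the word lists are invented, and the real dictionary is far larger. Averaging the values of matched words is one plausible scoring rule and not necessarily the rule used by Mossholder et al. [156]. The last two lines show the limitation noted above: adding a negation leaves the score unchanged.

```python
import re
from typing import List, Optional

# Hypothetical pleasantness values (roughly 1 = unpleasant, 3 = pleasant).
# These are NOT real DAL entries; they only illustrate the mechanics.
AFFECT = {"helpful": 2.8, "stress": 1.2, "fair": 2.5, "manager": 2.0, "late": 1.4}

# Articles and conjunctions removed before scoring (a short, assumed list).
ARTICLES_AND_CONJUNCTIONS = {"a", "an", "the", "and", "but", "or", "nor", "so", "yet"}


def preprocess(response: str) -> List[str]:
    """Lower-case, drop apostrophes, strip a plural 's' and remove articles/conjunctions."""
    text = response.lower().replace("'", "")
    tokens = re.findall(r"[a-z]+", text)
    cleaned = []
    for tok in tokens:
        if tok in ARTICLES_AND_CONJUNCTIONS:
            continue
        # Crude trailing-'s' removal that keeps words ending in 'ss' (e.g. "stress").
        if tok.endswith("s") and not tok.endswith("ss") and len(tok) > 3:
            tok = tok[:-1]
        cleaned.append(tok)
    return cleaned


def affect_score(response: str) -> Optional[float]:
    """Mean affect value of the dictionary words in a response (None if none match)."""
    values = [AFFECT[t] for t in preprocess(response) if t in AFFECT]
    return sum(values) / len(values) if values else None


# Word-level scoring ignores context, so the negated sentence gets the same score.
print(affect_score("The managers were helpful"))
print(affect_score("The managers were not helpful"))
```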

2.3.3 Concluding Remarks

In the above sub-sections, we introduced the closed- and open-ended approaches and discussed some of their advantages and disadvantages. The closed-ended approach is widely used for measuring attitudes, driven primarily by convenience, higher completion rates and shorter completion times. Researchers can easily convert closed-ended responses into numbers. However, answering such questions may place considerable strain on participants, who go through a three-stage process. They first interpret the statement, then relate it to the attitude being measured, and finally convert their agreement or disagreement into an appropriate point on the scale. There are also some concerns associated with using the closed-ended approach to measure attitudes. These include a. the optimal number of points on a scale, b. whether to include a neutral point on the scale, c. how to ensure that the list of statements is comprehensive, and d. the use of statements in a study without adapting and validating them.

On the other hand, open-ended questions do not constrain individuals to express their opinions in a particular manner. Because researchers do not force the responses, these may be closer to reality. Open-ended questions can therefore be used in exploratory studies (on relatively new topics) to identify aspects that may be of fundamental importance to the respondents. Once these aspects are identified, they can inform the design of statements for closed-ended studies. Open-ended responses provide in-depth information, but this does not necessarily translate into better predictions [53]. Other challenges associated with using open-ended questions to measure attitudes include a. the effort required to articulate and write a response might deter some respondents from answering open-ended questions, b. respondents might give answers that fall outside the frame of the study, c. coding open-ended responses is burdensome, d. the analysis is time-consuming and prone to subjective bias, and e. for a policymaker, there is a considerable delay between implementing the survey and obtaining the final output.

2.4 FRAMEWORKS TO MEASURE THE INTENTION TO USE

Having introduced the closed- and open-ended approaches to measuring attitudes in the previous section, we now focus on the psychological frameworks used to measure attitudes. These frameworks capture the relationships between attitudes and behaviour. Numerous researchers have used the Theory of Planned Behaviour (TPB) to measure intentions in travel behaviour analysis. TPB is an extension of the “Theory of Reasoned Action (TRA)”. It uses Attitude Towards the Behaviour (ATB), Subjective Norms (SN) and Perceived Behavioural Control (PBC) to predict the intention to use [27]. Intention, in turn, is used to predict human behaviour, and researchers in many fields have used TPB extensively. Figure 2.1 illustrates the TPB framework proposed by Ajzen [27].

We now discuss the applications of TPB in travel behaviour research. Researchers have used TPB to measure the intention to use public transport, the psychological factors influencing the use of personal vehicles or public transport, willingness to pay, and the intention to use bike-sharing for holiday cycling [3], [6], [31], [33], [37]. TPB has also been applied to measuring social identity and the frequency of using the car, bicycle, public transport and walking [34], [84]. Buckley, Kaye and Pradhan [97] used TPB in a driving-simulator experiment with a Level 3 AV to predict the intention to use AVs and emphasised the need to include trust. Other studies that used TPB to predict the intention to use AVs include the works of Moták et al. [161], Koul and Eydgahi [162], Chen and Yan [163] and Jing et al. [164]. Except for Moták et al. [161], all of these studies used extensions of TPB. Koul and Eydgahi [162] added technophobia and perceived safety, Chen and Yan [163] added technological savviness, and Jing et al. [164] added technophobia and perceived safety. It is worth noting that these three works are independent and extend the constructs of TPB separately.

Figure 2.1 The Framework for the Theory of Planned Behaviour (Source: [27])

The Technology Acceptance Model (TAM) also evolved from TRA. It postulates that Perceived Usefulness (PU) and Perceived Ease of Use (PEoU) influence the behavioural intention to use [28] (the TAM framework is shown in Figure 2.2). TAM has found many applications in predicting the intention to use AVs [96], [165], [166]. As with TPB, researchers have used many extensions of TAM. Predictive capability improved with the introduction of constructs such as trust [96], [166]–[168] and an external locus of control [167]. In their modified framework, Choi and Ji [169] hypothesised that system transparency, technical competence and situation management capabilities influence trust, which in turn influences behavioural intention. In the research by Zhang et al. [168], PU and Perceived Safety Risk influenced trust, which in turn influenced the acceptance of AVs.

Panagiotopoulos and Dimitrakopoulos [169] used an extension of TAM that included perceived trust and social influence to predict the behavioural intention to use AVs. Wu et al. [170] used TAM to analyse the effect of environmental concern on the public acceptance of Autonomous Electric Vehicles. They concluded that Green Perceived Usefulness (GPU) and PEoU are essential determinants. Moták et al. [161] reported that the constructs of TAM explained 40% of the variance in the intention to use Autonomous Shuttles. Another variation of TAM is the Car Technology Acceptance Model (CTAM). It uses effort expectancy, performance expectancy, social influence, perceived safety, anxiety, attitudes towards the technology, desire for control, technology use and technology acceptance, and it has been applied to predict the intention to use AVs [73], [74], [171].

Figure 2.2 The Framework for the Technology Acceptance Model [28]

The third theory, the Unified Theory of Acceptance and Use of Technology (UTAUT), is a social-psychological theory proposed by Venkatesh et al. [29] to predict technology acceptance. UTAUT assumes that performance expectancy, effort expectancy and social influence affect behavioural intention. Behavioural intention and facilitating conditions influence use behaviour, while age, gender, experience and voluntariness of use are moderating factors [29]. The framework is presented in Figure 2.3, and readers may refer to Venkatesh et al. [29] for the underlying principles.

Several researchers have used UTAUT to evaluate the intention to use AVs [165], [172]. In the study by Leicht, Chtourou and Ben Youssef [172], individuals’ technological savviness moderated the adoption and purchase of AVs. Madigan et al. [173] also relied on UTAUT to analyse the factors that influence the decision to use automated public transport. Their analysis showed a strong positive influence of performance expectancy, social influence, facilitating conditions and hedonic motivation on behavioural intention. Effort expectancy did not influence behavioural intention, and the moderating variables (age, gender and experience) showed no effect [173]. When TAM, TPB and UTAUT were compared for predicting the intention to use AVs, TAM performed best, followed by TPB [165]. However, Moták et al. [161] obtained comparable results for TAM and TPB in predicting the intention to use an Autonomous Shuttle.

Figure 2.3 The Framework for the Unified Theory of Acceptance and Use of Technology (UTAUT) [29]

2.5 MODELLING APPROACHES TO PREDICT AV USE

Since our analysis aims to measure the intention to use Autonomous Vehicles, this section summarises the statistical modelling techniques used to predict the intention to use, or the willingness to pay for, AVs.

Researchers have used Multiple Linear Regression to model the relationship between a response variable and two or more explanatory variables by fitting a linear equation to the observed data. Estimation involves computing the best-fitting linear equation by minimising the sum of squared errors over all observations, where each error is the vertical distance of a data point from the fitted line. Researchers have used this approach to model the intention to use AVs [162], [169], [171]. Another approach, Hierarchical Linear Regression, enters variables into the model in successive blocks. This statistically controls for the effects of specific variables and makes it possible to test whether adding a block improves the predictive capability of the model, as well as to assess the moderating effects of a variable. In principle, it is a more elaborate form of ordinary least squares regression [174]. To predict the intention to use while accounting for moderating effects, many researchers have used Hierarchical Linear Regression [38], [96], [165], [173].
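A small numerical sketch can make the least-squares fit and the block-wise logic of hierarchical regression concrete. It rests on assumptions. The data are simulated, and the variable names (`age`, `income`, `attitude`, `trust`) and their effect sizes are hypothetical rather than taken from any cited study. The quantity of interest is the change in R² when the attitudinal block is added to the socio-demographic block.

```python
import numpy as np


def ols_r2(X, y):
    """Fit y = b0 + X b by ordinary least squares and return (coefficients, R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)   # minimises the sum of squared residuals
    resid = y - X1 @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Simulated (hypothetical) data: block 1 = socio-demographics, block 2 adds attitudes.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(40, 12, n)
income = rng.normal(50, 15, n)
attitude = rng.normal(0, 1, n)
trust = rng.normal(0, 1, n)
intention = (3.0 - 0.02 * age + 0.01 * income
             + 0.8 * attitude + 0.5 * trust + rng.normal(0, 0.5, n))

_, r2_block1 = ols_r2(np.column_stack([age, income]), intention)
coef_full, r2_block2 = ols_r2(np.column_stack([age, income, attitude, trust]), intention)
delta_r2 = r2_block2 - r2_block1  # additional variance explained by the attitudinal block

print(f"R2 (block 1) = {r2_block1:.3f}")
print(f"R2 (blocks 1+2) = {r2_block2:.3f}, change = {delta_r2:.3f}")
```

A moderating effect could be examined in the same way by adding a block of interaction terms (for example, `attitude * age`) and inspecting the change in R². An F-test on that change is the usual formal check.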

To assess the complex relationships simultaneously, researchers often use Structural Equation

Modelling (SEM). SEM is a collection of statistical techniques that captures the relationship

between independent variables and one or more dependent variables. The independent and


dependent variables can be continuous or discrete, and they can be either latent factors or measured variables. Common estimation techniques include Maximum Likelihood and Generalised Least Squares. Readers may refer to Ullman and Bentler [175] for a detailed treatment of SEM and its estimation techniques. Researchers have used techniques such as Partial Least Squares (PLS) [167], PLS-SEM [163], [166] and SEM [164], [168], [170]. Chen and Yan [165] used PLS Multi-Group Analysis to account for heterogeneity in the choices.

Researchers have also used discrete choice models in the context of AVs. For modelling nominal discrete choice variables, researchers have used Multinomial Logit models. The model assumes that each alternative has a utility and that individuals choose the alternative with the highest utility (utility maximisation). It also assumes that the error components follow a Gumbel distribution. The parameters are typically estimated by maximum likelihood. The Mixed Logit model addresses some of the limitations of the Multinomial Logit model by allowing for random taste variation, unrestricted substitution patterns and correlation between alternatives, and it is estimated through simulation [176]. For example, Daziano, Sarrias and Leard [176] estimated the willingness to pay for AVs with a mixed logit model that used a normal distribution for the key parameters and a log-normal distribution for other parameters. Haboucha, Ishaq and Shiftan [40] used a random utility model (RUM) with a logit kernel to model the choice between conventional vehicles, shared AVs and private AVs.
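The sketch below illustrates the two models just described. All alternatives, attributes and coefficients are hypothetical, and the block only computes choice probabilities; estimating the parameters from data is not shown. The first part evaluates Multinomial Logit probabilities. It then checks by simulation that choosing the maximum-utility alternative under i.i.d. standard Gumbel errors reproduces those probabilities. The second part computes a Mixed Logit probability by averaging the logit formula over draws of a normally distributed time coefficient. That normal coefficient is an illustrative assumption and is not taken from [176].

```python
import numpy as np


def mnl_probabilities(V):
    """Logit choice probabilities P_j = exp(V_j) / sum_k exp(V_k) along the last axis."""
    V = np.asarray(V, dtype=float)
    expV = np.exp(V - V.max(axis=-1, keepdims=True))  # subtract the max for numerical stability
    return expV / expV.sum(axis=-1, keepdims=True)


# Hypothetical alternatives: conventional car, private AV, shared AV.
cost = np.array([10.0, 14.0, 8.0])    # hypothetical cost per trip
time = np.array([30.0, 25.0, 35.0])   # hypothetical travel time (min)
asc = np.array([0.0, -0.3, -0.5])     # hypothetical alternative-specific constants

# Multinomial Logit with fixed (hypothetical) taste parameters.
beta_cost, beta_time = -0.15, -0.05
V = asc + beta_cost * cost + beta_time * time
p_mnl = mnl_probabilities(V)

# Simulation check: utility = V + i.i.d. standard Gumbel error; pick the maximum.
rng = np.random.default_rng(1)
draws = 200_000
U = V + rng.gumbel(loc=0.0, scale=1.0, size=(draws, 3))
p_sim = np.bincount(U.argmax(axis=1), minlength=3) / draws

# Mixed Logit: beta_time varies across individuals, beta_time ~ Normal(-0.05, 0.03^2).
# The choice probability is the logit formula averaged over simulated draws.
beta_time_draws = rng.normal(-0.05, 0.03, size=(5_000, 1))
V_draws = asc + beta_cost * cost + beta_time_draws * time  # shape (5000, 3)
p_mixed = mnl_probabilities(V_draws).mean(axis=0)

print("MNL:", np.round(p_mnl, 3), "simulated:", np.round(p_sim, 3))
print("Mixed logit:", np.round(p_mixed, 3))
```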

2.6 FACTORS INFLUENCING “INTENTION TO USE/PAY” FOR AUTONOMOUS VEHICLES

In this study, we performed case studies on the intention to use Autonomous Vehicles (AVs). The literature indicates that four groups of factors are likely to influence the intention to use or pay for AVs: the socio-demographic characteristics of individuals, their attitudes towards AVs, land-use characteristics and their current travel characteristics. We discuss each of these factors in greater detail in the following paragraphs. We consider the variable effects reported in the studies discussed below to be plausible.

Socio-demographic Characteristics - Studies indicate that gender influences the intention to use, share or buy AVs [177], [178]. Greater concern and scepticism about AVs among women might explain why fewer women are willing to use them [71]. Research indicates that younger individuals have fewer concerns regarding AVs and are more willing to use them [68], [74], [178], whereas older individuals are more likely to use conventional cars [66], [68]. An individual’s educational qualification also plays an important role in the choice of AVs. For both personal and shared AVs, more highly educated individuals are more likely to use AVs, which could be related to their greater awareness [40], [68], [85]. As disposable income determines an individual’s spending potential, it is unsurprising that income is positively correlated with the willingness to pay for AVs [39], [68].

Vehicle ownership and vehicle type also influence the choice of AVs. Individuals who do not own a car and those who use cars with advanced automated features are equally likely to use AVs [41], [85]. Furthermore, the intention to use shared AVs was higher among individuals from households with fewer vehicles [179]. Individuals with a passion for driving are more likely to prefer conventional vehicles and less likely to use AVs [42], [73], [172]. AVs are likely to provide mobility to older individuals and those with physical disabilities, so it is not surprising that the intention to use them is higher in these groups [74], [180]. Finally, individuals familiar with car-sharing companies are willing to pay less for AVs and would use Shared AVs instead [66], [179].

Attitudes towards AVs and Their Influence - Attitudes towards the use of AVs significantly influence both the intention to use them and the willingness to pay for them. A positive attitude towards AVs is often associated with greater awareness of AVs [75]. Higher scores for perceived usefulness, perceived ease of use, trust and perceived safety of AVs are associated with an increased intention to use AVs [166], [167]. Many believe that AVs will significantly improve traffic safety by reducing accidents, and this is likely to motivate more individuals to use AVs [68], [73], [169]. Improvements in transportation efficiency, reductions in fuel consumption and emissions, the elimination of parking woes and the ability to multi-task might also shape the intention to use AVs [4], [42], [72], [180]. Respondents are excited about the ability to multi-task [42], [72] and the opportunity to talk with fellow passengers and enjoy the views while travelling [16]. Respondents’ concerns regarding the use of AVs include the lack of manual control [72], [85], system failure [72], [73], [180] and the legal liabilities of drivers or owners, particularly in the event of an accident [39], [72], [180]. Other factors affecting the intention to use AVs include anxieties about data sharing and monitoring [39], [169], performance expectancy and trust in AVs [167], [169], [181], perceived usefulness and perceived safety risk [170], and perceived ease of use [73]. Apart from these, individuals’ environmental consciousness and technological savviness could also affect their decisions [40], [179]. To model attitudes measured using closed-ended questions, researchers have used SEM [163], [164], [170], PLS [166], [167], [178], dummy indicator variables [68], EFA [4], CFA [38], [40], [168], [170], [172] and PCA [65], [88], [172], to name a few. Hulse, Xie and Galea [95] coded attitudes towards autonomous vehicles, captured using open-ended responses, into five categories.

Travel Characteristics – The current travel patterns of individuals could play a decisive role in their choice of AVs. For instance, vehicle miles travelled [39], [182], the frequency of driving [39], [41], past experience with crashes [66], [68], joint or solo travel [66], the need to run errands during the day [40] and the distance to the workplace [66] might alter the decision.

Land-use Characteristics - People living in built-up areas or on congested streets do not favour using fully autonomous vehicles [16], [38]. Compared to respondents from urban areas, those from rural areas have more concerns regarding the use of AVs [41]. Furthermore, the willingness to pay for AVs is higher in higher-income neighbourhoods and lower in more job-dense neighbourhoods [66]. Finally, people in downtown and suburban areas are less likely to prefer conventional gasoline vehicles [68].

2.7 NATURAL LANGUAGE PROCESSING AND ITS APPLICATIONS

Researchers use both closed- and open-ended questions to measure attitudes. Open-ended questions are preferred for gaining insights into respondents’ attitudes towards a relatively new and complicated problem [13]. Considering this, we decided to undertake research related to Autonomous Vehicles (AVs). To extract information from the open-ended responses in our survey, we used Natural Language Processing (NLP) techniques. We now briefly discuss some NLP methods pertinent to the topic before describing LDA in detail.

Term Frequency*Inverse Document Frequency (TF*IDF) indicates how important a word is to a document relative to the rest of the corpus. The Term Frequency of a word is its frequency of occurrence in a given document. Inverse Document Frequency plays an important role. It diminishes the weight of words that occur in many documents of the corpus and increases the weight of words that occur in few documents, thereby accounting for words that are very common across most documents [183]. However, researchers often criticise these models for lacking a solid mathematical framework while being computationally expensive [183], [184].
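A minimal sketch of the weighting follows. It assumes a common variant in which tf is the relative count of a word in a response and idf = log(N / df), with N documents and df the number of documents containing the word. Libraries such as scikit-learn use smoothed variants by default, so their numbers differ. The three pre-processed responses are hypothetical. Words that appear in every response (`av`, `travel`) receive zero weight.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf} dict per tokenised (non-empty) document.

    tf  = count of the word in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the word
    """
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in counts.items()
        })
    return weights


# Hypothetical, already pre-processed open-ended responses.
responses = [
    "av safe travel".split(),
    "av reduce accident travel".split(),
    "av costly travel".split(),
]
for weights in tf_idf(responses):
    print(weights)
```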

Another approach that could be adopted is Sentiment Analysis, which aims to identify, extract and classify responses based on their polarity. Sentiments can be binary (positive and negative) or measured on n-point scales (e.g., strongly disagree, disagree, neutral, agree, strongly agree). Sentiment Analysis is often helpful in understanding the extent to which a service is accepted and, eventually, in improving that service. However, one issue with the approach is that it merely captures sentiments without providing deeper insights [185].
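A lexicon-based classifier is one simple way to assign binary or three-way polarity. The sketch below is an assumption-laden illustration and not a method used in this thesis. The lexicon and negator list are hypothetical, and the negation rule (flip a word’s sign if a negator appears in the two preceding tokens) is deliberately crude. Practical studies would use curated lexicons or trained classifiers.

```python
import re

# Hypothetical polarity lexicon (+1 positive, -1 negative).
LEXICON = {"safe": 1, "convenient": 1, "comfortable": 1, "love": 1,
           "dangerous": -1, "expensive": -1, "scary": -1, "unreliable": -1}
# Hypothetical negators; apostrophes are removed first, so "don't" becomes "dont".
NEGATORS = {"not", "never", "no", "dont", "isnt", "wont"}


def polarity(response):
    """Classify a response as 'positive', 'negative' or 'neutral'.

    Each lexicon word adds +1 or -1. The sign is flipped when one of the two
    preceding tokens is a negator (a simple, assumed negation rule).
    """
    tokens = re.findall(r"[a-z]+", response.lower().replace("'", ""))
    score = 0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            negated = any(t in NEGATORS for t in tokens[max(0, i - 2):i])
            score += -LEXICON[tok] if negated else LEXICON[tok]
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("AVs are safe and convenient"))           # positive
print(polarity("AVs are not safe"))                      # negative
print(polarity("They are expensive but not dangerous"))  # neutral (-1 + 1)
```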

Researchers also use Word Embeddings (also known as word representations) to measure similarities between words by capturing their semantic and syntactic information. Word Embeddings are continuous word vectors built by taking into account the contexts in which words occur. Consequently, words with similar meanings lie close to each other in the vector space [186]. Researchers approach Word Embeddings in two ways: a. words expressed as vectors of co-occurring words, and b. words expressed as vectors of the linguistic contexts in which they occur [187].
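The first approach (words as vectors of co-occurring words) can be sketched with plain counts and cosine similarity. This is an illustration under stated assumptions. The window size and the toy sentences are hypothetical, and practical embeddings (for example, word2vec or GloVe) learn dense, low-dimensional vectors rather than raw counts. Because “av” and “taxi” occur in identical contexts here, their vectors coincide. “bicycle” shares no context with them.

```python
import math
from collections import defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words within +/- `window` positions."""
    vectors = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus of short, pre-processed sentences.
sentences = [s.split() for s in [
    "shared av saves money",
    "shared taxi saves money",
    "av improves road safety",
    "taxi improves road safety",
    "bicycle needs physical effort",
]]
vec = cooccurrence_vectors(sentences)
print(round(cosine(vec["av"], vec["taxi"]), 3))     # identical contexts
print(round(cosine(vec["av"], vec["bicycle"]), 3))  # no shared context
```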

The fourth approach, Topic Models, comprises statistical models that discover latent “topics” in a collection of documents (a corpus). These models assume that words specific to a given topic are more likely to occur in documents that discuss this topic. Commonly used approaches include LDA [184] and supervised LDA [188]. Researchers often argue that these approaches are better suited to analysing large corpora than short texts [189]. In practice, Topic Modelling assigns each text a series of numerical indicators (e.g., topic proportions in the Latent Dirichlet Allocation algorithm) that are later used in modelling. LDA follows an idea conceptually similar to PCA or Factor Analysis, in which fundamental (latent) variables or vectors are extracted from the data.

Before Topic Models are applied, the data from the open-ended responses should be cleaned, or pre-processed. We begin by splitting the responses into words (tokens) and removing numbers, punctuation and other symbols. Next, we remove the most common words in the language that do not convey any specific meaning or relevance, such as “a”, “able” and “about” (to name a few), often termed “stop words”. In addition, we combined words whose meaning changes when they are read together rather than independently. For example, “public” and “transportation” each mean something different on their own than the phrase “public transportation”. Such phrases are identified using regular expressions and replaced with a single combined term.

Furthermore, we reduce the words in the open-ended responses to their root form. For example, “game”, “gaming”, “gamed” and “games” are all reduced to the root form “game” [190], [191]. Data pre-processing is critical in text analysis and improves the accuracy of the analysis [192], [193].
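The pre-processing steps above can be sketched as a single function. The sketch is illustrative, and every list in it is hypothetical: the short stop-word list, the phrase table and `toy_stem`. `toy_stem` is a crude suffix stripper that reproduces the “game” example but mishandles many other words. A real analysis would typically use an established stemmer such as NLTK’s `PorterStemmer`, together with a full stop-word list.

```python
import re

# Small, hypothetical stop-word list; real analyses use fuller lists.
STOP_WORDS = {"a", "able", "about", "and", "the", "is", "to", "i", "it",
              "would", "of", "for", "my"}
# Hypothetical multi-word expressions merged into single tokens before splitting.
PHRASES = {r"\bpublic\s+transportation\b": "public_transportation"}


def toy_stem(word):
    """Crude suffix stripping for illustration only (not the Porter algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Restore a final 'e' after a short consonant-vowel-consonant stem ("gam" -> "game").
            if (len(stem) == 3 and stem[0] not in "aeiou"
                    and stem[1] in "aeiou" and stem[2] not in "aeiouwxy"):
                stem += "e"
            return stem
    return word


def clean_response(response):
    """Merge phrases, drop numbers/symbols and stop words, then stem the remaining tokens."""
    text = response.lower()
    for pattern, replacement in PHRASES.items():
        text = re.sub(pattern, replacement, text)   # identify phrases with regular expressions
    text = re.sub(r"[^a-z_\s]", " ", text)          # remove numbers, punctuation and symbols
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [t if "_" in t else toy_stem(t) for t in tokens]


print([toy_stem(w) for w in ["game", "gaming", "gamed", "games"]])
print(clean_response("I would use public transportation 2 times a day!"))
```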

We then analysed the pre-processed data using Topic Modelling, an NLP technique that we outline below together with its applications. NLP enables computers to understand text and speech as humans do. In this thesis, we used Topic Modelling, a generative statistical approach that allows a set of observations to be explained by latent groups based on similarities in parts of the data. We found Latent Dirichlet Allocation (LDA) and Supervised Latent Dirichlet Allocation (sLDA) more appropriate for extracting information than the other text analysis methods.

Before explaining sLDA, we explain LDA, a popular Topic Modelling method that aims to identify latent constructs in text data. Using these latent variables, we transform each open-ended response into numerical values that can be used for prediction. LDA is similar to a multinomial principal component analysis (PCA). It converts a text document (represented by word frequencies) into a combination of topics, each of which is also represented by word frequencies. The topics play a role conceptually comparable to that of the eigenvectors in PCA. In LDA, the set of documents is represented in the Bag of Words (BoW) format, a vector of word frequencies. A BoW is a multiset of a document’s words that disregards grammar and word order but keeps word frequencies. Given the BoW representation and K, the number of topics to extract, LDA extracts the set of K topics that best reconstructs (in a likelihood sense) the original documents. Each extracted topic is itself a BoW. When a new, unlabelled document is projected onto the topic space, it is re-represented as a combination of the K topics. In addition to the number of topics, LDA requires two parameters, α and η, which determine the sparsity of the document-topic and topic-word distribution priors,

respectively [184], [194]. In sLDA, the documents and the responses are modelled jointly to

obtain the latent topics that best predict the response variable [188]. In other words, sLDA is a supervised method. It models the documents and the response variable jointly (via maximum-likelihood estimation) to find the latent topics that best predict the response variable for future unlabelled documents. However, endogeneity issues might arise if the dependent variable of the final model (when the Topic Model results are used to predict some other choice) is also used as the response variable for sLDA. For the dataset from India (Chapter 3), the response variable was the intention to use Shared AVs. For the dataset from the USA (Chapter 4), it was whether the respondent agreed with the open-ended question.
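To make the LDA step concrete, the sketch below fits plain (unsupervised) LDA with collapsed Gibbs sampling. This standard inference method is swapped in for illustration. The text does not state which implementation or inference algorithm was used, so treat the choice as an assumption. Here `alpha` and `eta` are the document-topic and topic-word Dirichlet priors. Each row of `theta` holds the K numbers for one response that can serve as explanatory variables. sLDA additionally models the response variable and is not implemented here.

```python
import numpy as np


def lda_gibbs(docs, vocab_size, K, alpha=0.1, eta=0.1, iters=300, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of lists of word ids in [0, vocab_size).
    Returns theta (D x K document-topic proportions) and beta (K x V topic-word distributions).
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))            # words in document d assigned to topic k
    n_kw = np.zeros((K, vocab_size))   # times word w is assigned to topic k
    n_k = np.zeros(K)                  # total words assigned to topic k
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random initial topic per word
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = k | all other assignments), up to a constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + vocab_size * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    beta = (n_kw + eta) / (n_kw + eta).sum(axis=1, keepdims=True)
    return theta, beta


# Hypothetical toy corpus: word ids 0-3 form a "safety" vocabulary and 4-7 a "cost" vocabulary.
docs = [[0, 1, 2, 3, 0, 1, 2, 3]] * 3 + [[4, 5, 6, 7, 4, 5, 6, 7]] * 3
theta, beta = lda_gibbs(docs, vocab_size=8, K=2, eta=0.5)
print(np.round(theta, 2))  # each row: the K topic proportions for one response
```

On this toy corpus, the three “safety” responses would typically load mainly on one topic and the three “cost” responses on the other. A flatter topic-word prior (`eta=0.5`) helps the sampler mix on such a tiny corpus.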

We present the Probabilistic Graphical Model for sLDA in Figure 2.4. As the figure shows, the dataset comprises $D$ documents, each comprising $N$ words $W_{d,n}$ ($n = 1, \dots, N$; $d = 1, \dots, D$). Every word is assigned one of the $K$ available topics. Each topic $k$ ($k = 1, \dots, K$) consists of a vector $\beta_k$ that contains the words and their associated frequencies in that topic.
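For reference, the generative process that Figure 2.4 depicts can be written as follows. We assume here that the figure follows the original Gaussian-response formulation of sLDA [188]. $Z_{d,n}$ is treated as a $K$-dimensional indicator vector. $\eta$ and $\sigma^2$ are the response coefficients and variance, a different role from the $\eta$ prior of plain LDA. A binary response, such as agreement with a question, would instead typically call for a generalised linear model in place of the normal distribution.

```latex
\begin{aligned}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D,\\
Z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(1, \theta_d), & n &= 1,\dots,N,\\
W_{d,n} \mid Z_{d,n}, \beta_{1:K} &\sim \mathrm{Multinomial}(1, \beta_{Z_{d,n}}), &&\\
Y_d \mid Z_{d,1:N}, \eta, \sigma^2 &\sim \mathcal{N}\!\left(\eta^{\top} \bar{Z}_d,\ \sigma^2\right), & \bar{Z}_d &= \frac{1}{N}\sum_{n=1}^{N} Z_{d,n}.
\end{aligned}
```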

Figure 2.4 Probabilistic Graphical Model for sLDA

As a result, for a given corpus of documents and a response variable, we obtain a set of topics spanning every document, which act as their “common building blocks”. Furthermore, for every document, we obtain a set of “K” numbers indicating “how much” the document belongs to each building block. We use these numbers in estimating the models. We assess the topics for their meaningfulness and examine the inter-topic distances between the extracted topics. Examining topic overlap with the visualisation tool pyLDAvis [195] shows whether the topics are distinct.
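Inter-topic distance can be quantified by comparing the topic-word distributions directly. The sketch below computes the Jensen-Shannon divergence, which, to our understanding, pyLDAvis uses by default before projecting topics to two dimensions. The small topic vectors are hypothetical. The divergence is 0 for identical topics and reaches ln 2 (in nats) for topics with no words in common.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two topic-word distributions (0 to ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def inter_topic_distances(topics):
    """Pairwise Jensen-Shannon divergence matrix for a list of topic-word distributions."""
    return [[jensen_shannon(p, q) for q in topics] for p in topics]


# Hypothetical topics over a four-word vocabulary.
topics = [
    [0.5, 0.5, 0.0, 0.0],   # e.g. a "safety" topic
    [0.0, 0.0, 0.5, 0.5],   # e.g. a "cost" topic with no overlapping words
    [0.4, 0.4, 0.1, 0.1],   # a topic that overlaps heavily with the first
]
for row in inter_topic_distances(topics):
    print([round(x, 3) for x in row])
```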

Topic Modelling opens up possibilities for extracting information from text. It has previously been used to predict non-habitual overcrowding of public transport and to predict taxi demand from information on special events published on the internet. It has also been used to recommend travel routes based on geotagged photos and to discover trip patterns (such as destination, time of arrival, day of the week and duration of stay) from transit smart card records [196]–[200]. Hasan and Ukkusuri [201] used Topic Models to extract information from social media platforms to obtain multi-day activity patterns of individuals.

Since one of the objectives of our study was to determine whether Topic Modelling could be used to extract information from open-ended responses, we investigated whether it had previously been used in survey analysis. To extract information from open-ended responses in the American National Election Study (ANES), Roberts et al. [158] used structured Topic Models and observed that they could recover relationships similar to those identified by hand coders.

Researchers have used Topic Modelling to extract information from the open-ended responses

in market research, which reduced the analysis time and human bias. However, the accuracy


of predictions was affected by the frequency of topics, and the number of topics could adversely affect the quality of the topics [202]. Tvinnereim and Fløttum [203] and Mitsui, Kubo and Shoji [204] used Topic Models to extract information from open-ended survey questions on climate change and on protected area assessment, respectively. We could not find any application of Topic Models to extract information from open-ended responses in travel behaviour research. If these techniques prove beneficial, policymakers and analysts in travel behaviour research could use them to extract information from open-ended responses.

2.8 SUMMARY

This chapter presented a literature review on different aspects related to the measurement of qualitative data. The aim was not to present an exhaustive review but to identify the research questions that need further investigation. We first discussed the need for measuring attitudes in transportation analysis. As in other fields, most researchers have used the closed-ended approach to measure attitudes in transportation analysis. We then discussed the alternative approaches for measuring attitudes, closed- and open-ended, together with the advantages, disadvantages and concerns associated with each. Later, the review focused on the frameworks used to measure attitudes in travel behaviour studies and on the use of Topic Models. Finally, we discussed the different modelling techniques used by other researchers.

3 TOPIC MODELLING FOR OPEN-ENDED RESPONSES – A CASE STUDY ON THE INTENTION TO USE SHARED AVS

3.1 INTRODUCTION
In this chapter, we address the first three research objectives of this thesis, viz., a. to analyse if the method for collecting qualitative data influences the survey responses, b. to develop an approach to extract and process open-ended responses from a survey, and c. to compare the relative performance of the open- and closed-ended responses in analysing qualitative data. To investigate these in travel behaviour research, we designed and deployed questionnaires that measure the intention to use Shared AVs in India. We chose Shared AVs because we believed that a questionnaire about a relatively new topic might encourage respondents to answer the open-ended questions, as recommended by Rugg and Cantril [13]. We designed the questionnaire using the Theory of Planned Behaviour and distributed it in India between November 2017 and March 2018. This work was published in the IEEE Transactions on Intelligent Transportation Systems, and the text in different sections of this chapter is adapted from that paper [205].

We organise the rest of this chapter as follows. Section 3.2 discusses the questionnaire design, presenting the experimental design and the theoretical framework used in the study. Section 3.3 discusses the data collection and data cleaning procedures. Section 3.4 discusses the results from the exploratory analysis of the closed-ended responses, including the differences in the frequency distributions of the closed-ended responses and their statistical significance. Section 3.5 presents the different aspects of extracting information from closed- and open-ended responses. It discusses the results from the exploratory analysis and Topic Modelling and compares the extracted topics with the statements used for the closed-ended questions. Section 3.6 briefly discusses the adopted modelling framework, and Section 3.7 presents the results from the estimation of the model for the intention to use Shared AVs. The final section presents the salient findings and the limitations of the current study.


3.2 QUESTIONNAIRE DESIGN

3.2.1 Experimental Design

This study investigates the intention to use Shared AVs. We collected information on various aspects related to attitudes. These include the technological savviness and environmental consciousness of the individual, perceptions of AVs, attitudes towards the use of AVs, subjective norms and perceived behavioural control. Since this study aimed to assess the potential of open-ended questions for measuring attitudes in travel behaviour research, we used two versions of the questionnaire. The alternative versions were presented randomly to respondents using SurveyMonkey’s “Page Randomisation” feature.

Ver_Lk presents respondents with statements describing each attitude and asks them to choose the point on a five-point Likert scale that best describes their attitude, whereas Ver_OE uses a combination of open- and closed-ended questions. In Ver_OE, we used open-ended questions to collect information on technological savviness, environmental consciousness, and the benefits and negative impacts of AVs on society; all other attitudes are measured using the same set of Likert scale questions as in Ver_Lk. Comparing the distributions of the Likert scale responses between the two questionnaires allows us to analyse the influence of the questionnaire type (the first objective). Performing Topic Modelling to extract information from the open-ended responses relates to the second objective. Furthermore, having these two datasets allowed us to estimate models that facilitated the evaluation of open-ended questions for measuring attitudes. We present the experimental design in Figure 3.1.

3.2.2 Framework Design

To design the questionnaire, we used the Theory of Planned Behaviour (TPB) [27], which has been adopted widely in travel behaviour research [3], [6]. TPB posits that attitudes, subjective norms and perceived behavioural control shape an individual’s behavioural intention and, eventually, behaviour. Figure 3.2 illustrates the TPB framework used in this research.

To measure technological savviness and environmental consciousness, we used Likert scale questions in Ver_Lk and a combination of Likert scale and open-ended questions in Ver_OE. To measure the impacts of AVs on society, we used Likert scales in Ver_Lk and open-ended questions in Ver_OE. Finally, to measure the attitudes towards the behaviour, subjective norms and perceived behavioural control, we used Likert scale questions in both versions of the questionnaire. A draft of the questionnaire is presented in Appendix A.


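Figure 3.1 The Experimental Design for the Intention to Use Shared AVs (Ver_Lk: Technological Savviness, Environmental Consciousness, Benefits/impacts to society, Attitudes towards behaviour, Subjective norms and Perceived behavioural control, all measured with closed-ended questions; Ver_OE: the same constructs, with Technological Savviness and Environmental Consciousness measured with closed- and open-ended questions and Benefits/impacts to society with open-ended questions)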

Figure 3.2 Modified Framework of the Theory of Planned Behaviour

3.3 DATA COLLECTION AND DATA CLEANING

We distributed the survey in India between November 2017 and March 2018 via Facebook and WhatsApp, with the help of bloggers. We checked the collected data for inconsistencies and incomplete records and removed the records of individuals who did not report their intention to use Shared AVs. We then excluded individuals who took more than an hour to complete the survey, as we suspect that such responses lack coherence. To deal with respondents who answered the questionnaire too quickly (a.k.a. speeders), we used information from Qualtrics [206] to compute the minimum time needed to answer the different questions in our survey, and we classified individuals who answered faster than this minimum as speeders. Furthermore, we analysed the responses to identify whether respondents answered in patterns (straight-line and diagonal responses). Speeders who also answered in straight lines in two or more sections were excluded from further analysis.
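The cleaning rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Response` record, the function names and the `min_time_s` argument are hypothetical (the thesis does not publish its data schema or the exact minimum time derived from Qualtrics), and a section counts as straight-lined when every item in it received the same scale point.

```python
from dataclasses import dataclass, field

MAX_DURATION_S = 60 * 60  # responses taking more than an hour are dropped


@dataclass
class Response:
    """Hypothetical record: one respondent's timing and Likert answers (1-5) per section."""
    respondent_id: str
    duration_s: float
    sections: dict = field(default_factory=dict)  # section name -> list of answers


def is_straight_line(answers):
    """Every item in the section received the same scale point."""
    return len(answers) > 1 and len(set(answers)) == 1


def is_diagonal(answers):
    """Answers move up or down by exactly one point per item (e.g. 1, 2, 3, 4, 5)."""
    if len(answers) < 3:
        return False
    steps = {b - a for a, b in zip(answers, answers[1:])}
    return steps in ({1}, {-1})


def keep_response(r, min_time_s):
    """Drop responses over an hour, and speeders who straight-lined two or more sections."""
    if r.duration_s > MAX_DURATION_S:
        return False
    speeder = r.duration_s < min_time_s
    straight_sections = sum(is_straight_line(a) for a in r.sections.values())
    return not (speeder and straight_sections >= 2)
```

Here `is_diagonal` only flags diagonal patterns for inspection; following the rule stated in the text, exclusion is triggered by speeding combined with straight-lining.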

3.4 EXPLORATORY ANALYSIS

In this section, we focus on the first objective of this research, “to analyse if the method of collecting qualitative data influences the survey responses.” Because the questionnaire versions were presented randomly to respondents in an online survey with voluntary participation, it was essential to investigate whether the two samples were comparable. To this end, we first analyse whether the collected samples are similar in terms of the socio-demographic characteristics of the individuals. Having established that the samples are similar, we pursue the analysis related to the first objective: we first analyse the frequency distributions and then carry out an in-depth analysis of the statistical significance of the differences and the impact of the questionnaire type.

3.4.1 Preliminary Analysis

Four hundred and thirty-five respondents (Ver_Lk: 239, Ver_OE: 196) completed the survey. After removing speeders and those answering in patterns (straight lines or other patterns), the final dataset comprised 364 responses (Ver_Lk: 201, Ver_OE: 163). On average, respondents answering Ver_Lk took 11 minutes and 43 seconds to complete the survey (std. dev.: 7 minutes and 42 seconds), while those answering Ver_OE took 15 minutes and 49 seconds (std. dev.: 8 minutes and 39 seconds). Table 3.1 provides information on the socio-demographic and travel characteristics of the respondents.

The dataset had higher participation from male respondents (~75%). Additionally, students (~35%, including postgraduate students) and professionals made up a large share of respondents. In terms of monthly income, nearly 50% of the respondents earned between 25,000 and 89,999 INR. However, individuals remain hesitant to answer questions related to income: almost a quarter of our respondents did not report their income, while only 1-3% preferred not to reveal their occupation. A high percentage of respondents used public transportation and cars for commuting to the workplace/university. Furthermore, when asked about sharing rides, ~35% of respondents shared rides more than once a week.

Table 3.1 Socio-economic and Travel Characteristics

| Variables | Ver_Lk | Ver_OE |
|---|---|---|
| Gender (%) | | |
| Female | 24.50 | 27.16 |
| Male | 75.50 | 72.84 |
| Average age (in years) | 30.39 | 29.29 |
| Occupation (%) | | |
| Student | 21.89 | 27.61 |
| Postgraduate student | 12.44 | 9.20 |
| Academic Faculty | 9.95 | 7.98 |
| Manager | 9.45 | 11.04 |
| Professional | 35.32 | 35.58 |
| Technician and associate professional | 3.98 | 1.23 |
| Others | 2.99 | 4.91 |
| No occupation (e.g. retired, unemployed) | 1.00 | 1.23 |
| Prefer not to answer | 2.99 | 1.23 |
| Monthly income in INR (%) | | |
| 0-9,999 | 12.94 | 15.95 |
| 10,000-24,999 | 4.98 | 8.59 |
| 25,000-49,999 | 10.95 | 11.04 |
| 50,000-74,999 | 26.87 | 16.56 |
| 75,000-89,999 | 12.44 | 18.40 |
| More than 90,000 | 7.96 | 2.45 |
| Prefer not to answer | 23.88 | 26.99 |
| Mode used for commuting to workplace/university (%) | | |
| Walk only | 39.80 | 46.43 |
| Bike | 33.83 | 31.63 |
| Public transportation | 64.18 | 74.49 |
| Car | 69.65 | 64.80 |
| Motorbike | 33.33 | 35.71 |
| Intermediate public transportation | 45.27 | 46.94 |
| Travel time between home and university/school/workplace (minutes) | 32.00 | 51.01 |
| Frequency of ride-sharing (%) | | |
| Daily | 21.39 | 16.56 |
| 2-3 times a week | 17.41 | 19.63 |
| 2-3 times a month | 17.41 | 17.79 |
| Rarely | 11.44 | 3.07 |
| Never | 32.34 | 42.94 |
| In the last 2 years, have you been involved in a road accident? (%) | 13.93 | 15.34 |

Participation in our survey was largely voluntary, and hence, we could not ensure the representativeness of the dataset. However, as Table 3.1 shows, there is little difference in the distribution of most of these characteristics between the two versions of the questionnaire, and a Mann-Whitney U test [207] indicated that the differences were not statistically significant. Therefore, the responses from the two samples are comparable.


3.4.1.1 Attitudes Towards AVs

We captured the attitudes towards the use of AVs using seven statements, and respondents, in general, had positive attitudes towards the use of AVs. Almost two-thirds (65%) of participants perceived it as “cool to use AVs”, probably due to the perception that AVs are technologically advanced. Furthermore, an overwhelming proportion of respondents appreciated the possibility of engaging in other activities during travel, and the share choosing “Strongly Agree” was even higher among those answering Ver_OE. We observe a similar trend for the statement that AVs might relieve the stress of driving.

Figure 3.3 Frequency Distribution for Attitudes towards Use of AVs

Almost three-quarters of respondents expected AVs to eliminate parking-related issues. However, having to plan travel is a concern for many respondents, and this statement also attracted a relatively large share of neutral responses (~40%) in both versions of the questionnaire. Most respondents were neutral about having to share a vehicle with others, and the distribution of responses to this statement was approximately bell-shaped. Besides, nearly half of the respondents opined that the use of AVs might “kill the pleasure of driving”. Figure 3.3 presents the frequency distribution for the attitudes towards AVs.

3.4.1.2 Subjective Norms

This section investigates respondents’ beliefs about how their friends and family perceive AVs and their use.

Figure 3.4 Frequency Distribution for Subjective Norms

More than half (55%) of the survey participants believed that their friends/family would use AVs. It is worth noting that, for most respondents, their peers encourage using public transport, which might eventually motivate them to use shared systems. Moreover, their peers also believe that AVs might reduce congestion and pollution while making travel safer. Above all, for nearly 70% of respondents, their peers are positive about them using AVs. A higher proportion of respondents answering Ver_OE agreed with these statements (see Figure 3.4).

3.4.1.3 Perceived Behavioural Control Variables

The third element of TPB, perceived behavioural control, indicates how supportive the system is of the individual exhibiting the behaviour; Figure 3.5 presents the frequency distribution of the responses.

Figure 3.5 Frequency Distribution for Perceived Behavioural Control Variables

We explored whether respondents perceive that the challenges they would face in using AVs remain unaddressed. About 60% of respondents believed that they might take the time to learn how to use AVs. Most respondents were neutral about whether the system would be protected against hacking and failures, with almost no difference in the distribution between the two versions of the questionnaire. However, when asked whether they had concerns regarding the security of payment systems or the liabilities following an accident, respondents answering Ver_OE seemed more worried. At the same time, more respondents answering Ver_OE were confident that the interactions of AVs with other vehicles would be safe. About 70% of respondents (in both versions of the questionnaire) believed that AVs might make travel more efficient. The share of neutral responses dropped among those answering Ver_OE for the questions related to the environmental friendliness of travel by AVs and their affordability.

3.4.1.4 Intention to Use Shared AVs

Figure 3.6 Frequency Distribution for the Intention to Use Shared AVs

As Figure 3.6 shows, an overwhelming majority of respondents (~70%) favoured using shared AVs. Although we focus on shared AVs, the results resonate with the conclusion of Schoettle and Sivak [72] that respondents from India and China are more positive towards the use of AVs in general. A similar observation regarding the use of pooled AVs was also made by Stoiber et al. [208]. The number of neutral responses was lower in Ver_OE, more respondents chose the extremes (“Strongly Disagree”/“Strongly Agree”), and fewer chose “Agree”. In other words, the use of open-ended questions might have made respondents slightly more decisive.

(Figure 3.6 chart: percentage of respondents, 0-60%, for each response from Strongly Disagree to Strongly Agree on the intention to use shared AVs; series Ver_Lk and Ver_OE.)

3.4.2 Statistical Analysis

We observed slightly more positive attitudes among respondents answering Ver_OE for most of the statements, so it was necessary to analyse whether these differences were statistically significant. To this end, we performed the non-parametric Mann-Whitney U test [207]. Contrary to the observations made by Baburajan et al. [209], the differences were statistically significant only for the statement on the perceptions of friends and family about reductions in accidents.
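For readers who want to reproduce this kind of comparison, a minimal sketch of the two-sided Mann-Whitney U test follows. It uses average ranks for ties and the tie-corrected normal approximation without a continuity correction; the thesis does not state which variant or software was used, so these choices, and the function name `mann_whitney_u`, are assumptions. The inputs would be the Likert codes (1-5) given to one statement in Ver_Lk and in Ver_OE.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation with tie correction).

    Returns (U for sample x, z statistic, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the observations, remembering which sample (0 = x, 1 = y) each came from.
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation tied: no evidence of a difference
        return u_x, 0.0, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, z, p
```

The tie correction matters here because five-point Likert data contain many tied values, which shrink the variance of U relative to the untied formula.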

3.5 EXTRACTION OF DATA

We begin this section with a discussion of the treatment of closed-ended responses; the subsequent subsections address the second objective, “to develop an approach to extract open-ended responses from a survey and process the data.”

3.5.1 Treatment of Closed-ended Responses

To begin with, we tested the internal reliability of the Likert scale responses using Cronbach’s alpha. The estimated values for Ver_Lk and Ver_OE were 0.831 and 0.830, respectively, indicating good internal reliability of the questionnaire. Next, to test the validity of the questionnaire, we compared the average scores for the Likert scale responses between two groups: respondents who follow news about AVs and those who do not. Respondents following news about AVs seem to have responded correctly to the statements. The differences between the two groups were statistically significant for both versions of the questionnaire (t-statistics: Ver_Lk 14.65, Ver_OE 9.49), supporting the validity of the questionnaire.
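Cronbach’s alpha compares the sum of the item variances with the variance of respondents’ total scores: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), for k items. A minimal standard-library sketch, assuming each item is supplied as a list of numeric Likert codes from the same respondents in the same order (the function name is illustrative):

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k item columns, each a list of n respondents' scores."""
    k = len(items)
    n = len(items[0])
    if k < 2 or any(len(col) != n for col in items):
        raise ValueError("need at least two items with equal numbers of respondents")
    # Each respondent's total score across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

Sample variances are used for both the items and the totals; any consistent choice gives the same ratio.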

Factor analysis was performed on the attitudinal questions presented on a five-point Likert scale. For consistency, we estimated the same number of factors for both datasets. The Kaiser-Meyer-Olkin (KMO) statistics for the factors, presented in Table 3.2, are reasonable, so we used these factors to estimate the model for predicting the intention to use shared AVs.
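The overall KMO statistic compares the squared correlations between items with their squared partial (anti-image) correlations; values closer to 1 indicate that the items share enough common variance for factor analysis. The sketch below assumes NumPy and a respondents-by-items array of Likert codes, and computes only the overall measure (not the per-item sampling adequacy); the function name is illustrative.

```python
import numpy as np


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure for an (n_respondents x n_items) array."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(r)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)  # partial correlations from the inverse correlation matrix
    off = ~np.eye(r.shape[0], dtype=bool)  # use off-diagonal entries only
    r2 = np.sum(r[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)
```

With only two items, the partial correlation equals the ordinary correlation, so the overall KMO is always exactly 0.5; this is consistent with the value of 0.500 reported for the two-item factor “AVs impact on employment” in Table 3.2.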

For the question on the potential benefits of AVs to society, two factors were extracted: “Positive benefits of AVs on society 1” and “Positive benefits of AVs on society 2”. The first encompasses benefits such as making travel more environmentally friendly, less polluting and safer by reducing accidents, as well as reducing the need for parking spaces. The second represents benefits such as reducing gender equity issues in travel and making travel easier for people who cannot otherwise drive. Reductions in congestion and accidents load on both factors. The influence of AVs on employment is captured by “AVs impact on employment”.

Table 3.2 Results of Factor Analysis

| Factors and items | Ver_Lk | Ver_OE |
|---|---|---|
| Positive Benefits of AVs on Society 1: KMO | 0.793 | n/a |
| Autonomous Vehicles will impact society by ... | | |
| Making travel more environmentally friendly | 0.856 | n/a |
| Reducing traffic congestion in cities | 0.609 | n/a |
| Reducing transportation induced pollution | 0.868 | n/a |
| Making travel safer by reducing accidents | 0.527 | n/a |
| Reducing the need for parking spaces | 0.566 | n/a |
| Positive Benefits of AVs on Society 2: KMO | 0.793 | n/a |
| Autonomous Vehicles will impact society by ... | | |
| Reducing traffic congestion in cities | 0.485 | n/a |
| Making travel safer by reducing accidents | 0.588 | n/a |
| Making travel easier for people who cannot otherwise drive | 0.769 | n/a |
| Reducing gender equity issues in travel | 0.729 | n/a |
| AVs Impact on employment: KMO | 0.500 | n/a |
| Autonomous Vehicles will impact society by ... | | |
| Causing unemployment of existing drivers | 0.732 | n/a |
| Creating new jobs for skilled workers | -0.732 | n/a |
| Positive attitudes of individuals on AVs (PAT): KMO | 0.727 | 0.717 |
| I think it will be cool to use AVs | 0.753 | 0.799 |
| I can involve in other activities during travel | 0.811 | 0.787 |
| I will be relieved from the stress of driving | 0.820 | 0.779 |
| I can eliminate the parking-related issues | 0.722 | 0.661 |
| Subjective Norms (SN): KMO | 0.727 | 0.717 |
| I think my friends and family ... | | |
| Will use AVs | 0.672 | 0.735 |
| Believe, AVs will reduce congestion | 0.695 | 0.795 |
| Believe, AVs will reduce pollution | 0.681 | 0.791 |
| Believe, AVs will make travel safer, by reducing accidents | 0.801 | 0.740 |
| Will be positive about me using AVs | 0.803 | 0.846 |
| Perceived behavioural control variables 1 (PB1): KMO | 0.686 | 0.653 |
| I am confident the system will be protected against hacking and failures | 0.776 | 0.737 |
| I am confident that the interaction with other vehicles will be safe | 0.802 | 0.818 |
| I believe this will make my travel more efficient | 0.799 | 0.711 |
| I believe AVs will make travel more environmentally friendly | 0.706 | 0.704 |
| Perceived behavioural control variables 2 (PB2): KMO | 0.686 | 0.653 |
| I am worried about the liabilities after an accident | 0.684 | 0.466 |
| I have concerns regarding payment for the service | 0.790 | 0.858 |
| I think AVs will not be affordable to me | 0.706 | 0.598 |

Note. Factor loadings are listed below each factor’s KMO value. n/a: the societal benefits and impacts of AVs were measured with open-ended questions in Ver_OE.

The factor “Positive attitudes of individuals on AVs” (PAT) includes the notion of using AVs as being “cool”. In addition, this factor captures the benefits of eliminating parking-related issues and driving stress, and the ability to perform other activities during travel. The perception that friends and family will use AVs and are positive about the individual using them is captured by the factor “Subjective norms” (SN), which also captures the belief that AVs reduce congestion, pollution, and accidents. The confidence that the system is conducive to use is captured by “Perceived behavioural control variables 1” (PB1), whereas the concerns that may remain unaddressed are represented by “Perceived behavioural control variables 2” (PB2). PB1 includes the belief that the system is protected against failures and hacking and that the interaction with other vehicles is safe. It also covers the confidence that using AVs makes travel more environmentally friendly and efficient. The concerns about the payment system, liabilities in an accident and affordability are captured by PB2.

3.5.2 Extraction of Information from Open-ended Responses

3.5.2.1 Exploratory Analysis

Four open-ended questions were used. The responses were cleaned by removing all punctuation and “stop words”, i.e., some of the most frequently used words that do not convey any specific meaning in this context. Additionally, each word was reduced to its root form using stemming, and the resulting words were used for the analysis. Table 3.3 presents the average number of words per response for each open-ended question, in the original responses and in the cleaned data.

Table 3.3 Average Number of Words per Response

| Open-ended question | Original | Cleaned |
|---|---|---|
| For my travel needs, I use my smartphone for (OE1) | 7.99 | 5.34 |
| I think transport is a major cause of the environmental problem, because (OE2) | 9.26 | 6.12 |
| The society will benefit from the use of Autonomous Vehicles, as it will (OE3) | 9.18 | 6.24 |
| Autonomous Vehicles are likely to impact society negatively, as (OE4) | 11.07 | 6.85 |
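The cleaning steps described above (lower-casing, punctuation removal, stop-word removal and stemming) can be sketched with the standard library alone. This is an illustration under assumptions: the stop-word list is a small hypothetical sample, not the list used in the thesis, and `stem` is a deliberately crude suffix stripper standing in for a full stemmer such as Porter’s, so its output will not always match the stems reported in this section.

```python
import string

# Hypothetical, deliberately small stop-word list (the thesis does not publish its list).
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "for", "of", "in", "on", "my",
              "i", "it", "is", "are", "as", "will", "be", "by", "with", "this"}

# Crude suffix stripping, tried in the listed order; a stand-in for a real stemmer.
SUFFIXES = ("ations", "ation", "ions", "ion", "ings", "ing", "ed", "es", "s")
PUNCTUATION = str.maketrans("", "", string.punctuation)


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(response):
    """Lower-case, drop punctuation and stop words, then stem each remaining word."""
    words = response.lower().translate(PUNCTUATION).split()
    return [stem(w) for w in words if w not in STOP_WORDS]


def average_lengths(responses):
    """Average words per response before and after cleaning (cf. Table 3.3)."""
    original = sum(len(r.split()) for r in responses) / len(responses)
    cleaned = sum(len(clean(r)) for r in responses) / len(responses)
    return original, cleaned
```

For example, `clean("Booking tickets, and finding routes!")` yields `["book", "ticket", "find", "rout"]`.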

We explored the distribution of these cleaned words. Hereafter, we refer to the words after stemming, so their spellings may differ from the original words. For OE1, “book” (12.10%), “map” (7.66%), “ticket” (6.95%), “googl” (3.95%) and “find” (3.71%) were the five most frequently used words. To answer OE2, respondents most frequently used “pollut” (10.93%), “emiss” (4.45%), “vehicl” (3.64%), “air” (2.53%) and “fuel” (2.13%). When asked about the benefits of AVs to society (OE3), respondents used “reduc” (9.76%), “accid” (4.73%), “pollut” (3.12%), “drive” (2.52%) and “less” (2.41%). When asked about the negative impacts of AVs on society (OE4), respondents used “job” (3.74%), “driver” (2.71%), “vehicl” (2.43%), “may” (2.25%), and “loss” (2.15%). We present the word clouds based on the cleaned responses in Figure 3.7.
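The percentages above can be read as each stem’s share of all cleaned tokens for a question; the thesis does not spell out the denominator, so that definition is an assumption in this minimal sketch (the function name `top_words` is illustrative):

```python
from collections import Counter


def top_words(cleaned_responses, k=5):
    """Return the k most frequent stems with their share (%) of all cleaned tokens."""
    counts = Counter(word for tokens in cleaned_responses for word in tokens)
    total = sum(counts.values())
    return [(word, round(100 * c / total, 2)) for word, c in counts.most_common(k)]
```

If the percentages were instead computed over responses (the share of respondents using a word), the denominator would be the number of responses rather than the number of tokens.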

Figure 3.7 Word Clouds for OE1, OE2, OE3, OE4

3.5.2.2 Results from the Topic Models

We extracted topics from the open-ended responses using Latent Dirichlet Allocation (LDA) and supervised LDA (sLDA). We do not see an improvement with sLDA over LDA, which could be because of the significant overlap between the topics extracted by LDA and sLDA for our dataset. In the discussion below, we use the prefix “To_L” for topics extracted using LDA and “To_S” for topics extracted using sLDA; for example, the first topic extracted for OE1 is labelled To_L11 for LDA and To_S11 for sLDA. Table 3.4 presents the top five words of each topic for the open-ended questions.
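To make the topic-extraction step concrete, the sketch below fits plain LDA by collapsed Gibbs sampling on lists of cleaned tokens, using only the standard library. It is one standard way of fitting LDA, not necessarily the implementation used in the thesis; the function names and hyperparameters (`alpha`, `beta`, `iters`) are illustrative assumptions, and sLDA, which additionally models a response variable, is not shown.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, topic_word, doc_topic), where each
    row of topic_word and doc_topic is a probability distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_counts = [[0] * n_topics for _ in docs]             # tokens of topic k in doc d
    word_counts = [[0] * n_words for _ in range(n_topics)]  # times word v is assigned topic k
    topic_totals = [0] * n_topics
    assignments = []

    def update(d, k, v, delta):
        doc_counts[d][k] += delta
        word_counts[k][v] += delta
        topic_totals[k] += delta

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z):
            update(d, k, index[w], +1)
        assignments.append(z)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = index[w]
                update(d, assignments[d][n], v, -1)  # remove the current assignment
                # Full conditional p(z = k | all other assignments).
                weights = [(doc_counts[d][k] + alpha) * (word_counts[k][v] + beta)
                           / (topic_totals[k] + n_words * beta) for k in topics]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][n] = k
                update(d, k, v, +1)

    topic_word = [[(word_counts[k][v] + beta) / (topic_totals[k] + n_words * beta)
                   for v in range(n_words)] for k in topics]
    doc_topic = [[(doc_counts[d][k] + alpha) / (len(doc) + n_topics * alpha)
                  for k in topics] for d, doc in enumerate(docs)]
    return vocab, topic_word, doc_topic


def top_terms(vocab, topic_word, n=5):
    """Top-n words per topic, in the style of Table 3.4."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in topic_word]
```

The rows of `doc_topic` give each response’s topic proportions, which is the kind of per-respondent quantity that can later enter a choice model as an explanatory variable.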

Table 3.4 Top 5 Words for Each Topic for Open-ended Questions

| Topic | Word_1 | Word_2 | Word_3 | Word_4 | Word_5 |
|---|---|---|---|---|---|
| OE1: For my travel needs, I use my smartphone for | | | | | |
| To_L11 | map | googl | find | rout | locat |
| To_S11 | map | googl | find | place | restaur |
| To_L12 | travel | train | check | time | statu |
| To_S12 | travel | locat | rout | train | check |
| To_L13 | book | ticket | map | hotel | navig |
| To_S13 | book | ticket | navig | map | hotel |
| OE2: I think transport is a major cause of the environmental problem, because | | | | | |
| To_L21 | emiss | gase | carbon | vehicl | road |
| To_S21 | emiss | gase | carbon | vehicl | road |
| To_L22 | pollut | air | sound | traffic | nois |
| To_S22 | pollut | air | traffic | nois | sound |
| To_L23 | vehicl | fuel | transport | exhaust | public |
| To_S23 | vehicl | fuel | transport | public | increas |
| OE3: The society will benefit from the use of Autonomous Vehicles, as it will | | | | | |
| To_L31 | time | save | traffic | product | resourc |
| To_S31 | time | traffic | save | product | safe |
| To_L32 | drive | transport | travel | help | peopl |
| To_S32 | drive | transport | travel | help | peopl |
| To_L33 | reduc | accid | pollut | less | human |
| To_S33 | reduc | accid | pollut | less | human |
| To_L34 | effici | road | increas | driver | safeti |
| To_S34 | effici | road | increas | driver | safeti |
| OE4: Autonomous Vehicles are likely to impact society negatively, as | | | | | |
| To_L41 | vehicl | may | accid | lead | hack |
| To_S41 | vehicl | may | accid | lead | increas |
| To_L42 | job | driver | loss | peopl | drive |
| To_S42 | job | driver | loss | peopl | drive |
| To_L43 | human | car | control | traffic | reduc |
| To_S43 | car | control | system | technolog | human |

Legend (cell shading): words in both LDA and sLDA for the same topic in the same order; words in both LDA and sLDA for the same topic in a different order.

From the responses to OE1 (the use of smartphones for travel-related needs), we extracted three topics. The first was related to finding places of interest and navigating to them (To_L11/To_S11). The second was related to travel planning and checking the status (location and traffic updates) of transport modes (To_L12/To_S12). Lastly, the third (To_L13/To_S13) was related to finding hotels and making reservations, and booking flight tickets and taxi services.

Individuals’ perspectives on the environmental impact of transportation can broadly be classified into three topics. The first topic (To_L21/To_S21) was related to pollution (air and noise) and the contribution of transportation-induced pollution to global warming, driven primarily by the increasing reliance on personal vehicles due to the lack of public transportation. The second (To_L22/To_S22) also discussed the role of air and noise pollution but emphasised the wastage of natural resources. Finally, To_L23/To_S23 was related to increasing air pollution and dependence on fossil fuels.

We categorise the potential benefits of AVs to society into four topics. First, respondents considered AVs futuristic and discussed savings in travel time and resources (To_L31/To_S31). Second, individuals opined that the use of AVs would probably improve public transport and eliminate the stress of driving (To_L32/To_S32). Third, AVs may reduce accidents due to human error while minimising pollution (To_L33/To_S33). Finally, increased road efficiency, increased safety and reduced fuel usage formed the fourth benefit (To_L34/To_S34).

Respondents shared three major concerns about the use of AVs. First, individuals believe that AVs may increase accidents, as they are prone to system errors and software hacking (To_L41/To_S41). Second, individuals are concerned about the loss of employment, particularly for drivers (To_L42/To_S42). Finally, the third topic (To_L43/To_S43) discussed the technological requirements of such a control system and its associated safety.

To analyse whether the extracted topics are distinct, we computed the inter-topic distances and visualised them using pyLDAvis (Figure 3.8). There is no overlap between the extracted topics, which has the positive side-effect of reducing multicollinearity when the topics are used as explanatory variables.
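pyLDAvis derives its inter-topic map from distances between the topics’ word distributions, by default using the Jensen-Shannon divergence before projecting the topics to two dimensions. The pairwise divergence itself is simple to compute; a minimal sketch, assuming each topic is a probability vector over a shared vocabulary (for instance a row of a fitted topic-word matrix) and using illustrative function names:

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric JS divergence, bounded above by ln 2 with natural logarithms."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def pairwise_js(topic_word):
    """Matrix of JS divergences between every pair of topics."""
    return [[jensen_shannon(a, b) for b in topic_word] for a in topic_word]
```

Because the mixture `m` is positive wherever `p` or `q` is, the divergence is finite even when two topics share no words, which makes it a convenient basis for the distance map.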


Figure 3.8 Inter-topic Distance for a. OE1, b. OE2, c. OE3, d. OE4

3.5.3 Comparison of Closed- and Open-ended Responses

This section analyses whether we could find a correspondence between the open- and closed-ended responses. For three of the four open-ended questions used in this research, we found similarities between the extracted topics and the statements used for the Likert scales. For the question related to transportation-induced environmental problems, the topics extracted from the open-ended responses did not identify the need for colossal infrastructure but did identify the other aspects used in the Likert scale questions. Infrastructure improvements may be considered necessary for mobility, which could be why respondents did not identify them as an environmental issue. However, the open-ended responses also discussed the link between transportation and global warming. Regarding AVs’ potential benefits, the responses mostly covered increased traffic efficiency, fuel efficiency, travel time savings, and reductions in pollution and accidents. The respondents did not discuss the potential of AVs to reduce the demand for parking spaces or to address gender equity issues related to mobility. Respondents answering the open-ended question on the potential impacts of AVs on society did cover some of the genuine concerns related to their implementation, such as the loss of employment opportunities, the potential for system failures and hacking, and the need for a costly and sophisticated control system.

The open-ended question on the use of smartphones for travel needs (OE1) illustrates the need for careful design of open-ended questions. The closed-ended question focused on the frequency of smartphone use for various travel needs, whereas our open-ended question targeted the uses themselves. As a result, there is no correspondence between the two, and the OE1 topics cannot be used to estimate the models. However, we emphasise that we could still extract meaningful topics from the responses to this question.

3.6 MODELLING FRAMEWORK

In the current research, we asked respondents to express their intention to use shared AVs on a five-point Likert scale. Considering the ordinal nature of the dependent variable, we used an Ordered Probit model for the estimation, which assumes that a continuous underlying (latent) variable determines the ratings made by the respondent. In the estimation, we model the latent, unobserved dependent variable $y_i^*$ using the independent variables $\mathbf{x}_i$:

$$y_i^* = \boldsymbol{\beta}'\mathbf{x}_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1)$$

By regressing $y_i^*$ on the independent variables, the model also estimates thresholds on the latent variable that separate the levels of the observed dependent variable $y_i$. The latent variable $y_i^*$ is related to the observed dependent variable $y_i$ through:

$$y_i = \begin{cases} 0 & \text{if } y_i^* \le \mu_0 = 0 \\ 1 & \text{if } \mu_0 < y_i^* \le \mu_1 \\ 2 & \text{if } \mu_1 < y_i^* \le \mu_2 \\ 3 & \text{if } \mu_2 < y_i^* \le \mu_3 \\ 4 & \text{if } y_i^* > \mu_3 \end{cases}$$

where the first threshold is normalised to zero because the model includes a constant, and $\mu_1$, $\mu_2$ and $\mu_3$ correspond to the threshold parameters Mu(01), Mu(02) and Mu(03) reported in Table 3.5.
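Under this normalisation, the probability of each response category is a difference of standard normal CDFs, P(y_i = j) = Φ(μ_j - β'x_i) - Φ(μ_{j-1} - β'x_i), with μ_{-1} = -∞ and μ_4 = +∞, and the log-likelihood sums the log-probabilities of the observed categories. A minimal standard-library sketch, assuming the categories are coded 0 (Strongly Disagree) to 4 (Strongly Agree); the function names are illustrative and the optimisation step that estimates β and μ is not shown:

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(xb, mu):
    """P(y = j) for j = 0..len(mu)+1 given the index xb = beta'x (constant included).

    mu holds the estimated thresholds mu_1, mu_2, ...; mu_0 is fixed at 0.
    """
    cuts = [-math.inf, 0.0, *mu, math.inf]
    return [norm_cdf(cuts[j + 1] - xb) - norm_cdf(cuts[j] - xb) for j in range(len(cuts) - 1)]


def log_likelihood(xbs, ys, mu):
    """Ordered probit log-likelihood for indices xbs and observed categories ys."""
    return sum(math.log(category_probs(xb, mu)[y]) for xb, y in zip(xbs, ys))
```

For example, with the Ver_Lk MI constant (2.568) and thresholds (0.873, 1.906, 3.771) from Table 3.5, a hypothetical respondent whose factor scores are all zero has the highest probability, about 0.63, of choosing category 3 (Agree).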

For a detailed description of the underlying principles and the estimation procedure, we refer readers to Greene and Hensher [210]. We used 90% of the data for estimation (training data) and 10% for testing (test data). Having estimated the models, we carefully analyse the direction, magnitude, and statistical significance of the estimated coefficients. In addition, to evaluate the performance of the estimated models, we relied on the log-likelihood, McFadden’s pseudo-R2 (ρ2), the adjusted McFadden pseudo-R2 (adj. ρ2), the count R2, the adjusted count R2 and the F1 score.
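The likelihood-based and count-based measures listed above follow standard definitions: McFadden’s ρ2 = 1 - LLF/LLI; its adjusted version subtracts the number of estimated parameters K from LLF; the count R2 is the share of correctly predicted categories; and the adjusted count R2 measures the improvement over always predicting the most frequent category. A minimal sketch with illustrative function names; the number of parameters per model is not reported, so any value passed as `n_params` is an assumption, and the F1 score is omitted because its averaging scheme is not stated.

```python
from collections import Counter


def mcfadden_rho2(ll_final, ll_initial):
    """McFadden's pseudo-R2 from the final and initial log-likelihoods."""
    return 1.0 - ll_final / ll_initial


def adjusted_rho2(ll_final, ll_initial, n_params):
    """Penalises the final log-likelihood by the number of estimated parameters."""
    return 1.0 - (ll_final - n_params) / ll_initial


def count_r2(y_true, y_pred):
    """Share of observations whose predicted category equals the observed one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_count_r2(y_true, y_pred):
    """Correct predictions beyond those from always predicting the modal category."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    n_modal = max(Counter(y_true).values())
    return (correct - n_modal) / (len(y_true) - n_modal)
```

For instance, `mcfadden_rho2(-187.196, -222.955)` gives about 0.160, the value reported for MI of Ver_Lk in Table 3.5.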

3.7 ESTIMATION RESULTS AND DISCUSSION

The discussion in this section pertains to the third objective, “to compare the relative performance of the open-ended and closed-ended responses in analysing qualitative data.” For both versions of the questionnaire, we estimate a model, “MI”, using only variables related to the Theory of Planned Behaviour. Next, for each version of the questionnaire, we added variables related to the environmental effects of transportation. For Ver_Lk, we introduced the variable “Role_Pol”, and the corresponding model is named “MF”. For Ver_OE, we estimated two models by including the topics related to the environmental effects of transportation-induced pollution: “MLDA” (using LDA) and “MSLDA” (using sLDA). Note that each of these models includes only one topic variable, as only this variable was statistically significant; we discuss this in detail in the following paragraph. Finally, we compare the relative performance of these models using a training set (90%) and a test set (10%).

Table 3.5 presents the estimation results for the intention to use shared AVs on the training set, including the threshold parameters of the Ordered Probit model, the initial log-likelihood (LLI), the final log-likelihood (LLF) and McFadden’s pseudo-R2 (ρ2). For Ver_Lk, MI had a ρ2 value of 0.160, and adding the environmental pollution variable (MF) brought no meaningful improvement (0.161). In contrast, the models performed better for Ver_OE: MI had a ρ2 value of 0.227, which improved to 0.242 and 0.245 for MLDA and MSLDA, respectively. A similar improvement can be observed for the other goodness-of-fit measures. Note that, when computing the adjusted values of ρ2 and count R2, we did not account for the parameters of the LDA and sLDA models. The pollution-related question asked in Ver_Lk did not improve the model’s performance, possibly because of the minimal variability in the responses to this Likert scale question. In Ver_OE, however, adding the pollution-related topic variable improved performance, possibly because this variable (To_L23/To_S23) extracted more relevant information than the Likert scale question on pollution. It also captured an underlying factor: the increase in vehicles and the lack of public transport.

Table 3.5 Estimation Results of the Model for the “Intention to Use Shared AVs”

Variable

Ver_Lk

Ver_OE

MLDA

MSLDA

Constant

2.568***

2.615***

2.683***

2.612***

2.608***

Positive attitude towards AVs

0.167

0.170

0.446***

0.472***

0.477***

Subjective norms

0.369***

0.374***

0.273***

0.217

0.209

Perceived behavioural control variables 1

0.388***

0.381***

0.432***

0.440***

Perceived behavioural control variables 2

0.087

-0.182*

-0.212**

-0.217**

Role of pollution

-0.065

Role of pollution To_L23/ To_S23

1.799**

1.979***

Threshold parameters for the index

Mu(01)

0.873***

0.872***

0.833***

0.865***

0.872***

Mu(02)

1.906***

1.903***

1.839***

1.900***

1.915***

Mu(03)

3.771***

3.769***

3.773***

3.872***

3.896***

Goodness-of-fit measures

Sample size

183

145

LLI

-222.955

-181.119

LLF

-187.196

-187.143

-140.063

-137.331

-136.823

ρ2

0.160

0.161

0.227

0.242

0.245

Adj. ρ2

0.125

0.120

0.183

0.192

0.195

Count R2

0.552

0.579

0.600

0.607

Adj. Count R2

0.172

0.218

0.247

0.250

F1 Score

0.389

0.539

0.497

0.534

Note: ***, **, * denote significance at the 1%, 5% and 10% levels, respectively.

For both versions of the questionnaire, the intention to use shared AVs is positively associated with the individuals’ positive attitudes towards AVs; however, the coefficients are statistically significant only for Ver_OE of the questionnaire. A positive perception of AVs among friends and family, and their support, also contributes positively to the intention to use shared AVs. The coefficient for subjective norms loses its statistical significance for MLDA and MSLDA in the 90% training set. The variable is, however, statistically significant for the entire dataset (100% of the observations), so we suspect that this loss of significance in the training set (for MLDA and MSLDA) is due to the small sample size (145 observations). The factor related to confidence in the system has a positive and statistically significant coefficient for both versions of the questionnaire. This emphasises the need to increase confidence that the system is protected against hacking and failures and that its interaction with other vehicles is safe. The factor capturing the concerns associated with the use of AVs, such as affordability, liability in the event of an accident and payment for the service, has a negative influence on the intention to use shared AVs in Ver_OE, but a positive and statistically insignificant coefficient for Ver_Lk of the questionnaire. In short, the factors associated with the positive benefits of AVs and with concerns about AVs are not statistically significant for Ver_Lk, but they are significant for Ver_OE.

The only difference between the two versions is the introduction of open-ended questions on the potential benefits and issues of AVs. Answering these questions would have demanded additional thinking from the respondents, which could explain the difference in the coefficients. It might also have reduced careless responses to the Likert scale questions.

As reported earlier, perceptions of AVs and their use were more favourable among respondents answering Ver_OE of the questionnaire, possibly because answering the open-ended questions led them to base their answers on more deliberative reasoning. Respondents who believe that transportation is a significant source of environmental pollution are more likely to use shared AVs, probably because they perceive shared systems as sustainable. We do not observe this effect for Ver_Lk of the questionnaire, in which we used a Likert scale question. We also tested whether the perceived benefits of AVs to society (measured using a factor for Ver_Lk and topics for Ver_OE) influence the intention to use shared AVs. These variables were not statistically significant, probably because respondents focus more on the potential benefits to themselves than on those to society.

To test the predictive capability of the estimated models, we applied them to predict the intention to use shared AVs in a test set of 10% randomly selected observations. Table 3.6 reports the goodness-of-fit measures for the test sets.
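The text does not describe how the 10% test set was drawn. A minimal sketch of a seeded random hold-out, assuming simple random sampling without stratification, looks as follows; the function name and the 161-row example are illustrative only.

```python
import random


def train_test_split(rows, test_share=0.10, seed=42):
    """Randomly hold out roughly `test_share` of the rows as a test set.

    The training size is rounded to the nearest integer, e.g. 161 rows give
    145 training and 16 test observations.
    """
    rng = random.Random(seed)           # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round((1.0 - test_share) * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


train_rows, test_rows = train_test_split(range(161))
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, which matters when the same held-out observations are used to compare several model specifications.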

Table 3.6 Comparison of the Goodness-of-fit Measures for the Test Sets

Ver_Lk

Ver_OE

MLDA

MSLDA

LLI

-28.970

-28.97

LLF

-21.001

-20.883

-19.151

-19.245

-19.271

ρ2

0.275

0.279

0.339

0.336

0.335

Adj ρ2

-0.001

-0.032

0.063

0.025

0.024

Count R2

0.500

0.786

Adj Count R2

0.100

0.250

F1 Score

0.244

0.369

0.273

0.369

The models perform better for Ver_OE than for Ver_Lk, and a similar trend can be observed for the test set. The adjusted ρ2 is negative for the Ver_Lk test set because, once the final log-likelihood is penalised by the number of estimated parameters, it falls below the initial log-likelihood. To account for both precision and recall, we also computed F1 scores, which are likewise higher for Ver_OE of the questionnaire in both the training and test sets. The supervised approach (sLDA) did not improve the model's performance compared with the unsupervised approach (LDA).
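Count R2 is the share of observations whose predicted category (the one with the highest predicted probability) matches the observed one, and the adjusted Count R2 is usually defined as the correct predictions beyond those obtained by always predicting the most frequent category. The text does not state how the F1 score was averaged across the ordinal categories, so the sketch below assumes an unweighted (macro) average of per-category F1 scores. The observed and predicted values are hypothetical.

```python
from collections import Counter


def count_r2(observed, predicted):
    """Share of observations whose predicted category equals the observed one."""
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


def adjusted_count_r2(observed, predicted):
    """Correct predictions beyond those of always guessing the modal category."""
    hits = sum(o == p for o, p in zip(observed, predicted))
    modal_count = Counter(observed).most_common(1)[0][1]
    return (hits - modal_count) / (len(observed) - modal_count)


def macro_f1(observed, predicted):
    """Unweighted mean of per-category F1 (harmonic mean of precision and recall)."""
    scores = []
    for c in sorted(set(observed) | set(predicted)):
        tp = sum(o == c and p == c for o, p in zip(observed, predicted))
        fp = sum(o != c and p == c for o, p in zip(observed, predicted))
        fn = sum(o == c and p != c for o, p in zip(observed, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical observed and predicted intention categories (not thesis data).
observed = [0, 1, 2, 2, 1, 0, 2, 2]
predicted = [0, 2, 2, 2, 1, 1, 2, 0]
print(count_r2(observed, predicted))            # 0.625
print(adjusted_count_r2(observed, predicted))   # 0.25
print(round(macro_f1(observed, predicted), 3))  # 0.583
```

Unlike Count R2, the macro F1 gives each category equal weight, so a model that only predicts the dominant category scores poorly on it.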

3.8 CONCLUSION

To extract information from the open-ended responses, we used LDA and sLDA. The inter-topic distances indicated that the extracted topics were distinct, and for most open-ended questions we found some correspondence with the Likert scale questions. The topic extracted from the open-ended responses on transport-induced pollution had a positive influence on the intention to use shared AVs and correspondingly improved the goodness-of-fit of the estimated models, whereas the corresponding attitudinal variables in Ver_Lk were not statistically significant. The models estimated using Ver_OE of the questionnaire outperformed those estimated using Ver_Lk for both the training set and the test set. We did not observe an improvement in the models’ performance with sLDA over LDA. These results, together with the potential of open-ended questions to alter the reasoning process, support the use of open-ended responses to measure attitudes and of Topic Models to extract information from them.
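Inter-topic distance maps such as LDAvis commonly measure the distance between two topics with the Jensen-Shannon divergence between their word distributions. Assuming that measure (the formula used for the reported distances is not given here), a minimal sketch with made-up three-word topic distributions is:

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric divergence, bounded by ln 2, between two topic-word distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]   # mixture; m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical word distributions of two topics over the same three-word vocabulary.
topic_pollution = [0.70, 0.20, 0.10]
topic_safety = [0.05, 0.15, 0.80]
print(jensen_shannon(topic_pollution, topic_safety))
```

Values close to ln 2 (about 0.693) indicate topics that share almost no vocabulary, while values near zero indicate nearly identical topics.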

An individual’s positive attitude towards AVs and subjective norms positively influence the intention to use shared AVs for Ver_OE of the questionnaire. The coefficient for subjective norms was statistically significant for the entire dataset but lost its statistical significance for the training set comprising 90% of the responses, which we believe could be related to the small sample size. The perceived behavioural control variable related to having a conducive environment positively influenced the intention to use shared AVs, whereas the perceived behavioural control variable related to concerns influenced it negatively; the latter effect is observed only for Ver_OE of the questionnaire. These results indicate that TPB can be used to measure the intention to use shared AVs.

Having demonstrated the potential of open-ended questions to measure attitudes, we consider it advisable to investigate this further using a larger dataset, which might also improve the quality of the extracted topics. Because this was an online survey in which participation was largely voluntary, we could not ensure the representativeness of the sample: our data had a higher proportion of males, who were primarily young or middle-aged. Further analysis using a sample that is representative of the population is therefore imperative. It would also be appropriate to compare the performance of open-ended questions with Likert scale statements that have been extensively tested. Regarding the use of Topic Models, one possibility is to avoid splitting noun-noun compounds (e.g., public transportation, global warming) in the data.
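One simple way to keep noun-noun compounds intact before fitting a Topic Model is to join known compounds into single tokens during preprocessing. The sketch below assumes a hand-curated compound list (here, the two examples from the text); the helper name is hypothetical, and in practice the list could also come from collocation detection.

```python
def merge_compounds(tokens, compounds):
    """Join adjacent tokens that form a known compound, e.g. 'public transportation'.

    `compounds` is a set of (first, second) token pairs; each matched pair becomes
    'first_second', so a Topic Model treats it as a single term.
    """
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in compounds:
            merged.append("_".join(pair))
            i += 2                      # skip both halves of the compound
        else:
            merged.append(tokens[i])
            i += 1
    return merged


COMPOUNDS = {("public", "transportation"), ("global", "warming")}
print(merge_compounds("more public transportation less global warming".split(), COMPOUNDS))
# ['more', 'public_transportation', 'less', 'global_warming']
```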

4 INTEGRATING AND COMPARING OPEN- AND CLOSED-ENDED RESPONSES: A CASE STUDY ON AVS FOR COMMUTE TRIPS
4.1 INTRODUCTION 
To address some of the limitations of the study carried out in India, we developed a second study that analysed the mode choice for commute trips in the USA, in a scenario offering “Regular Cars”, “Personal Autonomous Vehicles”, and “Shared Autonomous Vehicles”. We aim to validate the first three objectives explored in Chapter 3 using a larger and representative dataset. In addition, we pursue the fourth objective: developing a framework that measures attitudes by allowing respondents to choose their preferred question type.
We organised this Chapter similarly to Chapter 3. Section 4.2 discusses the questionnaire design, presenting the experimental design and the theoretical framework used in the study. Section 4.3 discusses the data collection and data cleaning procedures, and Section 4.4 discusses the results of the exploratory analysis of the closed-ended responses, including the differences in their frequency distributions across questionnaire versions and the statistical significance of these differences. Section 4.5 presents different aspects of extracting information from closed- and open-ended responses, discussing the results of the exploratory analysis and Topic Modelling, and compares the extracted topics with the statements used for the closed-ended questions. Section 4.6 briefly discusses the adopted modelling framework, and Section 4.7 presents the results of the estimation of the model for the intention to use AVs for commute trips. Section 4.8 presents the framework that allows researchers/analysts to let respondents choose their preferred type of questionnaire when answering the questions related to attitudes. The final section (4.9) presents the salient findings and the limitations of the current study.
4.2 QUESTIONNAIRE DESIGN 
The responses to our survey are likely to be influenced by, among other things, the representativeness of the sample, the sample size, the behavioural framework used, the statements used for the Likert scale questions and the choice experiment. This study aims to disentangle these impacts and isolate the differences in attitudes attributable to the questionnaire type (closed- or open-ended questions). To this end, we collected responses from a sample that is representative of the population. Furthermore, to overcome the potential influence of the framework and statements, we used a framework (Zhang et al. [168]) and stated-preference experiments (Haboucha et al. [40]) previously tested by other researchers.

4.2.1 Experimental Design

The questionnaire for this study comprises two parts. The first part measures the attitudes, and

the second part captures the choices by the individuals. We adopted two different experimental

designs (the first to test the adequacy of open-ended questions and the second for the choice

experiment) and discuss them in the following paragraphs.

In the first part, which measures attitudes, we present the individuals with questions on perceived ease of use, perceived usefulness, perceived safety risk, perceived privacy risk, trust, and attitudes towards the use of AVs. To address the first objective of this research, to analyse the influence of the questionnaire type on the responses, we use two versions of the questionnaire, Ver_LK and Ver_LKOE. To pursue the remaining objectives, we introduce a third version of the questionnaire, Ver_OE. The three versions of the questionnaire are as follows:

• Ver_LK: attitudes are measured using statements depicting the attitude, and the responses are collected using a five-point Likert scale.

• Ver_LKOE: uses a combination of open- and closed-ended questions. We exclude the responses to the open-ended questions from the analysis, as the objective is to analyse whether the open-ended questions alter the responses to the closed-ended questions. The closed-ended questions are the same as in Ver_LK, and we use them to predict behaviour.

• Ver_OE: only open-ended questions are used to measure attitudes.

We present the experimental design in Figure 4.1 below. Each version was presented to the respondents at random using the “Randomizer” feature in Qualtrics, minimising the potential for bias in the assignment of respondents to versions.

The second part of the experimental design relates to the mode choice experiment, which provides the dependent variable for the estimation. We hypothesise that the measured attitudes influence this dependent variable, which is captured using a set of Stated Preference (SP) questions in which individuals choose between three alternatives: “Regular Cars”, “Personal Autonomous Vehicles”, and “Shared Autonomous Vehicles”. The choice set for the SP experiment follows the study by Haboucha, Ishaq and Shiftan [40]. The scenarios presented to the respondents are defined by the purchase cost of the vehicle, the yearly subscription cost, the average travel cost per trip and the parking cost. Table 4.1 presents the experimental design for the SP experiment, adapted from Haboucha, Ishaq and Shiftan [40].

Figure 4.1 Experimental Design for the Mode Choice for Commute Trips

Table 4.1 SP Experimental Design

Regular

Personal Autonomous Vehicle

Shared Autonomous Vehicle

Vehicle cost

100%

80%

0$ *

100%

150$ *

115%

300$ *

130%

2000$ *

Trip cost

100%

70%

85%

150%

100%

210%

120%

300%

Parking cost

100%

130% + 5$

30%

140% + 5$

60%

150% + 5$

100%

* Yearly costs for membership

Haboucha, Ishaq and Shiftan [40] used these values to generate sixteen orthogonal choice scenarios and presented six of them to each respondent. Since our priority was to identify and quantify the improvements from using open-ended responses to measure attitudes, we used this experimental design (the different scenarios are presented in Appendix C). In addition, we carried out a pilot test to check the questionnaire for inconsistencies, which gave us insights into the average completion time. The pilot test also identified questions that lacked clarity, and we used this information to revise the questionnaire before launching the final survey.

4.2.2 Framework Design

The original Technology Acceptance Model (TAM) assumes that perceived usefulness and perceived ease of use influence the use of technology. Perceived usefulness (PU) captures the user’s belief that using the technology improves their performance in the organisational context, while perceived ease of use (PEoU) captures the degree to which the user expects using the technology to be free of effort. In this study, we used an extension of TAM proposed by Zhang et al. [168]; in their study, however, the behavioural intention to use AVs is predicted using a set of Likert scale questions. Our research extends this framework further with a choice experiment adopted from Haboucha, Ishaq and Shiftan [40]. We present the modified framework below in Figure 4.2 and the questionnaire in Appendix C.

Figure 4.2 Proposed Framework for the Mode Choice for Commute Trips

[Figure elements: Perceived Ease of Use, Perceived Safety Risk, Perceived Privacy Risk, Perceived Usefulness, Initial Trust, Attitudes, Utility, Socio-economic Characteristics, Travel Characteristics, Choice in SP Experiment]

4.3 DATA COLLECTION AND DATA CLEANING

We launched this survey in the USA between January and March 2020. To ensure speedy data collection and a representative sample, we used survey panels provided by Cint. To check for inconsistencies and identify problematic records, we used the approach outlined in Section 3.3. For the open-ended responses, we omitted respondents who used fewer than seven words to answer. Following Cint’s recommendations, we removed such responses after data collection rather than enforcing a minimum length while respondents answered, as we feared that such conditions might alter their actual responses. After each wave of data collection, we analysed the data to identify problematic records, which Cint then replaced. The final dataset comprises 3002 complete responses (Ver_LK: 1012, Ver_LKOE: 1021, Ver_OE: 969). The sample was representative of the US population in terms of gender, ethnicity and regional diversity.
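The post-hoc length rule for the open-ended responses can be sketched as a simple filter. This is a minimal sketch under two assumptions not stated in the text: a respondent is dropped if any of their open-ended answers has fewer than seven words, and words are counted as whitespace-separated tokens. The function names and responses are hypothetical.

```python
MIN_WORDS = 7  # threshold stated in the text


def word_count(answer: str) -> int:
    """Count whitespace-separated words (a simplifying assumption)."""
    return len(answer.split())


def keep_respondent(answers: list[str], min_words: int = MIN_WORDS) -> bool:
    """Keep a respondent only if every open-ended answer has at least min_words words."""
    return all(word_count(a) >= min_words for a in answers)


# Hypothetical open-ended answers keyed by respondent ID.
responses = {
    "r1": ["I think they would reduce accidents on busy roads",
           "Hacking worries me a lot more than anything"],
    "r2": ["Not sure", "They seem safe enough to me overall"],
}
kept = {rid: ans for rid, ans in responses.items() if keep_respondent(ans)}
print(sorted(kept))  # ['r1']
```

Applying the rule after collection, as described above, leaves the respondents' answering behaviour untouched.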

4.4 EXPLORATORY ANALYSIS

In this section, we focus on the first objective of this research, “to analyse if the method of collecting qualitative data influences the survey responses”. To facilitate this comparison, we first check whether the samples collected with the three versions are similar in terms of the socio-demographic characteristics of the individuals. Having established that they are, we analyse the frequency distributions of the responses before carrying out an in-depth analysis of the statistical significance of the differences and the impact of the questionnaire type.

4.4.1 Preliminary Analysis on Socio-demographic Characteristics

Table 4.2 compares the socio-demographic characteristics of the participants. Nearly 53% of respondents were female, and the average age ranged from about 37 to 41 years. Ver_LKOE and Ver_OE of the questionnaire had a slightly higher representation of respondents earning between $50,000 and $74,999 and of respondents holding a bachelor’s or graduate degree. Nearly 70% of the respondents were White, and 14-17% were African American. Approximately 55% of the participants were employed full-time and 30% part-time, and Ver_OE had a slightly higher representation of students. The average number of adults per household was around 2.2 (standard deviation: 1.00). Unlike in the previous study from India, only about 5% of the respondents did not disclose their household income.

Table 4.2 Socio-Demographic Characteristics

| Variable | Levels | Ver_LK | Ver_LKOE | Ver_OE |
|---|---|---|---|---|
| Gender (%) | Female | 52.87 | 53.18 | 53.66 |
| | Male | 46.74 | 46.43 | 46.13 |
| | Prefer not to answer | 0.4 | 0.39 | 0.21 |
| Average age (in years) | | 40.9 (13.31) | 38.61 (14.23) | 36.55 (14.19) |
| Household income (%; levels in $) | Less than 10,000 | 9.39 | 7.74 | 5.68 |
| | 10,000 - 14,999 | 5.14 | 3.82 | 2.89 |
| | 15,000 - 24,999 | 10.87 | 8.81 | 7.84 |
| | 25,000 - 34,999 | 12.85 | 12.54 | 12.69 |
| | 35,000 - 49,999 | 15.51 | 15.38 | 15.48 |
| | 50,000 - 74,999 | 17.19 | 19.78 | 20.64 |
| | 75,000 - 99,999 | 12.06 | 12.14 | 13.93 |
| | 100,000 - 124,999 | 5.24 | 6.56 | 7.33 |
| | 125,000 - 149,999 | 2.47 | 3.53 | 3.10 |
| | 150,000 - 199,999 | 2.47 | 2.84 | 2.27 |
| | More than 200,000 | 1.88 | 1.96 | 2.68 |
| | Prefer not to answer | 4.94 | 4.9 | 5.47 |
| Educational qualification (%) | Less than high school graduate | 6.23 | 5.48 | 6.91 |
| | High school graduate or GED | 28.85 | 23.41 | 20.33 |
| | Some college or associate degree | 38.24 | 38.69 | 36.95 |
| | Bachelor's degree | 17.29 | 21.94 | 23.74 |
| | Graduate degree or professional degree (Master or PhD) | 8.2 | 9.79 | 11.76 |
| | I prefer not to answer | 1.19 | 0.69 | 0.31 |
| Ethnicity (%) | White | 69.37 | 70.71 | 69.45 |
| | Black or African American | 17.39 | 14.99 | 13.83 |
| | American Indian or Alaska Native | 1.19 | 1.47 | 1.86 |
| | Asian | 4.05 | 3.33 | 5.06 |
| | Native Hawaiian or other Pacific Islander | 0.59 | 0.49 | 0.72 |
| | Some other race | 6.62 | 8.23 | 7.53 |
| | I prefer not to answer | 0.79 | 0.78 | 1.55 |
| Employment status (%) | Full-time | 55.93 | 52.01 | 53.46 |
| | Part-time | 30.24 | 31.73 | 27.55 |
| | Student | 13.83 | 16.26 | 18.99 |
| Number of adults | | 2.23 (0.99) | 2.24 (1.00) | 2.27 (1.00) |
| Number of children aged between 8 and 17 | | 0.62 (0.97) | 0.57 (0.97) | 0.50 (0.92) |
| Number of children aged under 8 | | 0.38 (0.78) | 0.31 (0.71) | 0.29 (0.64) |

Note: Values for categorical variables are percentages; values for continuous variables are means, with standard deviations in parentheses.

The following paragraphs discuss the travel characteristics of the individuals (Table 4.3). In the previous seven days, approximately 30% of respondents travelled less than 35 miles, ~20% travelled between 35 and 70 miles, and more than 30% travelled more than 105 miles. The average commute distance was about 13 to 15 miles (Ver_LK: 14.55, Ver_LKOE: 12.99 and Ver_OE: 13.81), and the average commute time was approximately 24 minutes (Ver_LK: 24.42, Ver_LKOE: 24.09 and Ver_OE: 24.54). About three-quarters of the respondents ran errands during their commute or in the middle of the day at least once a week, and roughly 90% did so at least occasionally. On average, car users paid less than $2 in parking charges, whereas non-car users estimated the average parking charges at between about $5 and $10 (~$8). This gap between the actual and estimated parking charges is probably because most car users had access to free parking, which non-car users might have overlooked. The average car occupancy was 1.47. Among car users, about half considered it important (very or somewhat) to be able to leave items in the car. For all three versions of the questionnaire, the disposable amount for a new car averaged ~$25,000. Interestingly, nearly 30% of respondents did not drive a car with Adaptive Cruise Control (ACC), while ~40% reported using it in the last 12 months. About 78% of participants answering Ver_LK of the questionnaire had heard of AVs, compared with more than 85% of those answering Ver_LKOE and Ver_OE. Less than 10% reported having ever ridden a fully autonomous vehicle.

Table 4.3 Travel Characteristics of the Individuals (I)

Variable

Levels

Frequency

Ver_LK

Ver_LKOE

Ver_OE

Total vehicle miles

travelled by all

modes in the past 7

days (%)

Less than 35.0 miles

33.10

29.97

28.78

35.0 - 70.0 miles

19.47

18.22

17.96

70.0 - 105.0 miles

11.56

14.96

More than 105.0 miles

30.63

36.73

36.02

I don't know, or I prefer not to answer

5.24

3.53

2.27

Commute distance (in miles)

14.55 (14.9)

12.99 (13.18)

13.81 (13.31)

Commute time (in minutes)

24.42 (19.55)

24.09 (20.25)

24.54 (18.70)

Frequency of

errand during the

day (%)

Daily

17.98

16.06

13.42

3 - 4 times a week

24.60

25.76

25.70

1 - 2 times a week

31.82

32.71

36.33

Less than once a week

13.93

16.65

16.92

Never

11.66

8.81

7.64

Average parking charges (in $)

1.29 (7.17)

1.98 (9.70)

1.41 (5.69)

Average parking charges (non car-users) (in $)

5.42 (6.61)

10.48 (14.05)

8.36 (7.04)

Number of individuals in car

1.47 (0.81)

1.47 (0.80)

1.45 (0.76)

Importance of the

ability to leave

items in your car

(%)

Very important

12.94

11.26

11.04

Somewhat important

30.93

32.62

32.51

Not important

41.01

43.68

45.51

Non car users

15.12

12.44

10.94

Average Cost of Car (in $)

25129.68

(16260.59)

25886.82

(14748.11)

25919.13

(12872.37)

Frequency of use of

Adaptive Cruise

Control when

driving in the last

12 months (%)

Very frequently

6.53

6.56

4.54

Frequently

6.43

7.44

8.67

Occasionally

12.27

12.63

13.31

Rarely

9.00

7.44

7.64

Very rarely

7.02

6.07

6.09

Never

24.23

22.04

18.78

I don't know about ACC

8.51

7.93

8.46

I don't drive a car with ACC

26.01

29.87

32.51

Heard of AVs (%)

No

22.23

14.20

11.15

Yes

77.77

85.80

88.85

Ever ridden a full AV (%)

No

91.8

92.26

92.05

Yes

8.20

7.74

7.95

Table 4.4 presents the travel characteristics related to the respondents’ choice of mode for commute trips. Nearly half of the respondents (44-48%) reported never walking, and ~75% never used a bike. The corresponding percentages for motorbikes and car-sharing were ~91% and ~80%, respectively. Unsurprisingly, the car was the most used mode for commuting, with ~95% using it at least occasionally. About 60% of participants reported never using taxis or public transport for their commute.

Table 4.4 Travel Characteristics of an Individual (II)

| Mode used for commute trips | Frequency | Ver_LK | Ver_LKOE | Ver_OE |
|---|---|---|---|---|
| Walk (%) | Every day | 21.84 | 17.34 | 16.00 |
| | Several times a week | 14.03 | 15.87 | 14.86 |
| | Several times a month | 9.49 | 8.42 | 10.53 |
| | Several times a year | 10.28 | 10.38 | 12.18 |
| | Never | 44.37 | 47.99 | 46.44 |
| Bike (%) | Every day | 1.09 | 1.27 | 0.83 |
| | Several times a week | 4.64 | 3.43 | 3.83 |
| | Several times a month | 6.82 | 6.17 | 5.60 |
| | Several times a year | 13.14 | 12.44 | 13.37 |
| | Never | 74.31 | 76.69 | 76.37 |
| Motorbike (%) | Every day | 0.49 | 0.39 | 0.21 |
| | Several times a week | 2.08 | 1.76 | 1.14 |
| | Several times a month | 2.08 | 2.25 | 1.14 |
| | Several times a year | 4.45 | 3.82 | 4.33 |
| | Never | 90.91 | 91.77 | 93.19 |
| Car (%) | Every day | 67.00 | 67.19 | 70.07 |
| | Several times a week | 17.89 | 20.37 | 18.99 |
| | Several times a month | 5.24 | 5.09 | 5.37 |
| | Several times a year | 2.96 | 3.13 | 2.48 |
| | Never | 6.92 | 4.21 | 3.10 |
| Car sharing (%) | Every day | 0.79 | 1.67 | 0.93 |
| | Several times a week | 3.66 | 2.64 | 3.72 |
| | Several times a month | 5.34 | 4.80 | 4.54 |
| | Several times a year | 8.10 | 11.46 | 9.29 |
| | Never | 82.11 | 79.43 | 81.53 |
| Taxi (%) | Every day | 0.69 | 0.88 | 0.41 |
| | Several times a week | 5.24 | 3.43 | 4.23 |
| | Several times a month | 10.28 | 11.26 | 12.07 |
| | Several times a year | 21.34 | 22.82 | 26.42 |
| | Never | 62.45 | 61.61 | 56.86 |
| Public Transport (%) | Every day | 4.45 | 4.80 | 4.75 |
| | Several times a week | 8.99 | 6.37 | 7.74 |
| | Several times a month | 7.41 | 7.93 | 7.22 |
| | Several times a year | 16.80 | 17.04 | 21.05 |
| | Never | 62.35 | 63.86 | 59.24 |

4.4.1.1 Perceived Ease of Use

Figure 4.3 Frequency Distribution for Perceived Ease of Use

This section of the questionnaire measured perceptions of the ease of using AVs. As the frequency distribution in Figure 4.3 shows, almost half of the participants believed it would be easy for them to learn how to use AVs. Of the remaining respondents, about 30% were neutral towards the statement. However, when asked whether they would find it easy to get AVs to do what they want them to do, the largest share (~40%) were unsure, with another 30% agreeing and a further 10% strongly agreeing. Thus, respondents are generally more confident that they could learn to use AVs, become skilful at using them, or find them easy to use than that they could get AVs to do what they want.

4.4.1.2 Perceived Usefulness

This extension of the Technology Acceptance Model (TAM) emphasises the importance of perceived usefulness in the formation of attitudes. We used five statements: a. using autonomous vehicles will be useful in meeting my travel needs; b. autonomous vehicles will let me do other tasks, such as eat, watch a movie, be on a cell phone on my trip; c. using autonomous vehicles will decrease my accident risk; d. using autonomous vehicles will relieve my stress of driving; and e. I find autonomous vehicles to be useful when I’m impaired (e.g., drowsy, drunk, drugs).

Figure 4.4 Frequency Distribution for Perceived Usefulness

Figure 4.4 shows that most respondents thought AVs might be useful for meeting their travel needs, for performing other activities during travel, and for travelling when impaired from driving (e.g., under the influence of alcohol or drowsy). A considerably higher proportion of participants responded neutrally to whether AVs might decrease their accident risk, and respondents were slightly more pessimistic about this proposition. Among those answering Ver_LKOE, however, there were fewer neutral respondents, and the respondents were primarily optimistic. On whether AVs might relieve the stress of driving, participants were mostly positive, although a number of them remained sceptical.

4.4.1.3 Perceived Safety Risk

Interestingly, as Figure 4.5 illustrates, an overwhelming proportion of respondents agreed with the statements in this set of questions. Almost three in every four respondents were worried about the general safety of such technology, and respondents were also worried that the failure or malfunction of AVs might cause accidents. It is worth noting that the proportion of respondents strongly agreeing with the statements was considerably higher for Ver_LKOE of the questionnaire.

Figure 4.5 Frequency Distribution for Perceived Safety Risk of AVs

4.4.1.4 Perceived Privacy Risk

Figure 4.6 Frequency Distribution for Perceived Privacy Risk of AVs

The frequency distribution of the responses on concerns about the privacy risks posed by this technology was bell-shaped (Figure 4.6). There was a very high proportion of neutral respondents for all three questions, with almost equal proportions agreeing and disagreeing with the statements. Interestingly, most respondents were comfortable with AVs collecting personal information; however, the use and sharing of the collected personal information without their consent was a major concern.


4.4.1.5 Trust

Respondents were mostly neutral about their trust in the technology (Figure 4.7). Only about 25% considered AVs dependable or reliable, which, from the perspective of a transportation planner, calls for immediate intervention to build trust. Overall, nearly 30% of respondents answering Ver_LK trusted AVs, compared with approximately 40% of those answering Ver_LKOE.

Figure 4.7 Frequency Distribution for Trust in AVs

4.4.1.6 Attitudes

Figure 4.8 presents the distribution of the responses to the questions on overall attitudes. Compared with the respondents answering Ver_LK of the questionnaire, an additional 10% of respondents answering Ver_LKOE considered AVs a good and wise idea. When asked whether they considered AVs pleasant, most participants were neutral. Overall, public attitudes towards the use of AVs are largely positive.


Figure 4.8 Frequency Distribution for Attitudes towards AVs

4.4.1.7 Modal Share for Commute Trips

Figure 4.9 Frequency Distribution for the Mode for Commute Trips

The questionnaire also investigated the likelihood of respondents choosing a “Regular Car”, “Private AV”, or “Shared AV” for their commute trips. We did this with a personalised stated-preference survey that used the travel characteristics reported by the respondents in the previous sections of the questionnaire. Consistent with the differences observed in the preceding sections, the frequency distributions differ across the questionnaire versions. Among those answering Ver_LK of the questionnaire, nearly 50% of participants preferred “Regular Car”, about 30% preferred “Private AV”, and the rest opted for “Shared AV”. The distribution was different for Ver_LKOE and similar to that observed for Ver_OE of the questionnaire: approximately 40% of the respondents chose “Regular Car”, ~34% chose “Private AV”, and the remaining ~25% chose “Shared AV” for their commute trips. Figure 4.9 compares the breakdown of mode shares across the questionnaire types.

4.4.2 Statistical Analysis

Having identified differences in the frequency distributions when open-ended questions precede the Likert scale questions, it is essential to assess whether these differences are statistically significant. For this purpose, we performed the non-parametric Mann-Whitney U test [207]; the second column of Table 4.5 reports the resulting p-values. The differences in the distributions were statistically significant at the 5% level for all statements measuring attitudes except three: “I will find it easy to get Autonomous Vehicles to do what I want them to do”, “Using Autonomous Vehicles will be useful in meeting my travel needs”, and “Autonomous Vehicles are reliable”. For these statements, the previously observed differences were not statistically significant.
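For readers who want to reproduce this kind of comparison between two questionnaire versions, here is a minimal standard-library sketch of the two-sided Mann-Whitney U test using the normal approximation with a correction for ties, which are frequent in five-point Likert data. It is not the software used in the thesis: it omits the continuity correction that some packages apply by default, so its p-values may differ slightly, and the response vectors are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p-value) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:                        # all responses identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mu) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p = 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical five-point Likert responses for two questionnaire versions.
ver_a = [3, 3, 4, 2, 3, 4, 3, 5, 2, 3]
ver_b = [4, 5, 4, 3, 5, 4, 5, 4, 3, 5]
u, p = mann_whitney_u(ver_a, ver_b)
print(u, round(p, 4))
```

Because the test compares rank distributions rather than means, it suits ordinal Likert responses without assuming equal spacing between categories.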

4.4.3 In-Depth Analysis

To gain further insight into the influence of questionnaire type on the responses, we estimated a model for each Likert scale question. Following de Abreu e Silva, Papaix and Chen [211], we included as explanatory variables the socio-demographic characteristics of the individual (age, gender, ethnicity, household income) and the questionnaire type. Considering the ordinal nature of the dependent variables, we used ordered Probit models; Greene and Hensher [210] present a detailed description of the underlying principle and of the estimation of these models. As the third column of Table 4.5 shows, the coefficient for the questionnaire type was statistically significant (at the 5% level or better) for 12 of the 20 statements (60%). The results highlight the potential of open-ended questions to influence the responses to Likert scale questions.
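To illustrate how the questionnaire-type coefficient acts in an ordered Probit model, the sketch below computes the category probabilities P(y = j | x) = Φ(τ_j − xβ) − Φ(τ_{j−1} − xβ) and the log-likelihood that estimation maximises. The function names, the cut-points and the single-regressor setup are hypothetical; the 0.104 value is borrowed from Table 4.5 purely for illustration, with all other regressors set to zero. Packages such as statsmodels (`OrderedModel` with `distr="probit"`) perform the actual maximisation.

```python
import math


def norm_cdf(z):
    """Standard normal CDF; math.erf handles +/-inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(x, beta, cutpoints):
    """P(y = j | x) = Phi(tau_j - x.beta) - Phi(tau_{j-1} - x.beta).

    cutpoints must be increasing; tau_0 = -inf and tau_J = +inf are added here.
    There is no separate intercept because the cut-points absorb it.
    """
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    bounds = [-math.inf, *cutpoints, math.inf]
    return [norm_cdf(hi - xb) - norm_cdf(lo - xb) for lo, hi in zip(bounds, bounds[1:])]


def ordered_probit_loglik(X, y, beta, cutpoints):
    """Log-likelihood for observed categories y (0-based) of respondents X."""
    return sum(math.log(ordered_probit_probs(x, beta, cutpoints)[yi]) for x, yi in zip(X, y))


# Hypothetical: one regressor, a questionnaire-type dummy (1 = open-ended
# questions shown first), and made-up cut-points for a 5-point Likert scale.
cuts = [-1.5, -0.5, 0.5, 1.5]
p0 = ordered_probit_probs([0.0], [0.104], cuts)
p1 = ordered_probit_probs([1.0], [0.104], cuts)
print([round(v, 3) for v in p0])
print([round(v, 3) for v in p1])
```

A positive coefficient shifts probability mass from the lowest towards the highest agreement categories, which is how the significant coefficients in Table 4.5 should be read.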

Table 4.5 Results of the Statistical Analysis on Whether Open-ended Questions Influence Responses to Likert Scale Questions

| Statements Depicting Attitudes | Mann-Whitney U Test (p-value) | Coeff. (questionnaire type) |
|---|---|---|
| Perceived Ease of Use | | |
| Learning to use Autonomous Vehicles will be easy for me | 0.013** | 0.076 |
| I will find it easy to get Autonomous Vehicles to do what I want them to do | 0.598 | -0.020 |
| It will be easy for me to become skillful at using Autonomous Vehicles | 0.003*** | 0.104** |
| I will find Autonomous Vehicles easy to use | 0.042** | 0.058 |
| Perceived Usefulness | | |
| Using Autonomous Vehicles will be useful in meeting my travel needs | 0.085* | 0.035 |
| Autonomous Vehicles will let us do other tasks such as eating, watching a movie, be on a cell phone during my trip | 0.000*** | 0.162*** |
| Using Autonomous Vehicles will decrease my accident risk | 0.000*** | 0.153*** |
| Using Autonomous Vehicles will relieve my stress of driving | 0.006*** | 0.106** |
| I find Autonomous Vehicles to be useful when I'm impaired | 0.001*** | 0.143*** |
| Perceived Safety Risk | | |
| I'm worried about the general safety of such technology | 0.014** | 0.106** |
| I'm worried that the failure or malfunction of Autonomous Vehicles may cause accidents | 0.001*** | 0.148*** |
| Perceived Privacy Risk | | |
| I'm concerned that Autonomous Vehicles will collect too much personal information from me | 0.000*** | -0.180*** |
| I'm concerned that Autonomous Vehicles will use my personal information for other purposes without my authorization | 0.002*** | -0.136*** |
| I'm concerned that Autonomous Vehicles will share my personal information for other purposes without my authorization | 0.006*** | -0.110** |
| Perceived Trust | | |
| Autonomous Vehicles are dependable | 0.048** | 0.051 |
| Autonomous Vehicles are reliable | 0.133 | 0.039 |
| Overall, I can trust Autonomous Vehicles | 0.043** | 0.056 |
| Attitudes | | |
| Using Autonomous Vehicles is a good idea | 0.001*** | 0.115** |
| Using Autonomous Vehicles is a wise idea | 0.002*** | 0.110** |
| Using Autonomous Vehicles is pleasant | 0.036** | 0.042 |

Note: ***, **, * indicate significance at the 1%, 5% and 10% levels, respectively.

4.5 EXTRACTION OF DATA

The emphasis in this section is on the second objective, “to develop an approach to extract

open-ended responses from a survey and process the data”.

4.5.1 Treatment of Closed-ended Responses

To assess the internal reliability of the Likert scale responses, we used Cronbach’s alpha. Relatively high values were obtained for both versions of the questionnaire, with higher reliability for every construct in Ver_LKOE. Table 4.6 presents the values for internal reliability.

Table 4.6 Internal Reliability: Cronbach’s Alpha

| Construct | Ver_LK | Ver_LKOE |
|---|---|---|
| Perceived ease of use | 0.905 | 0.908 |
| Perceived usefulness | 0.830 | 0.841 |
| Perceived safety risk | 0.839 | 0.863 |
| Perceived privacy risk | 0.912 | 0.918 |
| Trust | 0.878 | 0.907 |
| Attitude | 0.891 | 0.914 |
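The values in Table 4.6 follow the standard definition α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items in a construct, σ²ᵢ the variance of item i and σ²ₜ the variance of the summed score. A minimal sketch of that formula, with a hypothetical function name and made-up responses (the thesis does not say which software computed alpha):

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / variance(total)).

    responses: one row per respondent, one column per item of the construct.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of five respondents to four ease-of-use items (1-5).
ease_of_use = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(ease_of_use), 3))
```

Items that move together inflate the variance of the total relative to the sum of item variances, pushing alpha towards 1; values above about 0.8, as in Table 4.6, are conventionally read as good internal consistency.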


To test the validity of the questionnaire, we compared the average scores for the Likert scale responses between two groups: respondents aware of AVs and those unaware of them. Respondents following news about AVs appear to have responded to the statements as expected. The differences between the two groups were statistically significant for both versions of the questionnaire (t-statistics of 2.504 for Ver_LK and 2.638 for Ver_LKOE), indicating the validity of the questionnaire.
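The text reports t-statistics but not which variant of the two-sample t-test was used. The sketch below assumes Welch's unequal-variance version (the pooled-variance Student's test is the common alternative); the function name and the group scores are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of each group mean
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical average Likert scores per respondent for the two groups.
aware = [3.8, 4.1, 3.5, 4.4, 3.9, 4.0, 3.6, 4.2]
unaware = [3.4, 3.7, 3.1, 3.9, 3.5, 3.3, 3.8, 3.2]
t, df = welch_t(aware, unaware)
print(f"t = {t:.3f}, df = {df:.1f}")
```

With the large groups in this survey, |t| above roughly 1.96 indicates significance at the 5% level, which both reported values (2.504 and 2.638) exceed.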

4.5.2 Extraction of Information from Open-ended Responses

4.5.2.1 Exploratory Analysis

We used an approach similar to that discussed in Section 3.5.2 to deal with the open-ended responses. We used “Grammarly” to correct generic errors such as spelling and grammatical mistakes. The effort required for data cleaning should not be underestimated. We explored the data in greater detail to identify phrases using “Regular Expressions”. For example, the phrases “don’t trust” and “do not trust” would convey the opposite meaning once the words “do”, “not” and “don’t” are removed. We therefore identified such phrases, and others with the same meaning, and replaced them with “no_trust”. This is one of the first steps in dimensionality reduction, which we believe is critical for the analysis and should be carried out carefully to avoid introducing bias.
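The phrase replacement described above can be sketched with a single regular expression. The exact patterns used in the thesis are not given; the one below is a hypothetical example covering “do not trust”, “don’t trust” (straight or curly apostrophe) and the misspelling “dont trust”.

```python
import re

# Negated forms of "trust" are collapsed into one token *before* stop-word
# removal; otherwise dropping "do", "not" and "don't" would leave "trust"
# and invert the meaning. \u2019 is the curly apostrophe.
NO_TRUST = re.compile(r"\b(?:do\s+not|don['\u2019]?t)\s+trust\b", re.IGNORECASE)


def normalise_phrases(text):
    """Replace phrases that must survive stop-word removal with single tokens."""
    return NO_TRUST.sub("no_trust", text)


print(normalise_phrases("I don't trust them yet, and I do not trust the sensors."))
```

In practice, each new pattern is added only after inspecting matches in the corpus, since an over-broad expression can itself introduce the bias the text warns about.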

Furthermore, to improve the Topic Models, we added to the list of “Stop Words” further words (such as Autonomous, Vehicles and Cars) that convey no contextual meaning in this analysis. This process was iterative and time-consuming, and at each stage, the analyst must decide on the appropriate level of data cleaning. In each iteration, we evaluated the frequencies of the words and word combinations; where a word or combination was sparse (fewer than 10 occurrences), we left it unchanged.
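One way to realise this iteration is sketched below, assuming that frequent word combinations are merged into single tokens while sparse ones (fewer than 10 occurrences) are left as they are. The generic stop-word list is a short stand-in, treating “combinations” as adjacent word pairs is our assumption, and all names are hypothetical.

```python
from collections import Counter

# A short stand-in for a generic stop-word list, extended with domain words
# that carry no contextual meaning in this corpus (Autonomous, Vehicles, Cars).
GENERIC_STOP_WORDS = {"a", "an", "the", "i", "it", "is", "are", "to", "of", "and", "will"}
DOMAIN_STOP_WORDS = {"autonomous", "vehicles", "cars"}
STOP_WORDS = GENERIC_STOP_WORDS | DOMAIN_STOP_WORDS


def tokenise(text):
    """Naive whitespace tokenisation followed by stop-word removal."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def frequent_bigrams(docs, min_count=10):
    """Bigrams occurring at least min_count times; sparser ones are left alone."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, c in counts.items() if c >= min_count}


def merge_bigrams(tokens, bigrams):
    """Join frequent bigrams into single tokens, e.g. 'human_error'."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            merged.append(f"{tokens[i]}_{tokens[i + 1]}")
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical corpus: one frequent and one sparse combination.
docs = [tokenise("Human error is a concern")] * 12 + [tokenise("Sensor failure is rare")]
pairs = frequent_bigrams(docs)
print(merge_bigrams(tokenise("Human error and sensor failure"), pairs))
```

Each pass over the corpus recomputes the counts, so the analyst can inspect the merged tokens and adjust the stop-word list before the next iteration.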

Contrary to the previous study, the average number of words per response dropped substantially after cleaning the open-ended responses, by about 65% on average. Table 4.7 presents the average number of words per response before and after cleaning. Interestingly, the responses were the longest for the question on the “safety concerns” associated with the use of AVs.

As in Section 3.5.2.1, we refer to words by their stemmed forms, so their spellings may differ from the dictionary forms. For example, in answering the first question, on ease of use, respondents used “drive” (4.33%), “use” (3.76%), “easi” (3.09%), “technolog” (2.30%) and “make” (1.49%). To describe the usefulness of AVs, the most frequently used words included “drive” (5.38%), “use” (3.48%), “driver” (2.05%), “help” (2.02%) and “accid” (1.91%). “Drive” (2.33%), “accid” (1.72%), “malfunct” (1.64%), “concern” (1.63%) and “human” (1.61%) were used to articulate the potential safety concerns associated with the use of AVs. When discussing privacy concerns, respondents used “privaci” (5.38%), “concern” (3.35%), “inform” (2.61%), “use” (1.84%) and “issu” (1.66%). In the fifth question, respondents were asked to describe their reasons to trust or not trust AVs; to answer it, they used “trust” (4.82%), “drive” (2.52%), “technolog” (2.36%), “control” (2.10%) and “use” (1.55%). In describing their general attitudes towards AVs, respondents used the words “drive” (2.60%), “use” (2.29%), “futur” (2.10%), “technolog” (1.68%) and “need” (1.59%).
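The stems above (“easi”, “technolog”, “privaci”) are the kind produced by the Porter stemmer, available for example as `nltk.stem.PorterStemmer`. The percentages can be reproduced as each stem's share of all tokens in the cleaned responses to a question; that denominator is our assumption, as the text does not state it. A minimal sketch with a hypothetical function name and made-up responses:

```python
from collections import Counter


def top_stem_shares(docs, n=5):
    """Top-n stems and their share (%) of all tokens in the cleaned responses."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return [(stem, round(100 * c / total, 2)) for stem, c in counts.most_common(n)]


# Hypothetical stemmed responses to OE1.
oe1 = [["drive", "easi", "use"], ["use", "technolog", "drive"], ["drive", "make"]]
print(top_stem_shares(oe1, n=3))
```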

Table 4.7 Average Number of Words per Response

| Open-ended Questions | Original | Cleaned |
|---|---|---|
| Do you think that it will be easy to use Autonomous Vehicles? (OE1) | 22.03 | 7.78 |
| Do you believe that Autonomous Vehicles are useful? (OE2) | 20.50 | 8.07 |
| Do you have safety concerns regarding the use of Autonomous Vehicles? (OE3) | 22.42 | 8.14 |
| Do you have concerns related to privacy associated with the use of Autonomous Vehicles? (OE4) | 16.90 | 5.93 |
| Would you as a user trust an Autonomous Vehicle? (OE5) | 18.46 | 6.45 |
| What are your general opinions about Autonomous Vehicles? (OE6) | 20.59 | 6.97 |
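A quick check of the “about 65%” figure against Table 4.7, assuming the drop is measured per question as the relative reduction in average response length:

$$\text{drop} = 1 - \frac{\bar{w}_{\text{cleaned}}}{\bar{w}_{\text{original}}}, \qquad \text{OE1: } 1 - \frac{7.78}{22.03} \approx 0.647$$

Applying the same ratio to OE2 to OE6 gives approximately 0.606, 0.637, 0.649, 0.651 and 0.661, so the per-question drops range from about 61% to 66%, with a mean of about 64%.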

4.5.2.2 Results from Topic Models

In this research, six open-ended questions were presented to respondents answering Ver_OE of the questionnaire. For the first five, respondents were also asked to agree or disagree with the corresponding statement. For each of these five questions, we used LDA and sLDA to extract information from the responses; for the final question, only LDA was considered. In the estimation of sLDA, we used the responses to the agree/disagree statement as the response variable. The number of extracted topics was determined by trial and error: after each estimation, we checked whether the extracted topics overlapped (based on inter-topic distance) and whether they were meaningful. As sLDA did not noticeably improve the performance of the models, we limit our discussion to LDA. Table 4.8 presents the results.
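The LDA implementation used in the thesis is not named here (common libraries such as gensim estimate LDA by online variational Bayes). As a self-contained illustration of what the model extracts, the sketch below swaps in a collapsed Gibbs sampler; the hyperparameters `alpha` and `beta`, the iteration count and the toy corpus are hypothetical. It returns document-topic proportions (θ), topic-word distributions (φ) and the top words per topic, the quantity reported in Table 4.8.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampler for LDA on lists of (stemmed) tokens.

    Returns (theta, phi, vocab): document-topic proportions, topic-word
    distributions and the vocabulary order used in phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of every token
    for d, doc in enumerate(docs):            # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], index[w]
                ndk[d][k] -= 1                # remove the current assignment
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)] for t in range(n_topics)]
    return theta, phi, vocab


def top_words(phi, vocab, n=5):
    """Top-n words of each topic, as listed in Table 4.8."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


# Hypothetical cleaned, stemmed responses to a privacy question.
docs = [
    ["privaci", "data", "compani", "inform"],
    ["hack", "system", "data", "inform"],
    ["track", "locat", "data", "compani"],
    ["noth", "privat", "thing", "make"],
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2, iterations=100)
print(top_words(phi, vocab, n=3))
```

Re-running with different numbers of topics and inspecting the top words mirrors the trial-and-error selection described above.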

We extracted four topics from OE1. The first topic (To_L11) was primarily about how easy it is to get AVs to work, to learn to use them and to gain trust in them; the second (To_L12) emphasised the need for a human presence to respond to uncertainties; the third (To_L13) covers ease of operation; and the fourth (To_L14) covers the additional benefits of self-navigation beyond ease of use.

Table 4.8 Top 5 Words for Each Topic for Open-ended Questions

| Topic | Word_1 | Word_2 | Word_3 | Word_4 | Word_5 |
|---|---|---|---|---|---|
| OE1- Do you think that it will be easy to use Autonomous Vehicles? | | | | | |
| To_L11 | use | easi | technolog | work | get |
| To_L12 | drive | road | human | mani | accid |
| To_L13 | drive | control | driver | make | easier |
| To_L14 | oper | everyth | assum | user | |
| OE2- Do you believe that Autonomous Vehicles are useful? | | | | | |
| To_L21 | time | better | environ | make | save |
| To_L22 | drive | driver | thing | work | make |
| To_L23 | accid | human | traffic | reduc | help |
| To_L24 | drive | use | get | help | disabl |
| To_L25 | use | drive | need | technolog | situat |
| To_L26 | take | attent | pay | use | |
| To_L27 | driver | help | transport | safeti | safer |
| OE3- Do you have safety concerns regarding the use of Autonomous Vehicles? | | | | | |
| To_L31 | concern | drive | safeti | control | self |
| To_L32 | technolog | safe | work | need | time |
| To_L33 | malfunct | accid | caus | comput | happen |
| To_L34 | road | driver | drive | get | accid |
| To_L35 | thing | wrong | abl | make | |
| To_L36 | human | error | driver | make | situat |
| OE4- Do you have concerns related to privacy associated with the use of Autonomous Vehicles? | | | | | |
| To_L41 | make | thing | privat | abl | noth |
| To_L42 | hack | technolog | get | system | |
| To_L43 | privaci¹ | concern | issu | sure | relat |
| To_L44 | track | alreadi | technolog | locat | differ |
| To_L45 | drive | someon | record | even | driver |
| To_L46 | inform | data | person | compani | need |
| OE5- Would you as a user trust an Autonomous Vehicle? | | | | | |
| To_L51 | safe | test | technolog | trust | enough |
| To_L52 | trust | technolog | use | time | |
| To_L53 | trust | safeti | work | concern | road |
| To_L54 | drive | driver | human | comput | better |
| To_L55 | control | drive | make | abl | |
| OE6- What are your general opinions about Autonomous Vehicles? | | | | | |
| To_L61 | use | drive | cool | feel | get |
| To_L62 | accid | driver | road | potenti | danger |
| To_L63 | futur | great | technolog | time | transport |
| To_L64 | drive | make | human | thing | control |
| To_L65 | need | technolog | work | lot | idea |
| To_L66 | good | safeti | concern | thing | improv |

¹ “privaci” and “concern” could have been combined; however, they did not appear in the same sequence in a sentence and hence were not combined.


We extracted seven broad themes using LDA from the responses on the perceived usefulness of AVs. First, respondents believed that AVs might save travel time and make travel more environmentally friendly (To_L21). Second, participants shared contrasting views on the ability to work during travel: some believed that AVs might facilitate working during travel (To_L22), while others felt that AVs may demand additional attention, which may negatively affect their work (To_L26). AVs might also make travelling safer and mitigate congestion (To_L23), make parking easier (To_L27) and ensure mobility for the disabled (To_L24). Finally, many participants emphasised the need for human control while using AVs (To_L25).

The next open-ended question evaluated the safety concerns associated with the use of AVs. Safety concerns stemming from the lack of control were prominent (To_L31). Many argued that accidents could be caused by the lack of control (To_L33), by malfunctions (To_L34) or by sensor failures (To_L35). Furthermore, as humans are error-prone, many believed that there could be flaws in the software (To_L36) and emphasised the need for thorough testing of AVs before their widespread deployment (To_L32).

We then evaluated whether individuals had privacy concerns related to the use of AVs. Many reported no privacy concerns, reasoning that there is nothing to worry about if one is transparent (To_L41). Another argument was that the information is already in the public domain through various platforms (To_L44). Some said that they do have concerns, but that these are not something to be bothered about (To_L43). Furthermore, privacy would not be a concern if users were informed about data collection and storage (To_L46). The concerns that were raised related mostly to hacking (To_L42) and the potential for being watched (To_L45).

In the fifth open-ended question (OE5), we asked respondents whether they would trust AVs. A substantial proportion of respondents were not yet ready to trust AVs. This lack of trust could be related to the need for further testing (To_L51) and to potential safety concerns due to malfunctions (To_L53). Sceptics argued that humans could drive better (To_L54), whereas respondents favouring the system argued that computers might control the vehicle better (To_L55). Over time, more users might start trusting the system (To_L52).

In response to the question on general perceptions of AVs (OE6), the ideas discussed by respondents can be grouped into six categories. In general, respondents were optimistic: they considered using AVs cool and a good idea (To_L61). Many considered AVs a tremendous technological advancement, albeit with some safety concerns (To_L62). It is also encouraging to note that many considered AVs futuristic (To_L63), although they emphasised the need for additional work (To_L65). Finally, people generally perceived AVs as a safe, economical and environmentally friendly travel option (To_L66), probably safer than human drivers (To_L64).

Figure 4.10 Inter-topic Distance for LDA (clockwise from top left) OE1, OE2, OE3, OE4,

OE5, OE6

After estimating the topic models, we evaluated the inter-topic distances using the visualisation tool pyLDAvis [195] (Figure 4.10). There is no overlap between the topics extracted for OE1 and OE3. The overlaps are high for OE2 and moderate for all the other questions. OE3 and OE4 have moderate overlaps, and OE1 and OE5 have no overlap.
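By default, pyLDAvis lays out topics by applying principal coordinate analysis to the Jensen-Shannon divergences between their word distributions, so overlapping circles in Figure 4.10 correspond to small divergences. A minimal sketch of that pairwise distance, assuming topic-word distributions φ as produced by LDA; the function names are hypothetical, base-2 logarithms are used so values lie in [0, 1], and the tool's own scaling and projection step are not reproduced.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric divergence against the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def intertopic_distances(phi):
    """Pairwise Jensen-Shannon divergences between topic-word distributions."""
    return [[jensen_shannon(a, b) for b in phi] for a in phi]


# Hypothetical topic-word distributions over a four-word vocabulary.
phi = [[0.7, 0.2, 0.05, 0.05], [0.6, 0.3, 0.05, 0.05], [0.05, 0.05, 0.2, 0.7]]
for row in intertopic_distances(phi):
    print([round(d, 3) for d in row])
```

The first two hypothetical topics share most of their probability mass and would appear close together or overlapping; the third would sit far from both.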

4.5.3 Comparison of Closed- and Open-ended Responses

This section investigates whether the responses to the Likert scale questions are coherent with the responses to the open-ended questions, which were carefully designed to make this comparison possible. As discussed previously, twenty Likert scale questions were presented to respondents answering Ver_LK and Ver_LKOE, and six open-ended questions to those answering Ver_OE. Encouragingly, we could extract topics from the open-ended responses related to most aspects covered by the closed-ended questions, and the open-ended responses also provided additional information on each topic. The following paragraphs discuss the responses (see Table 4.9).


For the questions related to “Perceived Ease of Use”, we could achieve a one-to-one mapping between the Likert scale and the open-ended responses. In addition to the four aspects covered by the Likert scale questions, the Topic Models also highlighted the need for control. The responses to the question on “Perceived Usefulness” did not map directly to the Likert scale questions “Useful in meeting travel needs” and “Useful when I’m impaired”. Besides the remaining characteristics covered by the Likert scale questions, the Topic Models identified other aspects that make AVs worthwhile, such as “mobility for the disabled”, “congestion reduction”, “environmental friendliness” and “travel time reduction”. When asked about safety concerns regarding AVs, respondents emphasised worries about general safety, accidents caused by malfunctions and failures, lack of testing, and the error-prone nature of the humans who design such systems. The last two aspects were observed only in the open-ended responses.

The distribution of the responses to the Likert scale questions on privacy concerns was bell-shaped, with a very high number of neutral responses and an almost equal proportion of respondents agreeing and disagreeing with each statement. The analysis of the open-ended responses, however, indicates that most respondents are not worried about privacy issues. This stems from the reasoning that there is nothing to worry about if one is transparent. Other arguments are that the information is already in the public domain and that companies will be transparent about their data collection and storage policies. Individuals who are worried about privacy foresee the possibility of hacking and tracking.

Slightly more than 50% of respondents answering Ver_OE of the questionnaire trusted AVs. The Likert scale questions asked whether respondents considered AVs “dependable”, “reliable” and “can be trusted”. The extracted topics indicate that, because respondents fear malfunctions, they emphasise the need for testing and time before they can appreciate AVs. Interestingly, respondents who trust AVs argue that computers can control the vehicle better, while those who do not trust AVs argue otherwise. For the general attitudes towards AVs, there was almost no correspondence between the responses to the Likert scale questions and the open-ended questions: respondents answering both versions emphasised that using AVs is a good idea, but those answering Ver_OE also considered AVs futuristic and sustainable.

The results reiterate the finding of the previous study, which used data collected in India, that open-ended questions can collect nearly the same information as Likert scales if asked appropriately. However, one caveat of the approach is that open-ended questions cannot capture the degree or intensity of an individual’s attitudes.

Table 4.9 Mapping between Closed- and Open-ended Responses

Topics

Likert Scale

Topics

Do you think that it will be easy to use Autonomous Vehicles?

Easy for me

Easy to get them to do what I want them to do

Easy to become skilful

Easy to use

Do you believe that Autonomous Vehicles are useful?

Useful in meeting travel needs

Perform other tasks

Decrease my accident risk

Relieve my stress of driving

Useful when I’m impaired

Mobility for the disabled

Lesser congestion

Better for environment

Saves time

Do you have safety concerns regarding the use of Autonomous Vehicles?

Worried about the general safety of such technology

Worried that failure or malfunction of AVs may cause accidents

Need more testing

Humans are error-prone

Do you have concerns related to privacy associated with the use of Autonomous Vehicles?

Collect too much personal information from me

Use personal information for other purposes without authorisation

Share personal information for other purposes without authorisation

Potential for being watched

Would you as a user trust an Autonomous Vehicle?

Dependable

Reliable

Can be trusted

What are your general opinions about Autonomous Vehicles?

Good idea

Wise idea

Pleasant

Futuristic

Sustainable

4.6 MODELLING FRAMEWORK

This research used Probabilistic Graphical Models (PGMs), which account for uncertainty using probability theory. The advantage of PGMs is that they provide a mathematically grounded framework for measuring how uncertainty changes as new data become available. Furthermore, the framework has similarities with Bayesian Structural Equation Models [212]; for a detailed description of the approach, readers may refer to Peled et al. [194].

We now describe the proposed modelling framework. Figure 4.11 and Figure 4.12 depict the corresponding Probabilistic Graphical Models (PGMs). In the figures, shaded nodes represent observed variables, unshaded nodes represent latent variables, and arrows indicate the relationships between variables. The unshaded node (Att) represents the attitude that is unknown to the researcher/policymaker. Each measured attitude is related to the unknown attitude “Att” through a set of K-dimensional multivariate linear regression models. The estimated latent attitudes are then used to model the mode choice (Y) for commute trips.

Considering the nominal nature of the observed choice “Y”, we used a multinomial logit formulation to model it, with the socio-economic characteristics of the individual, travel characteristics, familiarity with AVs, the latent attitude “Att” and the characteristics of the stated-preference experiments as explanatory variables. αC and βC are, respectively, the alternative-specific constants and the coefficients of the “choice” model. The larger plate labelled “N” indicates N repetitions of the model, and the smaller plate labelled “C” indicates C repetitions.

Having devised the PGM, we outline the generative process, first defining the distributions of the coefficients (scalars in regular font, vectors in bold). The coefficients are assumed to have a mean of 0 and a standard deviation of 1. First, we draw the latent attitudinal variable “Att” from a multivariate normal distribution whose mean is given by the regression equation (a function of the measured attitudes) and whose standard deviation is 1. Next, we draw the choice variable “Y” from a multinomial distribution using the utility computed from the socio-demographics, travel characteristics, attitudes and characteristics of the choice experiment.
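The generative process can be made concrete by forward-simulating it for one respondent. The sketch below is not the Pyro model used for estimation; it assumes one latent dimension per regression equation, treats “Regular Car” as the base alternative with utility fixed at 0 (as in Section 4.7), and uses hypothetical feature layouts, sizes and function names. Selecting the (α, γ) pair by questionnaire version `z` plays the role of the version-specific regressions in the proposed model.

```python
import math
import random

ALTERNATIVES = ["Regular Car", "Private AV", "Shared AV"]  # Regular Car = base


def softmax(utilities):
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def draw_params(rng, item_dims, n_att, n_x):
    """Draw every coefficient from N(0, 1), following the generative process.

    item_dims: number of measured inputs (Likert items or topic proportions)
    feeding each latent attitude, per questionnaire version.
    """
    def g():
        return rng.gauss(0.0, 1.0)
    measurement = {ver: [(g(), [g() for _ in range(d)]) for _ in range(n_att)]
                   for ver, d in item_dims.items()}
    # (alpha_c, beta_c) for every non-base alternative.
    choice = [(g(), [g() for _ in range(n_x + n_att)]) for _ in ALTERNATIVES[1:]]
    return measurement, choice


def simulate_respondent(rng, params, z, measured, x):
    """Generate latent attitudes and a mode choice for one respondent of version z."""
    measurement, choice = params
    att = []
    for (alpha_k, gamma_k), m_k in zip(measurement[z], measured):
        mean_k = alpha_k + sum(gk * mk for gk, mk in zip(gamma_k, m_k))
        att.append(rng.gauss(mean_k, 1.0))  # Att_k ~ N(alpha_k + gamma_k . m_k, 1)
    features = list(x) + att
    utilities = [0.0] + [a + sum(b * f for b, f in zip(beta, features)) for a, beta in choice]
    probs = softmax(utilities)  # multinomial logit choice probabilities
    y = rng.choices(range(len(ALTERNATIVES)), weights=probs)[0]
    return att, y, probs


rng = random.Random(42)
params = draw_params(rng, {"LK": 4, "LKOE": 4, "OE": 5}, n_att=2, n_x=3)
topics = [0.1, 0.3, 0.2, 0.2, 0.2]  # hypothetical topic proportions for OE
att, y, probs = simulate_respondent(rng, params, "OE", [topics, topics], [1.0, 0.0, 0.35])
print([round(a, 3) for a in att], ALTERNATIVES[y], [round(p, 3) for p in probs])
```

Inference reverses this process: given observed measurements and choices, it recovers the posterior over the coefficients and latent attitudes.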

Individual Models

To benchmark the performance of our proposed framework, we estimated three models using the individual datasets (Ver_LK, Ver_LKOE and Ver_OE). In Figure 4.11, the shaded node Attin depicts the attitudinal variables measured using closed- or open-ended questions for the Ver_LK, Ver_LKOE or Ver_OE versions. αin and γin represent the intercepts and the slopes for Ver_LK (and similarly for the other datasets). αc and γc represent the alternative-specific constants and coefficients of the utility equation, and the shaded nodes “Xs” and “Y” denote the explanatory variables and the choice variable, respectively.


Generative Process

1. For equation k ∈ {1, …, K}:
   1. $\alpha_k^{in} \sim \mathcal{N}(0, 1)$
   2. $\gamma_k^{in} \sim \mathcal{N}(0, 1)$
   3. $Att_k \sim \mathcal{N}\left(\alpha_k^{in} + \gamma_k^{in} Att^{in},\ 1\right)$
2. For each class c ∈ {1, …, C}:
   1. $\alpha_c \sim \mathcal{N}(0, 1)$
   2. $\beta_c \sim \mathcal{N}(0, 1)$
   3. $Y \sim \mathrm{Mult}\left(\mathrm{Softmax}\left([U_1, \dots, U_C]\right)\right)$, with $U_c = \alpha_c + \beta_c [X, Att]$

Joint Probability Distribution

$$p(Att, Y, \alpha^{in}, \gamma^{in}, \alpha_c, \beta_c \mid Att^{in}, X) = \prod_{k=1}^{K} p(\alpha_k^{in})\, p(\gamma_k^{in}) \prod_{c=1}^{C} p(\alpha_c)\, p(\beta_c) \prod_{n=1}^{N} \left( \prod_{k=1}^{K} p(Att_{nk} \mid \alpha_k^{in}, \gamma_k^{in}, Att_n^{in})\; p(Y_n \mid Att_n, X_n, \alpha_c, \beta_c) \right)$$

Figure 4.11 Probabilistic Graphical Model for Individual Model

Proposed Model (Figure 4.12)

The shaded nodes AttLK, AttLKOE and AttOE depict the attitudinal variables measured using closed- or open-ended questions for Ver_LK, Ver_LKOE and Ver_OE. αLK and γLK represent the intercepts and the slopes for Ver_LK (and similarly for the other datasets). Furthermore, “z” indicates the type of questionnaire presented to the respondent. αc and γc represent the alternative-specific constants and coefficients of the utility equation, and the shaded nodes “Xs” and “Y” denote the explanatory variables and the choice variable, respectively.


Figure 4.12 Probabilistic Graphical Model for the Proposed Model

Generative Process

1. For equation k ∈ {1, …, K}:
   1. $\alpha_k^{LK} \sim \mathcal{N}(0, 1)$
   2. $\gamma_k^{LK} \sim \mathcal{N}(0, 1)$
   3. $\alpha_k^{LKOE} \sim \mathcal{N}(0, 1)$
   4. $\gamma_k^{LKOE} \sim \mathcal{N}(0, 1)$
   5. $\alpha_k^{OE} \sim \mathcal{N}(0, 1)$
   6. $\gamma_k^{OE} \sim \mathcal{N}(0, 1)$
   7. $Att_k \sim \mathcal{N}\Big(\mathbb{1}[z = LK]\left(\alpha_k^{LK} + \gamma_k^{LK} Att^{LK}\right) + \mathbb{1}[z = LKOE]\left(\alpha_k^{LKOE} + \gamma_k^{LKOE} Att^{LKOE}\right) + \mathbb{1}[z = OE]\left(\alpha_k^{OE} + \gamma_k^{OE} Att^{OE}\right),\ 1\Big)$
2. For each class c ∈ {1, …, C}:
   1. $\alpha_c \sim \mathcal{N}(0, 1)$
   2. $\beta_c \sim \mathcal{N}(0, 1)$

Joint Probability Distribution

$$p(Att, Y, \alpha^{LK}, \gamma^{LK}, \alpha^{LKOE}, \gamma^{LKOE}, \alpha^{OE}, \gamma^{OE}, \alpha_c, \beta_c \mid Att^{LK}, Att^{LKOE}, Att^{OE}, z, X) = \prod_{k=1}^{K} p(\alpha_k^{LK})\, p(\gamma_k^{LK})\, p(\alpha_k^{LKOE})\, p(\gamma_k^{LKOE})\, p(\alpha_k^{OE})\, p(\gamma_k^{OE}) \prod_{c=1}^{C} p(\alpha_c)\, p(\beta_c) \prod_{n=1}^{N} \left( \prod_{k=1}^{K} p(Att_{nk} \mid z_n, \alpha_k, \gamma_k, Att_n^{LK}, Att_n^{LKOE}, Att_n^{OE})\; p(Y_n \mid Att_n, X_n, \alpha_c, \beta_c) \right)$$


4.7 ESTIMATION RESULTS

The discussion in this section pertains to the third objective. The models were estimated using Pyro on a GPU, with the Stochastic Variational Inference proposed by Hoffman et al. [207] used to perform Bayesian inference. To benchmark the performance of the models, we estimated three separate models, one for each dataset (named “Ind”), and compared the performance of the proposed model (named “Prop”) with that of each individual model. Since the models for Ver_OE estimated using sLDA did not perform better than those using LDA, we present only the LDA results in this section. We split the data into a training set (80%) and a test set (20%); Table 4.10 presents the performance measures of the models. Similar to the approach followed in Section 3.7, we compare the performance using the initial log-likelihood (LLI), the log-likelihood with respect to constants (LLC), the final log-likelihood (LLF) and the McFadden pseudo-R-squared value (ρ2). We also compute the McFadden pseudo-R-squared value with respect to the constants-only model, to account for the improvement of the model over the constants, and the adjusted McFadden pseudo-R-squared value, to account for the number of estimated parameters. We did not count the LDA parameters when computing the adjusted values of ρ2 and count R2. In addition, we present goodness-of-fit measures such as the count R2 and the F1 score.
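The measures in Table 4.10 follow standard definitions, sketched below with hypothetical function names. McFadden's ρ2 is 1 − LLF/LLref, with LLref = LLI for ρ2 and LLref = LLC for the value with respect to constants; the adjusted version subtracts the number of estimated parameters from LLF. The thesis does not spell out its count R2 adjustment or F1 averaging, so the conventional "beyond the most frequent choice" adjustment and a macro-averaged F1 are assumptions here.

```python
def mcfadden(ll_final, ll_ref, n_params=0):
    """McFadden pseudo-R2 against a reference log-likelihood (LLI or LLC).

    n_params > 0 gives the adjusted version.
    """
    return 1.0 - (ll_final - n_params) / ll_ref


def count_r2(y_true, y_pred):
    """Share of observations whose highest-probability alternative was chosen."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_count_r2(y_true, y_pred):
    """Correct predictions beyond always predicting the most frequent choice."""
    n = len(y_true)
    n_max = max(y_true.count(c) for c in set(y_true))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return (correct - n_max) / (n - n_max)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-alternative F1 scores."""
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


# Ver_LK, individual model, training set (values from Table 4.10).
print(round(mcfadden(-4402.87, -5328.27), 4))  # 0.1737, the reported rho2
print(round(mcfadden(-4402.87, -5021.83), 4))  # 0.1233, rho2 w.r.t. constants
```

A test-set LLF below LLI yields a negative ρ2, as seen for several individual models in Table 4.10.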

This analysis has two parts, quantifying the improvements (a) from introducing open-ended questions before the set of closed-ended questions and (b) from using open-ended questions alone. The first part revisits and extends the findings from our research using the Indian dataset (Chapter 3). The second part extends the analysis further by relying only on open-ended questions to measure attitudes.

To address the first part of the analysis, we compare the models’ performance using Ver_LK and Ver_LKOE of the questionnaire. As Table 4.10 shows, on the training set the ρ2 values were highest for the models estimated using Ver_LK, and their count R2 values were comparable to those for Ver_LKOE. However, after accounting for the constants, the proposed model estimated using Ver_LKOE performed best, and the adjusted count R2 was higher for Ver_LKOE for both the individual and the proposed models. This loss in performance for Ver_LK is plausible because, unlike the other datasets, Ver_LK had a much higher proportion of respondents choosing “Regular Car” than any other option. When nearly 50% of the respondents choose a given alternative, a constants-only model already predicts well, which is likely to influence the models’ forecasting capability. Exploring the second part of the analysis, we found no improvements in performance when only open-ended questions were used to measure attitudes.

Table 4.10 Goodness-of-fit Measures for Training and Test Set

| Measure | Ver_LK Ind | Ver_LK Prop | Ver_LKOE Ind | Ver_LKOE Prop | Ver_OE Ind | Ver_OE Prop |
|---|---|---|---|---|---|---|
| Training Set | | | | | | |
| LLI | -5328.27 | -5328.27 | -5211.82 | -5211.82 | -5127.22 | -5127.22 |
| LLC | -5021.83 | -5059.77 | -5106.44 | -5111.87 | -5080.24 | -5100.86 |
| LLF | -4402.87 | -4451.77 | -4552.77 | -4477.36 | -4654.19 | -4733.33 |
| ρ2 | 0.1737 | 0.1645 | 0.1265 | 0.1409 | 0.0923 | 0.0768 |
| ρ2 w.r.t. constants | 0.1233 | 0.1202 | 0.1084 | 0.1241 | 0.0839 | 0.0721 |
| Adj. ρ2 | 0.1555 | 0.1463 | 0.1078 | 0.1223 | 0.0800 | 0.0645 |
| Count R2 (%) | 59.794 | 58.495 | 59.949 | 58.263 | 50.739 | 49.175 |
| Adj. Count R2 (%) | 35.087 | 35.871 | 43.486 | 44.020 | 33.516 | 32.938 |
| F1 Score | 0.535 | 0.528 | 0.570 | 0.560 | 0.494 | 0.478 |
| Test Set | | | | | | |
| LLI | -1260.11 | -1260.11 | -1424.90 | -1424.90 | -1175.52 | -1175.52 |
| LLC | -1169.96 | -1186.24 | -1406.59 | -1408.11 | -1161.49 | -1164.32 |
| LLF | -1283.17 | -1220.51 | -1466.34 | -1248.59 | -1156.79 | -1122.08 |
| ρ2 | -0.0183 | 0.0314 | -0.0291 | 0.1237 | 0.0159 | 0.0455 |
| ρ2 w.r.t. constants | -0.0968 | -0.0289 | -0.0425 | 0.1133 | 0.0040 | 0.0363 |
| Adj. ρ2 | -0.0953 | -0.0456 | -0.0972 | 0.0557 | -0.0377 | -0.0081 |
| Count R2 (%) | 54.577 | 53.357 | 55.744 | 55.898 | 46.542 | 46.636 |
| Adj. Count R2 (%) | 24.602 | 26.309 | 38.936 | 40.167 | 31.167 | 32.902 |
| F1 Score | 0.447 | 0.441 | 0.525 | 0.532 | 0.449 | 0.452 |

To measure classification accuracy, we compared the F1 scores: Ver_LKOE of the questionnaire had the highest values, and the values for Ver_LK and Ver_OE were reasonable and comparable. Similar trends are observed for the test set, although the performance for Ver_LK was even lower than that for Ver_OE. Interestingly, in the test set, the proposed model for Ver_LKOE performed considerably better than those for Ver_LK and Ver_OE.

When evaluating our proposed models against the individual models, it is worth noting that the proposed model generally performs better. For the training set, this holds (in terms of ρ2) only for Ver_LKOE; for the test set, however, the proposed model performs better for all three versions of the questionnaire in terms of ρ2 and adjusted count R2, and also in terms of the F1 score for Ver_LKOE and Ver_OE.


We compared the direction (+ve/-ve) and magnitude of the estimated coefficients of the proposed model and the individual models for characteristics such as socio-demographics, travel, stated preferences and attitudes. The number of coefficients for socio-demographics, travel and stated preferences with the same direction in both models was 60 (62.5%), 69 (71.875%) and 78 (81.25%) for Ver_LK, Ver_LKOE and Ver_OE, respectively. The number of variables with statistically significant estimated coefficients was similar for all models: Ver_LK (72, 73.47%), Ver_LKOE (73, 74.49%) and Ver_OE (70, 71.43%). For the attitudinal variables, the proportions were the same for the two Likert scale versions of the questionnaire. However, the share of statistically significant coefficients was higher for the proposed model (Ver_LK: 94.57%; Ver_LKOE: 92.39%) than for the individual models (Ver_LK: 84.78%; Ver_LKOE: 78.26%).

For the estimation, we considered “Regular Car” as the base alternative and estimated coefficients for “Private AVs” and “Shared AVs”. As discussed previously, we included variables related to the socio-demographic characteristics of the individual, travel, attitudes and the characteristics of the stated-preference experiments as explanatory variables. To evaluate these variables, we compared the direction of the estimated coefficients with those reported in the literature. In general, the coefficients align with the findings of other researchers, which is reassuring, particularly for the proposed framework. Among the many variables discussed in greater detail in Appendix E, it is encouraging that the coefficients for the stated-preference section align with the findings of Haboucha et al. [40], the study from which we adopted the SP experiments.

4.8 PROPOSED FRAMEWORK TO MODEL ATTITUDES JOINTLY

This framework addresses the fourth objective of our research. Having estimated the models with our proposed framework, we use the estimated coefficients so that researchers/analysts can estimate the corresponding scores for other questionnaire types. For instance, given an open-ended response, what would the corresponding response on a Likert scale be? We achieve this using the coefficients (αLK, γLK, αLKOE, γLKOE, αOE, γOE) and the values of the latent attitudes (Att). The Probabilistic Graphical Model for the modified framework is illustrated in Figure 4.13, and we implement the framework using Gibbs Sampling [213].
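To make the idea of mapping one questionnaire type onto another concrete, here is a deliberately simplified, deterministic stand-in for the sampler. It assumes a linear measurement model X ≈ α + γ·y_att for each questionnaire type, which is an assumption made for illustration. It recovers a point estimate of the latent attitudes from an open-ended response (topic proportions) by least squares. It then predicts the Likert scores implied by those attitudes. The coefficient values and the names `infer_attitudes` and `predict_scores` are hypothetical. Least squares replaces the Gibbs sampling used in the thesis.

```python
import numpy as np

def infer_attitudes(x_obs, alpha, gamma):
    """Least-squares point estimate of y_att, assuming x_obs ≈ alpha + gamma @ y_att."""
    y_att, *_ = np.linalg.lstsq(gamma, x_obs - alpha, rcond=None)
    return y_att

def predict_scores(y_att, alpha, gamma):
    """Expected responses for another questionnaire type given the latent attitudes."""
    return alpha + gamma @ y_att

# Hypothetical coefficients: 3 topics (open-ended), 4 Likert statements, 2 attitude dimensions
alpha_oe = np.array([0.3, 0.4, 0.3])
gamma_oe = np.array([[0.10, 0.00], [0.00, 0.10], [-0.10, -0.10]])  # columns sum to 0
alpha_lk = np.array([3.0, 3.0, 3.0, 3.0])
gamma_lk = np.array([[0.8, 0.0], [0.6, 0.2], [0.0, 0.9], [0.3, 0.3]])

y_true = np.array([0.5, -1.0])
x_oe = alpha_oe + gamma_oe @ y_true       # simulated topic proportions (sum to one)
y_hat = infer_attitudes(x_oe, alpha_oe, gamma_oe)
likert = np.clip(np.rint(predict_scores(y_hat, alpha_lk, gamma_lk)), 1, 5)  # 5-point scale
print(y_hat, likert)
```

Rounding and clipping to a 1 to 5 scale is one simple way to turn continuous predictions into Likert-style answers. The sampler in the thesis instead produces a distribution over responses.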

P a g e | 87

Figure 4.13 Probabilistic Graphical Model for the Modified Framework

The latent attitudes (y_att) are multidimensional, as are the coefficients α and γ. Letting “X” denote the response to a Likert scale question or the topic proportions of an open-ended response, the equations for the latent attitudes can be written as:





The values of each of these variables can be computed using the approach described below:

1. Initialise values for X11, X12, X13, X14, X15, … …, Xn1, Xn2, Xn3, Xn4, Xn5.

2. For i in range(iterations):





























…















The values of X11, X12, X13, X14, X15, … …, Xn1, Xn2, Xn3, Xn4, Xn5 drawn in the initial iterations can be discarded to account for the warm-up phase. As the number of iterations increases, the estimated values of these variables tend to converge. However, this research problem complicates the conventional approach because the variables within each set (X11, X12, X13, X14, X15, …, Xn1, Xn2, Xn3, Xn4, Xn5) are not mutually independent: the variables in a set must sum to one. To account for this, we use vectorisation, treating each set as a single vector, and draw the variables from a Dirichlet distribution.
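The sketch below illustrates the sum-to-one constraint, warm-up and vectorised Dirichlet draws under stated assumptions. Each set is updated as a whole vector drawn from a Dirichlet distribution, so every draw sums to one, and the first `warmup` iterations are discarded. A Gaussian likelihood on the simplex has no closed-form full conditional. The sketch therefore swaps in a Metropolis-within-Gibbs step with a uniform Dirichlet(1) independence proposal, which is not the thesis's exact sampler. The Gaussian measurement model, `sigma` and all names are hypothetical, and y_att is held fixed rather than resampled.

```python
import numpy as np

def sample_item_sets(alpha, gamma, y_att, sigma=0.1, iterations=2000, warmup=500, seed=0):
    """Draw item sets (rows) that each lie on the probability simplex.

    alpha : (n_sets, n_items) hypothetical intercepts
    gamma : (n_sets, n_items, n_dims) hypothetical loadings
    y_att : (n_dims,) latent attitudes, held fixed in this sketch
    Returns an array of shape (iterations - warmup, n_sets, n_items).
    """
    rng = np.random.default_rng(seed)
    n_sets, n_items = alpha.shape
    mean = alpha + gamma @ y_att  # expected value of each item, shape (n_sets, n_items)

    def log_target(x, k):
        # Gaussian log-likelihood (up to a constant) with a flat Dirichlet(1) prior
        return -0.5 * np.sum((x - mean[k]) ** 2) / sigma**2

    # Step 1: initialise every set with a uniform draw from the simplex
    x = rng.dirichlet(np.ones(n_items), size=n_sets)
    kept = []
    for it in range(iterations):
        for k in range(n_sets):  # update one whole set (vector) at a time
            proposal = rng.dirichlet(np.ones(n_items))  # always sums to one
            # The Dirichlet(1) proposal density is constant, so the
            # Metropolis-Hastings acceptance ratio reduces to the target ratio.
            if np.log(rng.uniform()) < log_target(proposal, k) - log_target(x[k], k):
                x[k] = proposal
        if it >= warmup:  # discard warm-up draws
            kept.append(x.copy())
    return np.array(kept)

# Example: two sets of five items (e.g. X11..X15 and X21..X25), hypothetical coefficients
draws = sample_item_sets(np.full((2, 5), 0.2), np.zeros((2, 5, 2)), np.array([0.3, -0.4]),
                         iterations=300, warmup=100)
print(draws.mean(axis=0))  # posterior mean of each item set
```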

The proposed approach allows respondents to choose the questionnaire type they prefer. Suppose researchers/analysts have existing models that use responses to Likert scale (or open-ended) questions, but respondents answered with open-ended (or Likert scale) responses. This framework enables them to deduce approximate Likert scale (or open-ended) responses, which can then be used to predict behaviour. We demonstrate this in Figure 4.14. In our study, attitudes are two-dimensional, and we segment respondents into two categories (“Both Negative” and “First Negative”) based on the signs of the estimated attitudes. For each dataset (Ver_LK and Ver_LKOE), we estimate the topic proportions and obtain the word clouds for the observed scale responses.
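The segmentation above depends only on the signs of the two estimated attitude dimensions. The small sketch below makes the rule explicit. “Both Negative” and “First Negative” come from the text. The two remaining labels cover sign combinations the text does not report and are hypothetical, as is the function name.

```python
def segment_by_sign(attitudes):
    """Label respondents by the signs of their two-dimensional estimated attitudes."""
    labels = []
    for first, second in attitudes:
        if first < 0 and second < 0:
            labels.append("Both Negative")
        elif first < 0:
            labels.append("First Negative")
        elif second < 0:
            labels.append("Second Negative")    # hypothetical label
        else:
            labels.append("Both Non-negative")  # hypothetical label
    return labels

print(segment_by_sign([(-1.0, -2.0), (-0.5, 0.3)]))  # ['Both Negative', 'First Negative']
```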

Figure 4.14 Predictions Using the Proposed Framework

4.9 CONCLUSION

Chapter 3 investigated the potential of Topic Modelling to extract information from the open-ended questions used to measure attitudes in travel behaviour research. Having observed positive results using open-ended questions to measure attitudes, we pursued this second study using a large and representative dataset collected from the United States of America. Unlike the previous study, we used three versions of the questionnaire, described in detail in Section 4.2.1. We collected 3002 responses from the USA between January and March 2020. Using this dataset, we pursued four research questions; the following paragraphs summarise the findings related to each of them.

The first objective investigated whether the method of collecting qualitative data influences the survey responses. To this end, we compared the responses to the Likert scale questions common to the two versions (Ver_LK and Ver_LKOE) of the questionnaire, as well as the mode choice. The frequency distributions showed a clear difference in the responses, which further investigation using the Mann-Whitney U test and statistical analysis found to be statistically significant for most of the variables (85% for the Mann-Whitney U test and 60% for the statistical analysis). The results reiterate the findings from our previous study in India. We observed a decline in the proportion of neutral responses among those answering the questionnaire that also included the open-ended questions, and an increase in the proportion of respondents choosing the extreme points of the Likert scale. We hypothesised that respondents answering the open-ended questions must pause, think and articulate their response, which alters their thought process. However, we did not observe a difference in the time taken to answer the alternative questionnaires. This could be related to the method of dissemination, as the time taken to answer the questions on a page might be influenced by the internet speed and the device used. In addition, the break in the thought process might have encouraged respondents to answer the Likert scales more cautiously.
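The comparison of Likert responses between Ver_LK and Ver_LKOE can be sketched with a rank-based Mann-Whitney U test and a simple neutral-response share. This standard-library sketch uses average ranks for ties and a normal approximation with tie correction, which matters for Likert data because ties are pervasive. It applies no continuity correction, so p-values may differ slightly from those of statistical packages. The function names and the choice of 3 as the neutral point of a 5-point scale are assumptions.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for x, z, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    r1 = sum(average_ranks(combined)[:n1])
    u1 = r1 - n1 * (n1 + 1) / 2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var) if var > 0 else 0.0
    return u1, z, math.erfc(abs(z) / math.sqrt(2))

def neutral_share(responses, neutral=3):
    """Proportion of responses at the neutral point of the scale."""
    return sum(r == neutral for r in responses) / len(responses)

# Hypothetical responses to one statement in the two questionnaire versions
ver_lk = [3, 3, 4, 3, 2, 3, 4, 3]
ver_lkoe = [5, 4, 5, 3, 1, 5, 4, 5]
print(mann_whitney_u(ver_lk, ver_lkoe), neutral_share(ver_lk), neutral_share(ver_lkoe))
```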

The second objective involved the development of a framework to extract information from the open-ended responses. First, we corrected the grammatical and spelling mistakes in the responses to each question using Grammarly, before carrying out an exploratory analysis and, eventually, using Topic Modelling approaches such as LDA and sLDA to extract information. Next, we evaluated whether the extracted topics were meaningful and, using pyLDAvis, examined the inter-topic distances to check whether the extracted topics were distinct. In addition, we compared the extracted topics with the statements used in the Likert scale questions.
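To show what LDA produces from cleaned responses, the sketch below fits a tiny LDA model with collapsed Gibbs sampling in NumPy. The outputs are per-response topic proportions (theta) and per-topic word distributions (phi). The topic proportions are what the thesis uses to represent open-ended answers. Collapsed Gibbs sampling is a swapped-in estimation technique, since the thesis does not restate which LDA implementation it used, and sLDA is not covered. The hyperparameters and the function name are illustrative assumptions.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampler for LDA. docs: list of lists of integer word ids."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # topic counts per document
    n_kw = np.zeros((n_topics, vocab_size))  # word counts per topic
    n_k = np.zeros(n_topics)                 # total words per topic
    z = []                                   # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1  # remove current token
                # full conditional p(z = k | all other assignments)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + vocab_size * beta)
    return theta, phi

# Hypothetical word ids after pre-processing, e.g. 0=safety 1=accident 2=relax 3=productive
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=4, iterations=50)
print(theta.round(2))  # topic proportions per response
```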

Having extracted the information from the open-ended responses, we developed the framework that modelled the mode choice for commute trips. Built on Probabilistic Graphical Models, this framework used attitudes as a latent variable alongside the other explanatory variables. Central to the framework is the notion that attitudes are latent constructs measured using both closed- and open-ended responses; these measurements were modelled simultaneously and estimated using the probabilistic programming language Pyro. For the attitudinal variables, we included variables used in the Technology Acceptance Model.

In this study, Ver_LK used only Likert scale questions to measure attitudes, Ver_LKOE introduced open-ended questions before the Likert scale questions, and Ver_OE represents the other extreme, using only open-ended questions. Ver_LK and Ver_LKOE are similar to the questionnaires used in our study in India (Chapter 3). Although the improvements are not significant, they reiterate the conclusion from our previous study in India that the version with open-ended responses performs better than the version with only Likert scale responses. We believe this could be because placing open-ended questions before the Likert scale questions encourages people to pause and think, helping them respond more coherently, rather than because of the open-ended questions per se. The improvements are particularly notable for the test set (0.0314 for Ver_LK and 0.1237 for Ver_LKOE). However, if researchers use only open-ended questions to measure attitudes, the models do not perform on par with those that use Likert scales. It is also worth noting that our proposed framework outperforms the individual models, except for Ver_OE in the training set, and that in the test set it performs better than the individual models for all versions of the questionnaire.

Our proposed framework allows respondents to choose their preferred type of question (closed- or open-ended) and gives researchers/analysts flexibility. The framework does not necessarily yield better results, but it could be useful when datasets are collected using different approaches. For instance, as demonstrated, researchers could use the derived scores in their existing models calibrated for Likert scale or open-ended responses.

While Topic Modelling offers researchers a faster and more efficient way to extract information from text, they should make conscious and careful decisions about the data cleaning techniques used. For instance, researchers should identify additional words that add no information in this context, such as Autonomous, Vehicles and Cars, for inclusion in the list of “Stop Words”. Moreover, identifying and combining words into sequences is unavoidable and must be done carefully, since the individual words used in isolation might imply a different meaning. Combining words requires a preliminary analysis of candidate combinations and their occurrence in the dataset, which can be challenging for the analyst.
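The two cleaning decisions discussed above can be sketched with the standard library. The first extends a generic stop-word list with domain words that carry no information in this survey. The second shortlists candidate compound words by counting adjacent word pairs and merges them before stop-word removal. Merging first matters: if “car” is a stop word, removing it first would destroy “car sharing”. The stop-word lists and compounds are illustrative assumptions. A full list such as NLTK's would normally replace `BASE_STOP_WORDS`, and lemmatisation is omitted.

```python
import re
from collections import Counter

# Small illustrative generic list, standing in for a full list such as NLTK's
BASE_STOP_WORDS = {"a", "an", "and", "are", "be", "for", "i", "in", "is", "it",
                   "of", "or", "the", "they", "to", "would"}
# Domain-specific stop words named in the text (singular forms added as an assumption)
DOMAIN_STOP_WORDS = {"autonomous", "vehicle", "vehicles", "car", "cars"}
# Hypothetical compounds, chosen after inspecting bigram counts
COMPOUNDS = {("self", "driving"): "self_driving", ("car", "sharing"): "car_sharing"}

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def count_bigrams(texts):
    """Count adjacent word pairs across responses to shortlist candidate compounds."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts

def preprocess(text, stop_words=BASE_STOP_WORDS | DOMAIN_STOP_WORDS, compounds=COMPOUNDS):
    """Merge compounds first, then drop stop words."""
    tokens = tokenize(text)
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in compounds:
            merged.append(compounds[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return [t for t in merged if t not in stop_words]

print(preprocess("Self-driving cars would reduce stress in traffic"))
# ['self_driving', 'reduce', 'stress', 'traffic']
```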


5 CONCLUSION

5.1 INTRODUCTION

To devise appropriate strategies for policy implementation, it is essential for both researchers and policymakers to understand attitudes. Although most studies have used the closed-ended approach to measure attitudes, both closed- and open-ended approaches have been used. However, there is still debate about which approach is more appropriate for measuring attitudes.

The closed-ended approach is relatively easy for both respondents and analysts, as it facilitates rapid and convenient analysis with well-established modelling techniques. However, critics argue that closed-ended approaches measure aspects that may be relevant to the researcher/policymaker but not necessarily to the respondent. On the other hand, the open-ended approach allows respondents to articulate their attitudes freely, without constraint. However, it raises serious concerns regarding the processing and analysis of the data, which are highly time-consuming and expensive. To extract information from open-ended responses, we could use Topic Modelling, a recent development in Natural Language Processing. In addition, the mode of asking questions (open- or closed-ended approach) might influence the responses. Considering this, we pursued four objectives in this study: a. to analyse if the method of collecting qualitative data influences the survey responses, b. to develop an approach to extract open-ended responses from a survey and process the data, c. to compare the relative performance of the open-ended and closed-ended responses in analysing qualitative data, d. to develop a framework that measures attitudes while allowing respondents to choose their preferred type of question (closed- or open-ended).

To accomplish the first objective, we built different versions of the questionnaire, one with only Likert scale questions and one combining Likert scales and open-ended questions, and presented them randomly to respondents. To extract information from the open-ended responses, Topic Modelling techniques such as Latent Dirichlet Allocation and supervised Latent Dirichlet Allocation were used. Having extracted the information from the open-ended responses, we estimated models that used responses from the different datasets to predict behaviour. Finally, we compared the models’ performance and quantified the improvements obtained with each of these approaches. In the first study, we used ordered Probit models; in the second, Probabilistic Graphical Models. In the second study, we also proposed a framework that allows analysts/researchers to model the choice irrespective of the questionnaire type respondents used to answer the survey. Furthermore, using the estimated coefficients and the latent variables for a given questionnaire type, the corresponding scores for other questionnaire types can be generated.
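For readers less familiar with the ordered Probit models used in the first study, the sketch below computes the category probabilities for one respondent. Each outcome category j has probability Φ(τ_j − x′β) − Φ(τ_{j−1} − x′β), where the τ are increasing cut-points and Φ is the standard normal CDF. This is the textbook form of the model rather than the thesis's estimation code. The linear index and thresholds in the example are hypothetical values.

```python
import math

def norm_cdf(v):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def ordered_probit_probs(xb, thresholds):
    """Category probabilities Phi(tau_j - xb) - Phi(tau_{j-1} - xb).

    xb         : linear index x'beta for one respondent (hypothetical value)
    thresholds : increasing cut-points tau_1 < ... < tau_{J-1}
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [norm_cdf(cuts[j + 1] - xb) - norm_cdf(cuts[j] - xb) for j in range(len(cuts) - 1)]

# Hypothetical 5-point outcome with four cut-points
print([round(p, 3) for p in ordered_probit_probs(0.4, [-1.0, 0.0, 1.0, 2.0])])
```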

5.2 DATA DESCRIPTION

This study used two datasets. The first collected information on the intention to use Shared AVs in India between November 2017 and March 2018. Two versions of the questionnaire were used: Ver_Lk (only Likert scale responses) and Ver_LkOE (a combination of Likert scales and open-ended responses). The alternative versions of the online questionnaire were distributed through Facebook, WhatsApp and mailing lists, with the help of bloggers. After removing inconsistent responses and respondents who answered too quickly, the final dataset comprised 364 complete responses (Ver_Lk: 201; Ver_LkOE: 163). Comparing the distributions of the socio-demographic characteristics, we observed no statistically significant differences between the respondents answering the two versions of the questionnaire. However, this dataset had a higher proportion of male respondents and of young and middle-aged respondents.
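The text does not name the test used to compare the socio-demographic characteristics of respondents across the two versions. One common choice for a categorical characteristic is Pearson's chi-square test of independence, sketched below with the standard library. For a 2×2 table (one degree of freedom), the 5% critical value is 3.841. The function name and the contingency counts are hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical gender-by-version counts (rows: Ver_Lk, Ver_LkOE; columns: male, female)
stat, dof = pearson_chi_square([[130, 71], [108, 55]])
print(round(stat, 3), dof, "significant at 5%" if stat > 3.841 else "not significant at 5%")
```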

To address the concerns related to the sample size and representativeness of the dataset collected in India, we carried out a second study in the USA, measuring the intention to use AVs as a mode for commute trips. Unlike our previous study, we used Likert scale statements previously tested by other researchers. To analyse whether the questionnaire type influences the responses to the Likert scale questions, we used two questionnaires: Ver_LK and Ver_LKOE. In addition, to analyse whether all Likert scale questions could be replaced with open-ended questions, we used a third version, Ver_OE. To collect representative samples, we used online panels provided by Cint. After removing records with inconsistencies, the final dataset comprised 3002 responses, representative of the population in terms of gender, ethnicity and region.

5.3 SALIENT FINDINGS

We present the salient findings of our research below:

5.3.1 Influence of Questionnaire Type on the Responses

We evaluated whether placing open-ended questions before the set of Likert scale questions influences the responses to those questions. We did this by comparing the frequency distributions of the responses between Ver_LK and Ver_LKOE of the questionnaire. We then performed non-parametric tests (Mann-Whitney U test) and statistical analysis to examine this influence and attribute it to the questionnaire type. In both datasets, we observed a statistically significant difference in the distributions of the Likert scale responses. In general, respondents answering Ver_LKOE had more positive attitudes towards AVs. However, the most important effect of introducing the open-ended questions was the reduction in the number of neutral responses: respondents were less conformist. From the perspective of an analyst/researcher designing surveys, the results highlight an improvement in the performance of the models built from the collected surveys. They are particularly relevant because one need not go as far as implementing complex approaches such as Topic Models, thus offering guidance on improvements that are easy to implement.

5.3.2 Approach to Extract Information from Open-ended Responses

We used Topic Modelling approaches such as Latent Dirichlet Allocation and supervised Latent Dirichlet Allocation to extract information from the open-ended responses. We cleaned the responses by correcting spelling and grammatical errors. In addition, we applied standard text processing steps such as removing “Stop Words”, lemmatisation and forming compound words. Having processed the data, we estimated Topic Models and evaluated the extracted topics for their meaning and for whether they were distinct. We then compared the extracted topics with the ideas expressed in the Likert scale statements. We observed correspondence between the topics and the Likert scale statements in both datasets, indicating that open-ended questions are suitable for measuring attitudes and that Topic Modelling is suitable for extracting information from open-ended responses. Results from our study in India highlight the need for careful design of the open-ended questions.
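Whether extracted topics are distinct can be quantified by the divergence between their word distributions (the rows of phi from a fitted topic model). To our understanding, pyLDAvis's default inter-topic map is built from Jensen-Shannon divergence between topic-term distributions. The stdlib sketch below computes that divergence in nats (bounded by ln 2 ≈ 0.693) for every pair of topics. The function names and example distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical topics, at most ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def pairwise_js(topics):
    """Matrix of JS divergences between all pairs of topic-word distributions."""
    return [[js_divergence(a, b) for b in topics] for a in topics]

# Hypothetical topic-word distributions over a 4-word vocabulary
topics = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45], [0.40, 0.40, 0.10, 0.10]]
for row in pairwise_js(topics):
    print([round(v, 3) for v in row])  # small values flag topics that may overlap
```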

5.3.3 Evaluate the Relative Performance of Closed- and Open-ended Approaches

One of the main objectives of this thesis was to compare the performance of a questionnaire that also used open-ended responses with that of a questionnaire that used only Likert scale responses. In the first study, using the dataset from India, we compared a questionnaire that used Likert scale responses with one that used a combination of Likert scales and open-ended responses. The version combining Likert scales and open-ended responses performed better than the version that used only Likert scales, and its estimated coefficients were also more meaningful, as open-ended questions placed before the Likert scale questions make individuals pause, think and answer more coherently. In this study, the topic extracted from the open-ended responses was also statistically significant, whereas the corresponding Likert scale response was not.

In the second study, we collected three datasets (one for each questionnaire type) and compared the models’ performance. As observed in the dataset collected in India, the model estimated for the dataset that combined Likert scales and open-ended questions performed better than that for the dataset that used only Likert scales. However, the models estimated for the dataset that used only open-ended responses did not perform as well as the other models. Nevertheless, on the test set, the models for the dataset that used open-ended responses (as warm-up questions) and Likert scales performed better than those for the dataset that used only Likert scales.

5.3.4 A Framework to Measure Attitudes that Allows Respondents to Choose Their Preferred Questionnaire Type

Using the dataset collected from the USA, we also proposed a framework that estimated a combined model to predict the intention to use AVs for commute trips. The model treats attitudes as latent constructs and closed- or open-ended approaches as different instruments for measuring them. This approach gives researchers/analysts flexibility in how data are collected while still allowing them to use a convenient estimation method.

5.4 LIMITATIONS OF THE CURRENT STUDY AND DIRECTIONS FOR FUTURE RESEARCH

We must highlight that, while Topic Modelling significantly reduces the time needed to extract topics, one should not overlook the importance of data pre-processing for the text analysis. This warrants due diligence: researchers should strike a balance in data cleaning, as it is time-consuming and may sometimes remove the inherent structure of the response. Furthermore, based on our experience with the two datasets, each question might demand “Stop Words” that are question-specific and probably data-specific (which should be explored with more datasets); the same applies to combining words.

Another challenge of this study is that we could not find improvements from the open-ended questions alone. The analysis of these responses might be influenced by data cleaning strategies, and analysing them with the help of theories from linguistics is an avenue for further research. This is particularly relevant in light of other researchers' findings on the use of Artificial Intelligence, as its use without anchoring in theory risks illustrating the aphorism “Garbage in, Garbage out” [214].

Furthermore, to frame appropriate questions and improve the quality of the responses to the open-ended questions, it would be interesting to carry out in-depth interviews. Such interviews would also facilitate the development of appropriate strategies for cleaning the responses. This is particularly relevant because we believe that the models’ performance was affected by the framing of the questions and the quality of the answers.

This thesis used Topic Modelling to extract information from open-ended responses, for which we used the Natural Language Toolkit (NLTK). However, NLTK supports a limited number of languages (23), which could make it difficult to extract information from open-ended responses in surveys conducted in languages it does not support.

The results (supported by both studies) highlight the potential for open-ended questions to alter the responses to closed-ended questions, reducing neutral responses and making respondents more decisive. It would therefore be interesting to evaluate the influence of open-ended questions in revealed preference surveys in travel behaviour research and other domains.

One of the main issues with open-ended questionnaires is that writing/typing is often burdensome for respondents. This could be addressed by allowing respondents to speak their answers, which researchers/analysts could then process using speech recognition techniques. It would be interesting to evaluate how effective this combination is in measuring attitudes.


REFERENCES

[1] P. Jones, “The Role of an Evolving Paradigm in Shaping International Transport Research and Policy

Agendas over the Last 50 Years,” Proc. XII Int. Assoc. Travel Behav. Res. Conf., vol. 3, p. 34, 2009.

[2] P. Jones, “The Evolution of Urban Mobility: The Interplay of Academic and Policy Perspectives,” IATSS

Res., vol. 38, no. 1, pp. 7–13, 2014.

[3] S. Kaplan, F. Manca, T. A. S. Nielsen, and C. G. Prato, “Intentions to Use Bike-Sharing for Holiday

Cycling: An Application of the Theory of Planned Behavior,” Tour. Manag., vol. 47, pp. 34–46, 2015.

[4] R. Kelkel, “Predicting Consumers’ Intention-to-purchase Fully Autonomous Driving Systems– Which

Factors Drive Acceptance?,” Universidade Católica Portuguesa, 2015.

[5] A. Mehdizadeh, S. Kaplan, J. De Abreu, O. Anker, and F. Camara, “Use Intention of Mobility-

management Travel Apps: The Role of Users Goals, Technophile Attitude and Community Trust,”

Transp. Res. Part A, vol. 126, pp. 114–135, 2019.

[6] S. Kaplan, J. de Abreu e Silva, and F. di Ciommo, “The Relationship Between Young People’s Transit

Use and Their Perceptions of Equity Concepts in Transit Service Provision,” Transp. Policy, vol. 36, pp.

79–87, 2014.

[7] K. K. Srinivasan and P. Bhargavi, “Longer-term Changes in Mode Choice Decisions in Chennai: A

Comparison between Cross-sectional and Dynamic Models,” Transportation, vol. 34, no. 3, pp.

355–374, 2007.

[8] S. Sadhukhan, U. K. Banerjee, and B. Maitra, “Commuters’ Perception towards Transfer Facility

Attributes in and Around Metro Stations : Experience in Kolkata,” J. Urban Plan. Dev., vol. 141, no. 4,

pp. 1–8, 2015.

[9] F. Zhao et al., “Exploratory Analysis of a Smartphone-Based Travel Survey in Singapore,” Transp. Res.

Rec. J. Transp. Res. Board, vol. 2, no. March, pp. 45–56, 2015.

[10] R. Likert, “A Technique for the Measurement of Attitudes,” Arch. Psychol., vol. 22, no. 140, p. 55, 1932.

[11] J. S. Plant, “Rating Scheme for Conduct,” Am. J. Psychiatry, vol. 78, no. 4, pp. 547–572, 1922.

[12] M. Freyd, “The Graphic Rating Scale,” J. Educ. Psychol., vol. 14, no. 2, pp. 83–102, 1923.

[13] D. Rugg and H. Cantril, “The Wording of Questions in Public Opinion Polls,” J. Abnorm. Soc. Psychol.,

vol. 37, no. 4, pp. 469–495, 1942.

[14] J. G. Geer, “What do Open-ended Questions Measure?,” Public Opin. Q., vol. 52, no. 3, pp.

365–371, 1988.

[15] J. G. Geer, “Do Open-Ended Questions Measure ‘Salient’ Issues?,” Public Opin. Q., vol. 55, no. 3, p. 360,

1991.

[16] P. Bansal and K. M. Kockelman, “Are We Ready to Embrace Connected and Self-driving Vehicles? A

Case Study of Texans,” Transportation, vol. 45, no. 2, pp. 641–675, 2018.

[17] P. F. Lazarsfeld, “The Controversy over Detailed Interviews— An Offer for Negotiation,” Public Opin.

Q., vol. 8, no. 1, pp. 38–60, 1944.

[18] J. M. Converse, “Strong Arguments and Weak Evidence: The Open/closed Questioning Controversy of

the 1940s,” Public Opin. Q., vol. 48, no. 1B, pp. 267–282, 1984.

[19] J. A. Krosnick, “Questionnaire Design,” in The Palgrave Handbook of Survey Research, D. L. Vannette

and J. A. Krosnick, Eds. Palgrave Macmillan, 2018, pp. 439–455.

[20] T. M. Ostrom and K. M. Gannon, “Exemplar Generation: Assessing How Respondents Give Meaning to

Rating Scales,” in Answering Questions: Methodology for Determining Cognitive and Communicative

Processes in Survey Research, N. Schwarz and S. Sudman, Eds. San Francisco, CA, 1996, pp. 293–318.

[21] J. A. Krosnick, “Response Strategies for Coping with the Cognitive Demands of Attitude Measures in

Surveys,” Appl. Cogn. Psychol., vol. 5, no. 3, pp. 213–236, 1991.

[22] S. L. Becker, “Why an Order Effect,” Public Opin. Q., vol. 18, no. 3, pp. 271–278, 1954.

[23] L. J. Cronbach, “Response Sets and Test Validity,” Educ. Psychol. Meas., vol. 6, no. 4, pp. 475–494, 1946.

[24] J. A. Krosnick and S. Presser, “Question and Questionnaire Design,” in Handbook of Survey Research,

2nd ed., vol. 112, no. 3, Emerald Group Publishing Limited, 2010, pp. 263–313.

[25] S. Iyengar, “Framing Responsibility for Political Issues,” Ann. Am. Acad. Polit. Soc. Sci., vol. 546, no. 1,

pp. 59–70, 1996.

[26] W. L. Neuman, Social Research Methods: Qualitative and Quantitative Approaches, Seventh Ed. Pearson

Higher Education, 2013.

[27] I. Ajzen, “The Theory of Planned Behavior,” Organ. Behav. Hum. Decis. Process., vol. 50, no. 2, pp.

179–211, 1991.

[28] F. D. Davis, Jr, “A Technology Acceptance Model for Empirically Testing New End-User Information

Systems: Theory and Results,” Massachusetts Institute of Technology, 1985.

[29] V. Venkatesh, M. G. Morris, G. B. Davis, and F. D. Davis, Jr, “User Acceptance of Information

Technology: Toward a Unified View,” MIS Q., vol. 27, no. 3, pp. 425–478, 2003.

[30] G. Carrus, P. Passafaro, and M. Bonnes, “Emotions, Habits and Rational Choices in Ecological

Behaviours: The Case of Recycling and Use of Public Transportation,” J. Environ. Psychol., vol. 28, no.

1, pp. 51–62, 2008.

[31] I. J. Donald, S. R. Cooper, and S. M. Conchie, “An Extended Theory of Planned Behaviour Model of the

Psychological Factors Affecting Commuters’ Transport Mode Use,” J. Environ. Psychol., vol. 40, pp. 39–

48, 2014.

[32] Ö. Şimşekoğlu, T. Nordfjærn, and T. Rundmo, “The Role of Attitudes, Transport Priorities, and Car Use

Habit for Travel Mode Use and Intentions to Use Public Transportation in an Urban Norwegian Public,”

Transp. Policy, vol. 42, pp. 113–120, 2015.

[33] S. Zailani, M. Iranmanesh, T. A. Masron, and T.-H. Chan, “Is the Intention to Use Public Transport for

Different Travel Purposes Determined by Different Factors?,” Transp. Res. Part D Transp. Environ., vol.

49, pp. 18–24, 2016.

[34] D. Lois, J. A. Moriano, and G. Rondinella, “Cycle Commuting Intention: A Model Based on Theory of

Planned Behaviour and Social Identity,” Transp. Res. Part F Traffic Psychol. Behav., vol. 32, pp. 101–113, 2015.

[35] Á. Fernández-Heredia, S. Jara-Díaz, and A. Monzón, “Modelling Bicycle Use Intention: The Role of

Perceptions,” Transportation, vol. 43, pp. 1–23, 2016.

[36] S. Kaplan, D. K. Wrzesinska, and C. G. Prato, “The Role of Human Needs in the Intention to Use

Conventional and Electric Bicycle Sharing in a Driving-oriented Country,” Transp. Policy, vol. 71, pp.

138–146, 2018.

[37] A. A. De Souza, S. P. Sanches, and M. A. G. Ferreira, “Influence of Attitudes with Respect to Cycling on

the Perception of Existing Barriers for Using this Mode of Transport for Commuting,” Procedia - Soc.

Behav. Sci., vol. 162, pp. 111–120, 2014.

[38] W. Payre, J. Cestac, and P. Delhomme, “Intention-to-use a Fully Automated Car: Attitudes and a priori

Acceptability,” Transp. Res. Part F Traffic Psychol. Behav., vol. 27, no. PB, pp. 252–263, 2014.

[39] M. Kyriakidis, R. Happee, and J. C. F. de Winter, “Public Opinion on Automated Driving: Results of an

International Questionnaire Among 5000 Respondents,” Transp. Res. Part F Traffic Psychol. Behav., vol.

32, pp. 127–140, 2015.

[40] C. J. Haboucha, R. Ishaq, and Y. Shiftan, “User Preferences Regarding Autonomous Vehicles,” Transp.

Res. Part C Emerg. Technol., vol. 78, pp. 37–49, 2017.

[41] M. König and L. Neumayr, “Users’ Resistance Towards Radical Innovations: The Case of the Self-driving

Car,” Transp. Res. Part F Traffic Psychol. Behav., vol. 44, pp. 42–52, 2017.

[42] T. A. S. Nielsen and S. Haustein, “On Sceptics and Enthusiasts: What are the Expectations Towards Self-

driving Cars?,” Transp. Policy, vol. 66, pp. 49–55, 2018.

[43] C. Hohenberger, M. Spörrle, and I. M. Welpe, “How and Why do Men and Women Differ in Their

Willingness to Use Automated Cars? The Influence of Emotions Across Different Age Groups,” Transp.

Res. Part A, vol. 94, pp. 374–385, 2016.


[44] S. T. D. Cordazzo, C. T. Scialfa, K. Bubric, and R. J. Ross, “The Driver Behaviour Questionnaire: A

North American Analysis,” J. Safety Res., vol. 50, pp. 99–107, 2014.

[45] H. Iversen, “Risk-taking Attitudes and Risky Driving Behaviour,” Transp. Res. Part F Traffic Psychol.

Behav., vol. 7, no. 3, pp. 135–150, 2004.

[46] T. Nordfjærn, S. H. Jørgensen, and T. Rundmo, “An Investigation of Driver Attitudes and Behaviour in

Rural and Urban Areas in Norway,” Saf. Sci., vol. 48, no. 3, pp. 348–356, 2010.

[47] J. A. Krosnick and L. R. Fabrigar, “Designing Rating Scales for Effective Measurement in Surveys,” in

Survey Measurement and Process Quality, L. Lyberg, P. Biemer, M. Collins, E. De Leeuw, C. Dippo, N.

Schwarz, and D. Trewin, Eds. New York: John Wiley & Sons, 1997, pp. 141–164.

[48] A. Bowling, “Handbook of Health Research Methods,” in Handbook of Health Research Methods-

Investigation, Measurement and Analysis, A. Bowling and S. Ebrahim, Eds. Berkshire: Open University

Press, 2005, pp. 394–427.

[49] H. Schuman, J. Ludwig, and J. A. Krosnick, “The Perceived Threat of Nuclear War, Salience and Open

Questions,” Public Opin. Q., vol. 50, no. 4, pp. 519–536, 1986.

[50] H. Schuman and J. Scott, “Problems in the Use of Survey Questions to Measure Public Opinion,” Science, vol. 236, no. 4804, pp. 957–959, 1987.

[51] W. R. Johnson, N. A. Sieveking, and E. S. Clanton, “Effects of Alternative Positioning of Open-ended

Questions in Multiple-choice Questionnaires,” J. Appl. Psychol., vol. 59, no. 6, pp. 776–778, 1974.

[52] H. Schuman and S. Presser, “The Open and Closed Question,” Am. Sociol. Rev., vol. 44, no. 5, pp. 692–

712, 1979.

[53] O. Friborg and J. H. Rosenvinge, “A Comparison of Open-ended and Closed Questions in the Prediction

of Mental Health,” Qual. Quant., vol. 47, no. 3, pp. 1397–1411, 2013.

[54] P. M. Symonds, “On the Loss of Reliability in Ratings due to Coarseness of the Scale,” J. Exp. Psychol.,

vol. 7, no. 6, pp. 456–461, 1924.

[55] D. Peabody, “Two Components in Bipolar Scales: Direction and Extremeness,” Psychol. Rev., vol. 69,

no. 2, pp. 65–73, 1962.

[56] H. Champney and H. Marshall, “Optimal Refinement of the Rating Scale,” J. Appl. Psychol., vol. 23, no.

3, pp. 323–331, 1939.

[57] S. S. Komorita and W. K. Graham, “Number of Scale Points and the Reliability of Scales,” Educ. Psychol.

Meas., vol. 25, no. 4, pp. 987–995, 1965.

[58] N. Birkett, “Selecting the Number of Response Categories for a Likert-type Scale,” J. Am. Stat. Assoc.,

pp. 488–492, 1986.

[59] C. C. Preston and A. M. Colman, “Optimal Number of Response Categories in Rating Scales: Reliability,

Validity, Discriminating Power and Respondent Preferences,” Acta Psychol., vol. 104, no. 1, pp.

1–15, 2000.

[60] J. H. Flaskerud, “Cultural Bias and Likert-Type Scales Revisited,” Issues Ment. Health Nurs., vol. 33, no.

2, pp. 130–132, 2012.

[61] L. A. King, D. W. King, and A. J. Klockars, “Dichotomous and Multipoint Scales Using Bipolar

Adjectives,” Appl. Psychol. Meas., vol. 7, no. 2, pp. 173–180, 1983.

[62] C. Capik and S. Gozum, “Psychometric Features of an Assessment Instrument with Likert and

Dichotomous Response Formats,” Public Health Nurs., vol. 32, no. 1, pp. 81–86, 2015.

[63] Y. S. Chung and Y. C. Chiou, “Willingness-to-pay for a Bus Fare Reform: A Contingent Valuation

Approach with Multiple Bound Dichotomous Choices,” Transp. Res. Part A Policy Pract., vol. 95, pp.

289–304, 2017.

[64] S. A. Useche, V. G. Ortiz, and B. E. Cendales, “Stress-related Psychosocial Factors at Work, Fatigue, and

Risky Driving Behavior in Bus Rapid Transport (BRT) Drivers,” Accid. Anal. Prev., vol. 104, pp. 106–

114, 2017.

[65] Y. Nishihori, J. Yang, R. Ando, and T. Morikawa, “Understanding Social Acceptability of Drivers for the

Diffusion of Autonomous Vehicles in Japan,” J. East. Asia Soc. Transp. Stud., vol. 12, pp. 2102–2116,

P a g e | 100

2017.

[66] P. Bansal, K. M. Kockelman, and A. Singh, “Assessing Public Opinions of and Interest in New Vehicle Technologies: An Austin Perspective,” Transp. Res. Part C Emerg. Technol., vol. 67, pp. 1–14, 2016.

[67] S. Ramisetty-Mikler and A. Almakadma, “Attitudes and Behaviors Towards Risky Driving Among Adolescents in Saudi Arabia,” Int. J. Pediatr. Adolesc. Med., vol. 3, no. 2, pp. 55–63, 2016.

[68] R. Shabanpour, S. N. D. Mousavi, N. Golshani, J. Auld, and A. Mohammadian, “Consumer Preferences of Electric and Automated Vehicles,” in 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems, MT-ITS 2017, 2017, pp. 716–720.

[69] M. Diana, “Measuring the Satisfaction of Multimodal Travelers for Local Transit Services in Different Urban Contexts,” Transp. Res. Part A Policy Pract., vol. 46, no. 1, pp. 1–11, 2012.

[70] K. Shaaban and R. F. Khalil, “Investigating the Customer Satisfaction of the Bus Service in Qatar,” Procedia - Soc. Behav. Sci., vol. 104, pp. 865–874, 2013.

[71] B. Schoettle and M. Sivak, “A Survey of Public Opinion About Autonomous and Self-Driving Vehicles in the US, the UK and Australia,” Michigan, 2014.

[72] B. Schoettle and M. Sivak, “Public Opinion about Self-Driving Vehicles in China, India, Japan, the U.S., the U.K. and Australia,” Michigan, 2014.

[73] J. P. Zmud and I. N. Sener, “Towards an Understanding of the Travel Behavior Impact of Autonomous Vehicles,” Transp. Res. Procedia, vol. 25, pp. 2500–2519, 2017.

[74] I. N. Sener, J. Zmud, and T. Williams, “Measures of Baseline Intent to Use Automated Vehicles: A Case Study of Texas Cities,” Transp. Res. Part F Traffic Psychol. Behav., vol. 62, pp. 66–77, 2019.

[75] D. M. Sanbonmatsu, D. L. Strayer, Z. Yu, F. Biondi, and J. M. Cooper, “Cognitive Underpinnings of Beliefs and Confidence in Beliefs About Fully Automated Vehicles,” Transp. Res. Part F Traffic Psychol. Behav., vol. 55, pp. 114–122, 2018.

[76] W. Qu, Q. Zhang, W. Zhao, K. Zhang, and Y. Ge, “Validation of the Driver Stress Inventory in China: Relationship with Dangerous Driving Behaviors,” Accid. Anal. Prev., vol. 87, pp. 50–58, 2016.

[77] B. Öz, T. Özkan, and T. Lajunen, “An Investigation of Professional Drivers: Organizational Safety Climate, Driver Behaviours and Performance,” Transp. Res. Part F Traffic Psychol. Behav., vol. 16, pp. 81–91, 2013.

[78] K. Amponsah-Tawiah and J. Mensah, “The Impact of Safety Climate on Safety Related Driving Behaviors,” Transp. Res. Part F Traffic Psychol. Behav., vol. 40, pp. 48–55, 2016.

[79] S. Classen, A. L. Nichols, R. McPeek, and J. F. Breiner, “Personality as a Predictor of Driving Performance: An Exploratory Study,” Transp. Res. Part F Traffic Psychol. Behav., vol. 14, pp. 381–389, 2011.

[80] C. Domarchi, A. Tudela, and A. González, “Effect of Attitudes, Habit and Affective Appraisal on Mode Choice: An Application to University Workers,” Transportation (Amst)., vol. 35, no. 5, pp. 585–599, 2008.

[81] C.-F. Chen, “Personality, Safety Attitudes and Risky Driving Behaviors-Evidence from Young Taiwanese Motorcyclists,” Accid. Anal. Prev., vol. 41, no. 5, pp. 963–968, 2009.

[82] R. F. Abenoza, O. Cats, and Y. O. Susilo, “Travel Satisfaction with Public Transport: Determinants, User Classes, Regional Disparities and Their Evolution,” Transp. Res. Part A Policy Pract., vol. 95, pp. 64–84, 2017.

[83] M. Mohamed and N. F. Bromfield, “Attitudes, Driving Behavior, and Accident Involvement Among Young Male Drivers in Saudi Arabia,” Transp. Res. Part F Traffic Psychol. Behav., vol. 47, pp. 59–71, 2017.

[84] M. Milković and M. Štambuk, “To Bike or not to Bike? Application of the Theory of Planned Behavior in Predicting Bicycle Commuting Among Students in Zagreb,” Psihol. teme, vol. 2, pp. 187–205, 2015.

[85] T. Liljamo, H. Liimatainen, and M. Pöllänen, “Attitudes and Concerns on Automated Vehicles,” Transp. Res. Part F Traffic Psychol. Behav., vol. 59, no. 2018, pp. 24–44, 2018.

[86] R. Chomeya, “Quality of Psychology Test Between Likert Scale 5 and 6 Points,” J. Soc. Sci., vol. 6, no. 3, pp. 399–403, 2010.

[87] S. S. Komorita, “Attitude Content, Intensity and the Neutral Point on a Likert Scale,” J. Soc. Psychol., vol. 61, no. 2, pp. 327–334, 1963.

[88] S. Nordhoff, J. de Winter, M. Kyriakidis, B. van Arem, and R. Happee, “Acceptance of Driverless Vehicles: Results from a Large Cross-National Questionnaire Study,” J. Adv. Transp., vol. 2018, pp. 1–22, 2018.

[89] G. A. Miller, “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information,” Psychol. Rev., vol. 63, no. 2, pp. 81–97, 1956.

[90] C. E. Osgood, G. J. Suci, and P. H. Tannenbaum, The Measurement of Meaning. University of Illinois Press, 1957.

[91] D. G. Morrison, “Regressions with Discrete Dependent Variables: The Effect on R²,” J. Mark. Res., vol. 9, no. 3, pp. 338–340, 1972.

[92] J. O. Ramsay, “The Effect of Number of Categories in Rating Scales on Precision of Estimation of Scale Values,” Psychometrika, vol. 38, no. 4, pp. 513–532, 1973.

[93] K. Finstad, “Response Interpolation and Scale Sensitivity: Evidence Against 5-Point Scales,” J. Usability Stud., vol. 5, no. 3, pp. 104–110, 2010.

[94] S. O. Leung, “A Comparison of Psychometric Properties and Normality in 4-, 5-, 6-, and 11-Point Likert Scales,” J. Soc. Serv. Res., vol. 37, no. 4, pp. 412–421, 2011.

[95] L. M. Hulse, H. Xie, and E. R. Galea, “Perceptions of Autonomous Vehicles: Relationships with Road Users, Risk, Gender and Age,” Saf. Sci., vol. 102, pp. 1–13, 2018.

[96] L. Buckley, S. A. Kaye, and A. K. Pradhan, “Psychosocial Factors Associated with Intended Use of Automated Vehicles: A Simulated Driving Study,” Accid. Anal. Prev., vol. 115, pp. 202–208, 2018.

[97] A. W. Bendig, “The Reliability of Self-Ratings as a Function of the Amount of Verbal Anchoring and of the Number of Categories on the Scale,” J. Appl. Psychol., vol. 37, no. 1, pp. 38–41, 1953.

[98] A. Mouwen, “Drivers of Customer Satisfaction with Public Transport Services,” Transp. Res. Part A Policy Pract., vol. 78, pp. 1–20, 2015.

[99] B. Öz, T. Özkan, and T. Lajunen, “Professional and Non-Professional Drivers’ Stress Reactions and Risky Driving,” Transp. Res. Part F Traffic Psychol. Behav., vol. 13, no. 1, pp. 32–40, 2010.

[100] A. W. Bendig, “Reliability and the Number of Rating-Scale Categories,” J. Appl. Psychol., vol. 38, no. 1, pp. 38–40, 1954.

[101] A. W. Bendig and J. B. Hughes II, “Effect of Amount of Verbal Anchoring and Number of Rating-Scale Categories Upon Transmitted Information,” J. Exp. Psychol., vol. 46, no. 2, pp. 87–90, 1953.

[102] R. W. Lissitz and S. B. Green, “Effect of the Number of Scale Points on Reliability: A Monte Carlo Approach,” J. Appl. Psychol., vol. 60, no. 1, pp. 10–13, 1975.

[103] M. S. Matell and J. Jacoby, “Is There an Optimal Number of Alternatives for Likert Scale Items? Study 1: Reliability and Validity,” Educ. Psychol. Meas., vol. 31, no. 3, pp. 657–674, 1971.

[104] M. S. Matell and J. Jacoby, “Is There an Optimal Number of Alternatives for Likert-Scale Items? Effects of Testing Time and Scale Properties,” J. Appl. Psychol., vol. 56, no. 6, pp. 506–509, 1972.

[105] R. Garland, “The Mid-Point on a Rating Scale: Is it Desirable?,” Mark. Bull., vol. 2, pp. 66–70, 1991.

[106] K. A. Lormore and S. D. G. Stephens, “Use of the Open-ended Questionnaire with Patients and Their Significant Others,” Br. J. Audiol., vol. 28, no. 2, pp. 81–89, 1994.

[107] V. M. Esses and G. R. Maio, “Expanding the Assessment of Attitude Components and Structure: The Benefits of Open-Ended Measures,” Eur. Rev. Soc. Psychol., vol. 12, no. 1, pp. 71–101, 2005.

[108] M. Galesic, R. Tourangeau, M. P. Couper, and F. G. Conrad, “Eye-tracking Data: New Insights on Response Order Effects and Other Cognitive Shortcuts in Survey Responding,” Public Opin. Q., vol. 72, no. 5, pp. 892–913, 2008.

[109] C. Lammgård and D. Andersson, “Environmental Considerations and Trade-offs in Purchasing of Transportation Services,” Res. Transp. Bus. Manag., vol. 10, pp. 45–52, 2014.


[110] J. A. Nelson, R. M. Bustamante, E. D. Wilson, and A. J. Onwuegbuzie, “The School-Wide Cultural Competence Observation Checklist for School Counselors: An Exploratory Factor Analysis,” Prof. Sch. Couns., vol. 11, no. 4, pp. 207–217, 2008.

[111] T. Levett-Jones et al., “The Development and Psychometric Testing of the Satisfaction with Simulation Experience Scale,” Nurse Educ. Today, vol. 31, no. 7, pp. 705–710, 2011.

[112] B. Renault, J. Agumba, and N. Ansary, “An Exploratory Factor Analysis of Risk Management Practices: A Study Among Small and Medium Contractors in Gauteng,” Acta Structilia, vol. 25, no. 1, pp. 1–39, 2018.

[113] J. Sun, A. E. Adegbosin, V. Reher, G. Rehbein, and J. Evans, “Validity and Reliability of a Self-assessment Scale for Dental and Oral Health Student’s Perception of Transferable Skills in Australia,” Eur. J. Dent. Educ., vol. 24, no. 1, pp. 42–52, 2020.

[114] J. Bláfoss Ingvardson, S. Kaplan, J. de Abreu e Silva, F. di Ciommo, Y. Shiftan, and O. A. Nielsen, “Existence, Relatedness and Growth Needs as Mediators Between Mode Choice and Travel Satisfaction: Evidence from Denmark,” Transportation (Amst)., vol. 47, no. 1, pp. 337–358, 2020.

[115] R. Zhang, “Understanding Customers’ Attitude and Intention to Use Driverless Cars,” Northumbria University, Newcastle, 2019.

[116] R. Dubey, A. Gunasekaran, S. J. Childe, S. Fosso Wamba, and T. Papadopoulos, “Enablers of Six Sigma: Contextual Framework and its Empirical Validation,” Total Qual. Manag. Bus. Excell., vol. 27, no. 11–12, pp. 1346–1372, 2016.

[117] A. Cyrus and P. S. Nyakomitta, “Adoption of Computer based Model for Monitoring Parking Revenue Inflow,” SIJ Trans. Comput. Sci. Eng. its Appl., vol. 2, no. 5, pp. 195–201, 2014.

[118] E. Balistreri, N. A. Busch-Rossnagel, and K. F. Geisinger, “Development and Preliminary Validation of the Ego Identity Process Questionnaire,” J. Adolesc., vol. 18, pp. 179–192, 1995.

[119] U. Reja, K. L. Manfreda, V. Hlebec, and V. Vehovar, “Open-ended vs. Close-ended Questions in Web Questionnaires,” Dev. Appl. Stat., vol. 19, pp. 159–177, 2003.

[120] M. Pullman, K. McGuire, and C. Cleveland, “Let Me Count the Words: Quantifying Open-Ended Interactions with Guests,” Cornell Hosp. Q., vol. 46, no. 3, pp. 323–343, 2005.

[121] D. E. RePass, “Issue Salience and Party Choice,” Am. Polit. Sci. Rev., vol. 65, no. 2, pp. 389–400, 1971.

[122] R. Likert, “The Polls: Straw Votes or Scientific Instruments,” Am. Psychol., vol. 3, no. 12, pp. 556–557, 1948.

[123] H. Schuman, “The Random Probe: A Technique for Evaluating the Validity of Closed Questions,” Am. Sociol. Rev., vol. 31, no. 2, pp. 218–222, 1966.

[124] L. E. Griffith, D. J. Cook, G. H. Guyatt, and C. A. Charles, “Comparison of Open and Closed Questionnaire Formats in Obtaining Demographic Information From Canadian General Internists,” J. Clin. Epidemiol., vol. 52, no. 10, pp. 997–1005, 1999.

[125] J. E. Stanga and J. F. Sheffield, “The Myth of Zero Partisanship: Attitudes toward American Political Parties, 1964-84,” Am. J. Pol. Sci., vol. 31, no. 4, pp. 829–855, 1987.

[126] P. Gendall, H. Menelaou, and M. Brennan, “Open-ended Questions: Some Implications for Mail Survey Research,” Mark. Bull., vol. 7, pp. 1–8, 1996.

[127] M. P. Couper, M. W. Traugott, and M. J. Lamias, “Web Survey Design and Administration,” Public Opin. Q., vol. 65, no. 2, pp. 230–253, 2002.

[128] J. D. Smyth, D. A. Dillman, L. M. Christian, and M. McBride, “Open-ended Questions in Web Surveys: Can Increasing the Size of Answer Boxes and Providing Extra Verbal Instructions Improve Response Quality?,” Public Opin. Q., vol. 73, no. 2, pp. 325–337, 2009.

[129] J. L. Holland and L. M. Christian, “The Influence of Topic Interest and Interactive Probing on Responses to Open-ended Questions in Web Surveys,” Soc. Sci. Comput. Rev., vol. 27, no. 2, pp. 196–212, 2009.

[130] K. Schmidt, T. Gummer, and J. Roßmann, “Effects of Respondent and Survey Characteristics on the Response Quality of an Open-ended Attitude Question in Web Surveys,” Methods, Data, Anal., vol. 14, no. 1, pp. 3–34, 2020.


[131] A. Peytchev, “Survey Breakoff,” Public Opin. Q., vol. 73, no. 1, pp. 74–97, 2009.

[132] C. Zuell, N. Menold, and S. Körber, “The Influence of the Answer Box Size on Item Nonresponse to Open-Ended Questions in a Web Survey,” Soc. Sci. Comput. Rev., vol. 33, no. 1, pp. 115–122, 2015.

[133] S. D. Crawford and M. J. Lamias, “Web Surveys: Perceptions of Burden,” Soc. Sci. Comput. Rev., vol. 19, no. 2, pp. 146–162, 2001.

[134] A. Mavletova, “Data Quality in PC and Mobile Web Surveys,” Soc. Sci. Comput. Rev., vol. 31, no. 6, pp. 725–743, 2013.

[135] S. Schlosser and A. Mays, “Mobile and Dirty: Does Using Mobile Devices Affect the Data Quality and the Response Process of Online Surveys?,” Soc. Sci. Comput. Rev., vol. 36, no. 2, pp. 212–230, 2018.

[136] M. Revilla and C. Ochoa, “Open Narrative Questions in PC and Smartphones: Is the Device Playing a Role?,” Qual. Quant., vol. 50, no. 6, pp. 2495–2513, 2016.

[137] S. H. Lo, G. J. P. van Breukelen, G.-J. Y. Peters, and G. Kok, “Pro-environmental Travel Behavior Among Office Workers: A Qualitative Study of Individual and Organizational Determinants,” Transp. Res. Part A Policy Pract., vol. 56, pp. 11–22, 2013.

[138] F. M. Leiva, F. J. M. Ríos, and T. L. Martínez, “Assessment of Interjudge Reliability in the Open-ended Questions Coding Process,” Qual. Quant., vol. 40, no. 4, pp. 519–537, 2006.

[139] L. J. Barcham and S. D. G. Stephens, “The Use of an Open-ended Problems Questionnaire in Auditory Rehabilitation,” Br. J. Audiol., vol. 14, no. 2, pp. 49–54, 1980.

[140] R. Artstein and M. Poesio, “Inter-Coder Agreement for Computational Linguistics,” Comput. Linguist., vol. 34, no. 4, pp. 555–596, 2009.

[141] T. Niedomysl and B. Malmberg, “Do Open-ended Survey Questions on Migration Motives Create Coder Variability Problems?,” Popul. Space Place, vol. 15, pp. 79–87, 2009.

[142] C. L. Covell, S. Sidani, and J. A. Ritchie, “Does the Sequence of Data Collection Influence Participants’ Responses to Closed and Open-ended Questions? A Methodological Study,” Int. J. Nurs. Stud., vol. 49, no. 6, pp. 664–671, 2012.

[143] A. M. Falthzik and S. J. Carroll Jr, “Rate of Return for Closed Versus Open-ended Questions in a Mail Questionnaire Survey of Industrial Organizations,” Psychol. Rep., vol. 29, no. 3, pp. 1121–1122, 1971.

[144] K. A. Jehn, “A Multimethod Examination of the Benefits and Detriments of Intragroup Conflict,” Adm. Sci. Q., vol. 40, no. 2, pp. 256–282, 1995.

[145] K. M. Jackson and W. M. K. Trochim, “Concept Mapping as an Alternative Approach for the Analysis of Open-Ended Survey Responses,” Organ. Res. Methods, vol. 5, no. 4, pp. 307–336, 2002.

[146] M. Schmidt, “Quantification of Transcripts from Depth Interviews, Open-ended Responses and Focus Groups: Challenges, Accomplishments, New Applications and Perspectives for Market Research,” Int. J. Mark. Res., vol. 52, no. 4, pp. 1–23, 2010.

[147] M. R. Jacobson, C. E. Whyte, and T. Azzam, “Using Crowdsourcing to Code Open-Ended Responses: A Mixed Methods Approach,” Am. J. Eval., vol. 39, no. 3, pp. 413–429, 2018.

[148] K. Benoit, D. Conway, B. E. Lauderdale, M. Laver, and S. Mikhaylov, “Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data,” Am. Polit. Sci. Rev., vol. 110, no. 2, pp. 278–295, 2016.

[149] H. F. Hsieh and S. E. Shannon, “Three Approaches to Qualitative Content Analysis,” Qual. Health Res., vol. 15, no. 9, pp. 1277–1288, 2005.

[150] Columbia Public Health, “Content Analysis,” 2019. [Online]. Available: https://www.publichealth.columbia.edu/research/population-health-methods/content-analysis.

[151] D. Kawashima and K. Kawano, “Meanings of Loss Among Japanese Suicide Bereaved: Content Analysis of Open-Ended Responses,” Jpn. Psychol. Res., no. July 2017, pp. 1–9, 2020.

[152] F. C. Mamali, C. M. Lehane, W. Wittich, N. Martiniello, and J. Dammeyer, “What Couples Say About Living and Coping With Sensory Loss: A Qualitative Analysis of Open-ended Survey Responses,” Disabil. Rehabil., vol. 0, no. 0, pp. 1–22, 2020.

[153] G. Kelly, M. McKnight, and D. Schubotz, “‘Is there Anything Else You’d Like to Say About Community Relations?’ Thematic Time Series Analysis of Open-ended Questions From an Annual Survey of 16-Year Olds,” Methods, Data, Anal., vol. 14, no. 1, pp. 91–126, 2020.

[154] M. A. Burg et al., “Current Unmet needs of Cancer Survivors: Analysis of Open-ended Responses to the American Cancer Society Study of Cancer Survivors II,” Cancer, vol. 121, no. 4, pp. 623–630, 2015.

[155] M. Savic, R. P. Ogeil, M. J. Sechtig, P. Lee-Tobin, N. Ferguson, and D. I. Lubman, “How Do Nurses Cope with Shift Work? A Qualitative Analysis of Open-ended Responses from a Survey of Nurses,” Int. J. Environ. Res. Public Health, vol. 16, no. 20, 2019.

[156] K. W. Mossholder, R. P. Settoon, S. G. Harris, and A. A. Armenakis, “Measuring Emotion in Open-ended Survey Responses: An Application of Textual Data Analysis,” J. Manage., vol. 21, no. 2, pp. 335–355, 1995.

[157] F. ten Kleij and P. A. D. Musters, “Text Analysis of Open-ended Survey Responses: A Complementary Method to Preference Mapping,” Food Qual. Prefer., vol. 14, no. 1, pp. 43–52, 2003.

[158] M. E. Roberts et al., “Structural Topic Models for Open-ended Survey Responses,” Am. J. Pol. Sci., vol. 58, no. 4, pp. 1064–1082, 2014.

[159] J. D. Lee and K. Kolodge, “Exploring Trust in Self-Driving Vehicles Through Text Analysis,” Hum. Factors, vol. 62, no. 2, pp. 260–277, 2020.

[160] T. Hynninen, A. Knutas, and M. Hujala, “Sentiment Analysis of Open-ended Student Feedback,” in 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), 2020, pp. 755–759.

[161] L. Moták et al., “Antecedent Variables of Intentions to Use an Autonomous shuttle: Moving Beyond TAM and TPB,” Eur. Rev. Appl. Psychol., vol. 67, no. 5, pp. 269–278, 2017.

[162] S. Koul and A. Eydgahi, “The Impact of Social Influence, Technophobia, and Perceived Safety on Autonomous Vehicle Technology Adoption,” Period. Polytech. Transp. Eng., pp. 1–10, 2019.

[163] H.-K. Chen and D.-W. Yan, “Interrelationships Between Influential Factors and Behavioral Intention with Regard to Autonomous Vehicles,” Int. J. Sustain. Transp., vol. 13, no. 7, pp. 511–527, 2019.

[164] P. Jing, H. Huang, B. Ran, F. Zhan, and Y. Shi, “Exploring the Factors Affecting Mode Choice Intention of Autonomous Vehicle Based on an Extended Theory of Planned Behavior-A Case Study in China,” Sustainability, vol. 11, no. 4, pp. 1–20, 2019.

[165] M. M. Rahman, M. F. Lesch, W. J. Horrey, and L. Strawderman, “Assessing the Utility of TAM, TPB, and UTAUT for Advanced Driver Assistance Systems,” Accid. Anal. Prev., vol. 108, pp. 361–373, 2017.

[166] Z. Xu, K. Zhang, H. Min, Z. Wang, X. Zhao, and P. Liu, “What Drives People to Accept Automated Vehicles? Findings from a Field Experiment,” Transp. Res. Part C Emerg. Technol., vol. 95, pp. 320–334, 2018.

[167] J. K. Choi and Y. G. Ji, “Investigating the Importance of Trust on Adopting an Autonomous Vehicle,” Int. J. Hum. Comput. Interact., vol. 31, no. 10, pp. 692–702, 2015.

[168] T. Zhang, D. Tao, X. Qu, X. Zhang, R. Lin, and W. Zhang, “The Roles of Initial Trust and Perceived Risk in Public’s Acceptance of Automated Vehicles,” Transp. Res. Part C Emerg. Technol., vol. 98, no. June 2018, pp. 207–220, 2019.

[169] I. Panagiotopoulos and G. Dimitrakopoulos, “An Empirical Investigation on Consumers’ Intentions Towards Autonomous Driving,” Transp. Res. Part C Emerg. Technol., vol. 95, pp. 773–784, 2018.

[170] J. Wu, H. Liao, J.-W. Wang, and T. Chen, “The Role of Environmental Concern in the Public Acceptance of Autonomous Electric Vehicles: A Survey from China,” Transp. Res. Part F Traffic Psychol. Behav., vol. 60, pp. 37–46, 2019.

[171] P. Böhm, M. Kocur, M. Firat, and D. Isemann, “Which Factors Influence Attitudes Towards Using Autonomous Vehicles?,” in Adjunct Proceedings of the 9th International ACM Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI ’17), 2017, pp. 141–145.

[172] T. Leicht, A. Chtourou, and K. Ben Youssef, “Consumer Innovativeness and Intentioned Autonomous Car Adoption,” J. High Technol. Manag. Res., vol. 29, no. 1, pp. 1–11, 2018.

[173] R. Madigan, T. Louw, M. Wilbrink, A. Schieben, and N. Merat, “What Influences the Decision to Use Automated Public Transport? Using UTAUT to Understand Public Acceptance of Automated Road Transport Systems,” Transp. Res. Part F Traffic Psychol. Behav., vol. 50, pp. 55–64, 2017.

[174] H. Woltman, A. Feldstain, J. C. MacKay, and M. Rocchi, “An Introduction to Hierarchical Linear Modelling,” Tutor. Quant. Methods Psychol., vol. 8, no. 1, pp. 52–69, 2012.

[175] J. B. Ullman and P. M. Bentler, “Structural Equation Modeling,” in Handbook of Psychology, 2nd ed., I. B. Weiner, Ed. Wiley Online Library, 2012, pp. 419–443.

[176] K. E. Train, “Mixed Logit,” in Discrete Choice Methods with Simulation, 2nd ed., New York: Cambridge University Press, 2009, pp. 134–150.

[177] P. S. Lavieri, V. M. Garikapati, C. R. Bhat, and R. M. Pendyala, “An Investigation of Heterogeneity in Vehicle Ownership and Usage for the Millennial Generation,” Transp. Res. Rec. J. Transp. Res. Board, vol. 2664, pp. 91–99, 2017.

[178] P. Liu, Q. Guo, F. Ren, L. Wang, and Z. Xu, “Willingness to Pay for Self-Driving Vehicles: Influences of Demographic and Psychological Factors,” Transp. Res. Part C Emerg. Technol., vol. 100, pp. 306–317, 2019.

[179] P. S. Lavieri, V. M. Garikapati, C. R. Bhat, R. M. Pendyala, S. Astroza, and F. F. Dias, “Modeling Individual Preferences for Ownership and Sharing of Autonomous Vehicle Technologies,” Transp. Res. Rec. J. Transp. Res. Board, vol. 2665, pp. 1–10, 2017.

[180] J. Piao, M. McDonald, N. Hounsell, M. Graindorge, T. Graindorge, and N. Malhene, “Public Views towards Implementation of Automated Vehicles in Urban Areas,” Transp. Res. Procedia, vol. 14, no. 0, pp. 2168–2177, 2016.

[181] K. Kaur and G. Rampersad, “Trust in Driverless Cars: Investigating Key Factors Influencing the Adoption of Driverless Cars,” J. Eng. Technol. Manag., vol. 48, pp. 87–96, 2018.

[182] R. A. Daziano, M. Sarrias, and B. Leard, “Are Consumers Willing to Pay to Let Cars Drive for Them? Analyzing Response to Autonomous Vehicles,” Transp. Res. Part C Emerg. Technol., vol. 78, pp. 150–164, 2017.

[183] W. Zhang, T. Yoshida, and X. Tang, “A Comparative Study of TF*IDF, LSI and Multi-words for Text Classification,” Expert Syst. Appl., vol. 38, no. 3, pp. 2758–2765, 2011.

[184] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.

[185] R. Prabowo and M. Thelwall, “Sentiment Analysis: A Combined Approach,” J. Informetr., vol. 3, no. 2, pp. 143–157, 2009.

[186] Y. Liu, Z. Liu, T. Chua, and M. Sun, “Topical Word Embeddings,” in Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI’15), 2015, pp. 2418–2424.

[187] A. Lavelli, F. Sebastiani, and R. Zanoli, “Distributional Term Representations: An Experimental Comparison,” in 13th ACM International Conference on Information and Knowledge Management, 2004, pp. 615–624.

[188] J. D. McAuliffe and D. M. Blei, “Supervised Topic Models,” in Advances in Neural Information Processing Systems, 2008, pp. 1–8.

[189] D. T. Vo and C. Y. Ock, “Learning to Classify Short Text from Scientific Documents using Topic Models with Various Types of Knowledge,” Expert Syst. Appl., vol. 42, no. 3, pp. 1684–1698, 2015.

[190] U. Verma, “Text Preprocessing for NLP (Natural Language Processing), Beginners to Master,” Analytics Vidhya, 2020. [Online]. Available: https://medium.com/analytics-vidhya/text-preprocessing-for-nlp-natural-language-processing-beginners-to-master-fd82dfecf95.

[191] J. Weng, “NLP Text Preprocessing: A Practical Guide and Template,” Towards Data Science, 2019. [Online]. Available: https://towardsdatascience.com/nlp-text-preprocessing-a-practical-guide-and-template-d80874676e79.

[192] T. Singh and M. Kumari, “Role of Text Pre-Processing in Twitter Sentiment Analysis,” in Procedia Computer Science, 2016, vol. 89, pp. 549–554.

[193] E. Haddi, X. Liu, and Y. Shi, “The Role of Text Pre-processing in Sentiment Analysis,” in Procedia Computer Science, 2013, vol. 17, pp. 26–32.


[194] I. Peled, F. Rodrigues, and F. C. Pereira, “Model-Based Machine Learning for Transportation,” in Mobility Patterns, Big Data and Transport Analytics: Tools and Applications for Modeling, C. Antoniou, L. Dimitriou, and F. C. Pereira, Eds. Elsevier, 2019, pp. 145–171.

[195] B. Mabey, “pyLDAvis,” 2014. [Online]. Available: https://pyldavis.readthedocs.io/en/latest/index.html.

[196] Z. Zhao, H. N. Koutsopoulos, and J. Zhao, “Discovering Latent Activity Patterns from Transit Smart Card Data: A Spatiotemporal Topic Model,” Transp. Res. Part C Emerg. Technol., vol. 116, no. July 2019, p. 102627, 2020.

[197] F. C. Pereira, F. Rodrigues, E. Polisciuc, and M. Ben-Akiva, “Why So Many People? Explaining Nonhabitual Transport Overcrowding With Internet Data,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 3, pp. 1–10, 2015.

[198] I. Markou, F. Rodrigues, and F. C. Pereira, “Is Travel Demand Actually Deep? An Application in Event Areas Using Semantic Information,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 2, pp. 641–652, 2020.

[199] T. Kurashima, T. Iwata, G. Irie, and K. Fujimura, “Travel Route Recommendation using Geotagged Photos,” Knowl. Inf. Syst., vol. 37, no. 1, pp. 37–60, 2013.

[200] Z. Xu, L. Chen, and G. Chen, “Topic-based Context-aware Travel Recommendation Method Exploiting Geotagged Photos,” Neurocomputing, vol. 155, pp. 99–107, 2015.

[201] S. Hasan and S. V. Ukkusuri, “Urban Activity Pattern Classification Using Topic Models from Online Geolocation Data,” Transp. Res. Part C Emerg. Technol., vol. 44, pp. 363–381, 2014.

[202] A. S. Pietsch and S. Lessmann, “Topic Modeling for Analyzing Open-ended Survey Responses,” J. Bus. Anal., vol. 1, no. 2, pp. 93–116, 2018.

[203] E. Tvinnereim and K. Fløttum, “Explaining Topic Prevalence in Answers to Open-ended Survey Questions about Climate Change,” Nat. Clim. Chang., vol. 5, no. 8, pp. 744–747, 2015.

[204] S. Mitsui, T. Kubo, and Y. Shoji, “Understanding Residents’ Perceptions of Nature and Local Economic Activities Using an Open-ended Question Before Protected Area Designation in Amami Islands, Japan,” J. Nat. Conserv., vol. 56, no. May 2019, p. 125857, 2020.

[205] V. Baburajan, J. de Abreu e Silva, and F. C. Pereira, “Open-Ended Versus Closed-Ended Responses: A Comparison Study Using Topic Modeling and Factor Analysis,” IEEE Trans. Intell. Transp. Syst., pp. 1–10, 2020.

[206] Qualtrics, “Predicted Duration,” Qualtrics, 2020. [Online]. Available: https://www.qualtrics.com/support/survey-platform/survey-module/survey-checker/survey-methodology-compliance-best-practices/#PredictedDuration. [Accessed: 16-Oct-2020].

[207] H. B. Mann and D. R. Whitney, “On a Test of Whether One of Two Random Variables is Stochastically Larger than the Other,” Ann. Math. Stat., vol. 18, no. 1, pp. 50–60, 1947.

[208] T. Stoiber, I. Schubert, R. Hoerler, and P. Burger, “Will Consumers Prefer Shared and Pooled-use Autonomous Vehicles? A Stated Choice Experiment with Swiss Households,” Transp. Res. Part D Transp. Environ., vol. 71, no. December 2018, pp. 265–282, 2019.

[209] V. Baburajan, J. de Abreu e Silva, and F. C. Pereira, “Opening Up the Conversation: Topic Modeling for Automated Text Analysis in Travel Surveys,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, pp. 3657–3661.

[210] W. H. Greene and D. A. Hensher, Modelling Ordered Choices: A Primer, 1st ed. New York: Cambridge University Press, 2010.

[211] J. de Abreu e Silva, C. Papaix, and G. Chen, “The Influence of Information-based Transport Demand Management Measures on Commuting Mode Choice. Comparing Web vs. Face-to-face Surveys,” Transp. Res. Procedia, vol. 32, pp. 363–373, 2018.

[212] B. Muthén and T. Asparouhov, “Bayesian Structural Equation Modeling: A More Flexible Representation of Substantive Theory,” Psychol. Methods, vol. 17, no. 3, pp. 313–335, 2012.

[213] A. E. Gelfand, “Gibbs Sampling,” J. Am. Stat. Assoc., vol. 95, no. 452, pp. 1300–1304, 2000.

[214] P. Henman, “Improving Public Services Using Artificial Intelligence: Possibilities, Pitfalls, Governance,” Asia Pacific J. Public Adm., vol. 42, no. 4, pp. 209–221, 2020.


[215] F. D. Davis, R. P. Bagozzi, and P. R. Warshaw, “User Acceptance of Computer Technology: A Comparison of Two Theoretical Models,” Manage. Sci., vol. 35, no. 8, pp. 982–1003, 1989.

[216] J. Zmud, I. N. Sener, and J. Wagner, “Consumer Acceptance and Travel Behavior Impacts of Automated Vehicles,” 2016.


APPENDIX A

QUESTIONNAIRE: INTENTION TO USE SHARED AVs (INDIA)

This is a collaborative research program undertaken jointly by researchers from the Technical University of Denmark and Instituto Superior Técnico, University of Lisbon, Portugal, to understand attitudes towards the use of Autonomous Mobility Services. It is financially supported by the European Cooperation in Science and Technology (COST) Action TU1305.

This questionnaire is completely anonymous, and the data collected here will be used exclusively for research purposes and will not be given to third parties, other than the researchers involved in this project.

Thank you for participating in this research. Your participation is voluntary, but your answers are very valuable to us. Please keep in mind that there are no right or wrong answers.

The research team:

Vishnu Baburajan

PhD Student- Instituto Superior Técnico, University of Lisbon, Lisboa, Portugal

Visiting PhD Student- Technical University of Denmark, Copenhagen, Denmark

Prof. João de Abreu e Silva, PhD

Instituto Superior Técnico, University of Lisbon, Lisboa, Portugal

Prof. Francisco Camara Pereira, PhD

Technical University of Denmark, Copenhagen, Denmark

Three randomly selected respondents who complete the survey will receive coupons for “Middag for 2” (dinner for two). If you want to participate in this lottery, please provide your email address so that we can deliver the coupons to you.


1. I am a person who likes to

Strongly

Disagree

Neutral

Agree

Strongly

Agree

Have the latest gadgets

Test new mobile applications

Use phone frequently for online reservations, payments, etc.

Follow news about Autonomous Vehicles (Google Car, Tesla, etc.)

2. I use my Smartphone for travel-related needs

Never

Rarely

Sometimes

Often

Frequently

I do not have a Smartphone

Consulting maps to know my location and get information about routes and transport modes

Public transportation/taxi apps

Bike-sharing systems

3. I think Transport is one of the major causes of environmental problems as it …

Strongly

Disagree

Neutral

Agree

Strongly

Agree

Is a major source of pollution

Depends too much on fossil fuels

Requires major infrastructure (roads, rail tracks, tunnels)

4. I think it is possible to reduce the environmental footprint of transportation by increasing the use of

Extremely

Unlikely

Neutral

Likely

Highly Likely

Carpool

Public Transport

Bike

5. I think environmental problems due to transportation can be solved by technological advancements in

Extremely

Unlikely

Neutral

Likely

Highly Likely

Alternative fuel (biodiesel, natural gas, hydrogen)

Electric Vehicles

Electric Bikes

Bike and car sharing

6. For travel-related needs, I use my smartphone for (Mention at least 2 important uses that you find relevant)

7. I think transport is a major cause of environmental problems because... (Mention at least 2 major causes that you find relevant)


8. Autonomous vehicles will impact society by

Extremely

Unlikely

Neutral

Likely

Extremely

Likely

Making travel more environmentally friendly

Reducing traffic congestion in cities

Reducing transportation induced pollution

Making travel safer, by reducing accidents

Making travel easier for people who cannot otherwise drive

Reducing gender equity issues in travel

Reducing the need for parking spaces

Causing unemployment of existing drivers

Creating new jobs for skilled workers

9. Society will benefit from the use of Autonomous Vehicles as it will... (Mention at least 2 benefits that you find relevant)

10. Autonomous vehicles are likely to impact society negatively as... (Mention at least 2 negative impacts that you find relevant)


Strongly

Disagree

Neutral

Agree

Strongly

Agree

11. These questions are related to your use of Autonomous Vehicles (AVs)

I think it will be cool to use AVs

I can engage in other activities during travel (interact with friends, read books, browse the internet)

I will be relieved from the stress of driving

I can eliminate parking-related issues (charges, cruising time, etc.)

I find it undesirable to share the vehicles with others

I may have to plan my travel, as I may not have access to vehicles

I think AVs might kill the pleasure of driving

12. I think my friends and family

Will use AVs

Believe AVs will reduce congestion

Believe AVs will reduce pollution

Believe AVs will make travel safer, by reducing accidents

Encourage the use of public transport

Will be positive about me using AVs

13. Perceptions about the use of AVs

I may take time learning to use it

I am confident the system will be protected against hacking and failures

I am confident that the interaction with other vehicles will be safe

I am worried about the liabilities after an accident

I have concerns regarding the payment for the services

I believe this will make my travel more efficient (saves time and cost)

I believe AVs will make travel more environmentally friendly

I think AVs will not be affordable to me


Autonomous Vehicles (AVs) are often described as a sustainable solution to transportation. AVs are believed to address issues such as congestion, pollution and increasing vehicle ownership.

These driverless buses will be operating along the same bus routes and are expected to improve the efficiency of public transportation systems.

Individual Characteristics

14. I will use Autonomous Mobility Services for my daily travel needs

Highly

Unlikely

Neutral

Likely

Highly Likely

15. In the last week, I have commuted by (tick all the modes you have used)

Walk only

Bike

Public Transport

Car

Motorbike

Intermediate Public Transport (taxis, autorickshaw, Uber, etc.)

16. How many minutes does it take you to travel between home and university/school/workplace?

17. I have shared rides with others in the past

Daily

2-3 times a week

2-3 times a month

Rarely

Never

18. What is your gender?

Female

Male

Prefer not to answer

19. What is your age?

20. What is your occupation?

Student

Postgraduate student

Faculty

Manager

Professional

A technician or associate professional

Clerical support worker

Service and sales worker

Skilled agricultural, forestry and fishery worker

Craft and related trades worker

Plant and machine operator and assembler

Elementary occupation

Armed forces occupation

Without an occupation (e.g. retired, unemployed)

Prefer not to answer

21. What is your household income in INR per month?

0-9,999

10,000- 24,999

25,000-49,999

50,000-74,999

75,000-89,999

More than 90,000

I prefer not to answer

22. In the last 2 years, have you been involved in a road accident?

Yes

23. If you are available to be contacted in the future to answer surveys related to scientific research on transportation, please give us your email address.


APPENDIX B

7 QUESTIONNAIRE: INTENTION TO USE AVS FOR COMMUTE TRIPS (USA)

To understand the attitudes towards the use of Autonomous Vehicles, this collaborative research is undertaken by researchers from

• Instituto Superior Técnico, Lisbon, Portugal

• Technical University of Denmark, Copenhagen, Denmark

This questionnaire is completely anonymous, and the data collected here will be used exclusively for research purposes. It will not be shared with third parties, other than with the researchers involved in this project.

We appreciate and thank you for your participation in this research. Your participation is voluntary, but your answers are very valuable to us. Please keep in mind that there are no right or wrong answers.

The research team:

Vishnu Baburajan

PhD student- Instituto Superior Técnico, Lisbon, Portugal

Guest PhD student- Technical University of Denmark, Copenhagen, Denmark

Prof. João de Abreu e Silva, PhD

Associate Professor, Instituto Superior Técnico, Lisbon, Portugal

Prof. Francisco Camara Pereira, PhD

Professor, Technical University of Denmark, Copenhagen, Denmark


A bit about yourself (Socio-economic characteristics)

1. Gender

Female

Male

Prefer not to answer

2. My age is…

3. Total annual household income before taxes

Less than $10,000

$10,000 to $14,999

$15,000 to $24,999

$25,000 to $34,999

$35,000 to $49,999

$50,000 to $74,999

$75,000 to $99,999

$100,000 to $124,999

$125,000 to $149,999

$150,000 to $199,999

$200,000 or more

I don’t know

I prefer not to answer

4. Educational Qualification

Less than high school graduate

High school graduate or GED

Some college or associate degree

Bachelor’s degree

Graduate degree or professional degree (Masters or PhD)

I don’t know

I prefer not to answer

5. Race

White

Black or African American

Asian

American Indian or Alaska Native

Native Hawaiian or other Pacific Islander

Some other race

I don’t know

I prefer not to answer

6. Employment status

Full-time

Part-time

Student

7. Including yourself, how many adults (18 years and older) live in your household?

1 (you)

8. How many children aged between 8 and 17 live in your household?

9. How many children under age 7 live in your household?

10. In which state do you currently reside?


Travel Characteristics

12. Average miles travelled per year (all modes)

Less than 3,000 miles

3,000-6,000 miles

6,000-9,000 miles

9,000-13,000 miles

13,000-15,000 miles

15,000-18,500 miles

Over 18,500 miles

I don’t know or I prefer not to answer

13. How often do you take each of these modes of transportation?

Every day

Several times a week

Several times a month

Several times a year

Never

Walk

Bicycle

Motorcycle/Scooter

Car/SUV/Van

Car Sharing (Zipcar/Car2Go, etc.)

Taxi/Ride-hailing (Uber/Lyft, etc.)

Public Transport (Bus/Train/Subway/Ferry/Light Rail)

I work from home (home-based telecommute)

14. What is your one-way distance to work/school (in miles)? Please round to the nearest whole number.

Distance

I do not know

15. How many minutes is your average morning commute? Please round to the nearest whole number.

Travel time

I work from home (home-based telecommute)

16. How often do you make stops or run errands on your way to work or home or in the middle of the day?

Daily

3-4 times a week

1-2 times a week

Less than once a week

Never

The following question is shown only to respondents using the car every day or several times a week

17. How much do you spend per day (in $) on parking while you are at work/school? (Please calculate the cost per day, even if you have a monthly or yearly membership. Also, round to the nearest whole number.)

Parking charges

I have access to a free parking facility

The following question is shown only to respondents not using the car every day or several times a week

18. What is the average parking charge per day (in $) near your work/school? Please round to the nearest whole number.

Parking Cost

I do not know

The following questions are shown only to respondents using the car every day or several times a week

19. On a typical day in the last week, how many individuals are in the car on your commute to work/school? (including yourself)

1 (you)


20. How important to you is the ability to leave items in your car?

Very important

Somewhat important

Not important

Shown to everybody

21. The average cost of a new car today is 35,250 USD. Assuming that it is now time to purchase a new car, how much are you willing to spend on your next vehicle purchase? Please enter a value (numbers only).

Vehicle Cost

I do not have plans to buy a car


Familiarity with Autonomous Vehicles

22. Some modern cars are equipped with Adaptive Cruise Control (ACC), a system that can automatically follow another car.

How often did you use Adaptive Cruise Control when driving in the last 12 months?

Very frequently

Frequently

Occasionally

Rarely

Very rarely

Never

I don’t know about ACC

I don’t drive a car with ACC

23. Have you heard of Autonomous Vehicles (Google, Tesla, etc.)?

Yes

24. Have you ever ridden in a Fully Autonomous Vehicle?

Yes

Shown only to respondents answering Ver_LKOE

25. What are your general opinions about Autonomous Vehicles?

26. Do you believe that Autonomous Vehicles are useful?

Yes

Explain why.


Perceptions of Autonomous Vehicles (Shown only to Ver_LK and Ver_LKOE)

You will now be presented with some statements about Autonomous Vehicles. Please indicate how much you agree with each of these statements.

Strongly

Disagree

Neutral

Agree

Strongly

Agree

27. Learning to use Autonomous Vehicles will be easy for me

28. I will find it easy to get Autonomous Vehicles to do what I want them to do

29. It will be easy for me to become skilful at using Autonomous Vehicles

30. I will find Autonomous Vehicles easy to use

31. Using Autonomous Vehicles will be useful in meeting my travel needs

32. Autonomous Vehicles will let me do other tasks such as eating, watching a movie or being on a cell phone during my trip

33. Using Autonomous Vehicles will decrease my accident risk

34. Using Autonomous Vehicles will relieve my stress of driving

35. I find Autonomous Vehicles to be useful when I’m impaired (e.g. sleepy, under the influence of alcohol or a controlled substance)

36. I’m worried about the general safety of such technology

37. I’m worried that the failure or malfunction of Autonomous Vehicles may cause accidents

38. I’m concerned that Autonomous Vehicles will collect too much personal information from me

39. I’m concerned that Autonomous Vehicles will use my personal information for other purposes without my authorisation

40. I’m concerned that Autonomous Vehicles will share my personal information with other entities without my authorisation

41. Autonomous Vehicles are dependable

42. Autonomous Vehicles are reliable

43. Overall, I can trust Autonomous Vehicles

44. Using Autonomous Vehicles is a good idea

45. Using Autonomous Vehicles is a wise idea

46. Using Autonomous Vehicles is pleasant


Shown only to respondents answering Ver_OE of the Questionnaire

47. Do you think that it will be easy to use Autonomous Vehicles?

Yes

Explain why.

48. Do you believe that Autonomous Vehicles are useful?

Yes

Explain why.

49. Do you have safety concerns regarding the use of Autonomous Vehicles?

Yes

Explain why.

50. Do you have concerns related to privacy associated with the use of Autonomous Vehicles?

Yes

Explain why.

51. Would you as a user trust an Autonomous Vehicle?

Yes

Explain why.

52. What are your general opinions about Autonomous Vehicles?


SP Introduction

In the next section, you will be presented with three car-based alternatives for your commute trips.

• Regular car: This will belong to your household.

• Private Autonomous Vehicle: This option will be similar to a regular car but will have the ability to self-drive. This will belong to your household.

• Shared Autonomous Vehicle: This option involves a subscription to a fleet of shared autonomous vehicles, to which you may have access on-demand. This car will pick you up and drop you off at your destination without requiring you to look for parking.

This study aims to understand the shift between a regular car and autonomous vehicles. Please choose the alternative of your preference based on the attributes presented to you. Even if you happen to be a user of non-motorized modes of transport (bike, walk, etc.) or public transport (bus, train, light rail, ferry, etc.), we ask you to consider these alternatives and choose the one you prefer.

You will be presented with six different scenarios. For each scenario, please pick the option you would choose.

Given the following characteristics

Regular Car

Private Autonomous Vehicle

Shared Autonomous Vehicle

Purchase Cost ($)

Yearly Membership Cost ($)

Trip cost per direction of commute ($)

Daily parking cost ($)

Keep in mind that the time you spend in each vehicle would be the same. Your current vehicle requires additional time to look for parking and to walk to and from the parking spot, which the autonomous vehicle no longer requires.

Which option would you choose to use for this commute trip?

Regular Car

Private Autonomous Vehicle

Shared Autonomous Vehicle


APPENDIX C

8 EXPERIMENTAL DESIGN FOR THE INTENTION TO USE AVS FOR COMMUTE TRIPS

Table 8.1 Statements to Measure the Constructs in the Proposed Model and Their Sources

Constructs | Items | Contents | Sources
--- | --- | --- | ---
Perceived ease of use (PEoU) | PEOU1 | Learning to use autonomous vehicles will be easy for me | Davis, Bagozzi and Warshaw [215]
 | PEOU2 | I will find it easy to get autonomous vehicles to do what I want them to do |
 | PEOU3 | It will be easy for me to become skilful at using autonomous vehicles |
 | PEOU4 | I will find autonomous vehicles easy to use |
Perceived usefulness (PU) | PU1 | Using autonomous vehicles will be useful in meeting my driving needs | Davis, Bagozzi and Warshaw [215]
 | PU2 | Autonomous vehicles will let me do other tasks, such as eating, watching a movie or being on a cell phone, on my trip |
 | PU3 | Using autonomous vehicles will decrease my accident risk |
 | PU4 | Using autonomous vehicles will relieve my stress of driving |
 | PU5 | I find autonomous vehicles to be useful when I’m impaired (e.g. drowsy, drunk, drugs) |
Perceived safety risk (PSR) | PSR1 | I’m worried about the general safety of such technology | Zmud, Sener and Wagner [216]
 | PSR2 | I’m worried that the failure or malfunctions of autonomous vehicles may cause accidents |
Perceived privacy risk (PPR) | PPR1 | I am concerned that autonomous vehicles will collect too much personal information from me | Kyriakidis, Happee and de Winter [39]
 | PPR2 | I am concerned that autonomous vehicles will use my personal information for other purposes without my authorisation |
 | PPR3 | I am concerned that autonomous vehicles will share my personal information with other entities without my authorisation |
Trust | Trust1 | Autonomous vehicles are dependable | Choi and Ji [167]
 | Trust2 | Autonomous vehicles are reliable |
 | Trust3 | Overall, I can trust autonomous vehicles |
Attitude (ATT) | ATT1 | Using autonomous vehicles is a good idea | Davis, Bagozzi and Warshaw [215]
 | ATT2 | Using autonomous vehicles is a wise idea |
 | ATT3 | Using autonomous vehicles is pleasant |


Table 8.2 Orthogonal Scenarios (Source: Haboucha, Ishaq and Shiftan [40])

Scenario

Variable 1

(Purchase price

PAV) [in %]

Variable 2

(Subscription

cost SAV) [in $]

Variable 3

(Trip Cost

PAV) [in %]

Variable 4

(Trip Cost

SAV) [in %]

Variable 5

(Parking Cost

PAV) [in %]

100

$2000

100

120

300

100

$150

210

100

$300

100

150

115

$2000

120

210

115

150

115

$150

100

115

$300

300

$2000

150

100

210

$150

300

$300

120

130

$2000

100

300

130

$150

120

150

130

$150

210

100

17#

130

$150

210

18#

100

$300

210

19#

$2000

100

150

* yearly cost of membership # additional scenarios generated


APPENDIX D

9 RESULTS OF TOPIC MODEL (USA)

In this research, six open-ended questions were presented to respondents answering Ver_OE of the questionnaire. Each of the first five open-ended questions was accompanied by a closed item on which respondents could agree or disagree with the statement. In estimating the supervised Latent Dirichlet Allocation (sLDA) models, we used the responses to these agree/disagree items as the response variable; the top words of the resulting topics are listed in Table 9.1.
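To make the role of the response variable concrete, the sketch below simulates the standard sLDA generative process for a single open-ended answer with a binary agree/disagree label: topic proportions are drawn from a Dirichlet distribution, each word receives a topic and is drawn from that topic's word distribution, and the label is drawn through a logistic link from the answer's empirical topic frequencies. The vocabulary, the topic-word probabilities (`beta`), the regression weights (`eta`) and all function names are hypothetical; the sketch illustrates the model only and is not the estimation code or software used in this thesis.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Dirichlet draw: independent Gamma(alpha_k, 1) variates normalised to sum to 1."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Inverse-CDF draw of an index from a discrete distribution."""
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    # Guard against floating-point round-off: return the last index with positive mass.
    return max(i for i, p in enumerate(probs) if p > 0)


def generate_answer(n_words, alpha, beta, eta, rng):
    """Simulate one open-ended answer and its binary agree/disagree label under sLDA."""
    theta = sample_dirichlet(alpha, rng)                # topic proportions of this answer
    topic_counts = [0] * len(alpha)
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)              # topic assignment of this word
        topic_counts[z] += 1
        words.append(sample_categorical(beta[z], rng))  # word drawn from topic z
    z_bar = [count / n_words for count in topic_counts]  # empirical topic frequencies
    linear_predictor = sum(e * f for e, f in zip(eta, z_bar))
    p_agree = 1.0 / (1.0 + math.exp(-linear_predictor))  # logistic link for a binary label
    label = 1 if rng.random() < p_agree else 0
    return words, z_bar, label


# Hypothetical two-topic example over a six-stem vocabulary.
vocab = ["use", "easi", "control", "trust", "learn", "hard"]
beta = [
    [0.35, 0.35, 0.00, 0.10, 0.20, 0.00],  # an "easy to use" topic
    [0.05, 0.00, 0.40, 0.25, 0.00, 0.30],  # a "lack of control" topic
]
eta = [2.0, -2.0]  # topic 1 raises, topic 2 lowers, the chance of agreeing
rng = random.Random(0)
words, z_bar, label = generate_answer(20, [0.5, 0.5], beta, eta, rng)
print([vocab[w] for w in words], z_bar, label)
```

Fitting sLDA runs this process in reverse: given the observed words and labels, it infers `beta` and `eta` jointly, so that the topics are shaped partly by how well they predict the agree/disagree answer.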

We extracted four topics from OE1. The first topic (To_S11) was primarily about the ease of getting the vehicle to work, learning to use it and gaining trust in it. The second topic (To_S12) discussed the lack of control, which makes AVs feel unsafe and difficult to trust. Finally, To_S13 covers ease of operation, and To_S14 covers the additional benefits of self-navigation along with ease of use.

Using sLDA, we extracted six broad themes from the responses on the perceived usefulness of AVs. First, respondents believed that AVs might save travel time and make travel more environmentally friendly (To_S21). Second, participants held contrasting views on the ability to work during travel: some believed that AVs might facilitate working during travel (To_S22), whereas others felt that AVs may demand additional attention, which may negatively affect their work (To_S26). Third, AVs might make travelling safer, mitigate congestion (To_S23), and ensure mobility for people with disabilities (To_S24). Finally, many participants emphasised the need for human control while using AVs (To_S25).

The next open-ended question evaluated the safety concerns associated with the use of AVs. Safety concerns stemming from the lack of control were prominent (To_S31). Many argued that the lack of control could cause accidents (To_S33), whether through malfunctions (To_S34) or sensor failures (To_S35). Furthermore, as humans are error-prone, many believed that there could be flaws in the software (To_S36) and emphasised the need for thorough testing of AVs before their widespread deployment (To_S32).

We then evaluated whether individuals had privacy concerns related to the use of AVs. Many reported no privacy concerns, as they felt such concerns were unnecessary if the systems are transparent (To_S41). Another argument was that personal information is already in the public domain through various platforms (To_S44). Some acknowledged having concerns but felt these were not something they should be bothered about (To_S43). Furthermore, privacy would not be a concern if users are informed about data collection and storage (To_S46) and if the information is handled securely (To_S45). The concerns that were raised related mostly to hacking (To_S42).

Table 9.1 Top 5 Words for Each Topic for Open-ended Questions

Word_1

Word_2

Word_3

Word_4

Word_5

OE1- Do you think that it will be easy to use Autonomous Vehicles?

To_S11

use

easi

technolog

make

learn

To_S12

control

time

feel

hard

trust

To_S13

drive

driver

get

easier

need

To_S14

everyth

assum

comput

destin

OE2- Do you believe that Autonomous Vehicles are useful?

To_S21

use

time

environ

better

save

To_S22

drive

driver

get

work

commut

To_S23

accid

driver

traffic

reduc

help

To_S24

drive

use

help

need

abl

To_S25

accid

use

control

human

issu

To_S26

make

abl

attent

pay

OE3- Do you have safety concerns regarding the use of Autonomous Vehicles?

To_S31

drive

concern

safeti

control

driver

To_S32

technolog

safe

use

work

trust

To_S33

comput

happen

malfunct

system

alway

To_S34

accid

caus

malfunct

worri

road

To_S35

stop

road

abl

need

sensor

To_S36

human

driver

drive

error

make

OE4- Do you have concerns related to privacy associated with the use of Autonomous Vehicles?

To_S41

make

question

noth

abl

safe

To_S42

hack

someon

technolog

system

hacker

To_S43

privaci2

concern

issu

sure

use

To_S44

alreadi

track

use

differ

everyth

To_S45

inform

need

info

secur

To_S46

inform

data

person

compani

collect

OE5- Would you as a user trust an Autonomous Vehicle?

To_S51

trust

make

abl

safe

sure

To_S52

trust

technolog

use

time

To_S53

safeti

comput

trust

concern

malfunct

To_S54

drive

driver

human

better

thing

To_S55

control

drive

feel

technolog

enough

In the fifth open-ended question (OE5), we asked respondents whether they would trust AVs. A significant proportion of respondents were not yet ready to trust AVs. These trust issues could be related to the need for further testing (To_S51) and to potential safety concerns due to malfunctions (To_S53). Sceptics argued that humans drive better (To_S54) and that computers cannot be trusted (To_S55). Over time, more users might start trusting the system (To_S52).

2 “Privacy” and “concern” could be combined into a single term; however, they did not appear in the same sequence within a sentence and hence were not combined.
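Table 9.1 reports the five highest-probability terms of each topic. A minimal sketch of that final step, assuming a fitted topic-word probability matrix is already available: the terms in the table resemble Porter-style stems (e.g. “easi”, “technolog”), but the vocabulary, the probabilities and the function name below are hypothetical.

```python
def top_words(topic_word, vocab, k=5):
    """Return the k highest-probability vocabulary terms of each topic."""
    tops = []
    for row in topic_word:
        # Rank term indices by descending probability; sorted() is stable, so ties keep vocabulary order.
        ranked = sorted(range(len(vocab)), key=lambda j: -row[j])
        tops.append([vocab[j] for j in ranked[:k]])
    return tops


# Hypothetical topic-word probabilities (each row sums to 1) over a small stemmed vocabulary.
vocab = ["use", "easi", "technolog", "make", "learn", "control", "time", "feel"]
topic_word = [
    [0.30, 0.25, 0.15, 0.12, 0.10, 0.03, 0.03, 0.02],
    [0.04, 0.02, 0.03, 0.05, 0.06, 0.35, 0.25, 0.20],
]
print(top_words(topic_word, vocab, k=3))
# → [['use', 'easi', 'technolog'], ['control', 'time', 'feel']]
```

Ranking by raw probability tends to surface terms that are frequent across several topics (for example, “use” and “drive” recur in Table 9.1), which is one reason some analyses prefer alternative term-ranking criteria.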


APPENDIX E

10 ESTIMATION RESULTS FOR INTENTION TO USE AVS FOR COMMUTE TRIPS

In the estimation, we consider “Regular Car” as the base alternative and present the estimated coefficients for “Private AVs” (PAV) and “Shared AVs” (SAV). The following paragraphs discuss the effects of the different variables on these choices. In this discussion, the names of the models whose estimates align with each finding are given in square brackets.
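Because Regular Car is the base alternative, its systematic utility is normalised to zero and only the PAV and SAV utilities carry coefficients. Assuming a multinomial logit (MNL) structure, which is an assumption in this sketch, choice probabilities follow from the softmax of the utilities. The example uses only the alternative-specific constants of the Proposed model in Table 10.1; the function name is hypothetical, and a real prediction would add every other coefficient multiplied by the respondent's attribute values.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: P(i) = exp(V_i) / sum_j exp(V_j)."""
    # Subtracting the largest utility avoids overflow and leaves the probabilities unchanged.
    v_max = max(utilities.values())
    expo = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(expo.values())
    return {alt: e / total for alt, e in expo.items()}


# "Regular Car" is the base alternative, so its systematic utility is fixed at 0.
# Only the constants of the Proposed model (Table 10.1) are included here.
utilities = {"Regular Car": 0.0, "PAV": 0.183, "SAV": -0.237}
probabilities = mnl_probabilities(utilities)
for alternative, p in probabilities.items():
    print(f"{alternative}: {p:.3f}")
```

With the constants alone, the probabilities come out at roughly 0.33 (Regular Car), 0.40 (PAV) and 0.26 (SAV); the explanatory variables discussed below shift these shares up or down.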

Socio-demographic characteristics- In this study, we explored the influence of socio-demographic characteristics on the choice of mode for commute trips (coefficients in Table 10.1). Male respondents are more likely to use AVs [Prop, Ver_LKOE, Ver_OE], as observed by Payre, Cestac and Delhomme [38] (opposite effect for shared AV for Ver_LK). Younger individuals are more likely to use AVs [Prop, Ver_LKOE, Ver_OE] (similar to Nielsen and Haustein [42], but with the opposite effect for shared AV for Ver_LK). Higher-income respondents are more likely to use private AVs [Prop, Ver_OE, Ver_LKOE] (similar observations were made by Bansal and Kockelman [16]), and lower-income respondents are more likely to use Shared AVs [Prop, Ver_LKOE] [38] (opposite effect for Ver_LK). Individuals with higher educational qualifications are more likely to use AVs [Prop, Ver_LK, Ver_OE] (similar observations were made by Haboucha, Ishaq and Shiftan [40], but with the opposite effect for private AV for Ver_LKOE). White Americans are less likely to use Private AVs [Prop, Ver_LKOE, Ver_OE], and African Americans are less likely to use Private and Shared AVs [Prop, Ver_LKOE, Ver_OE] (opposite effect for Ver_LK). Compared to employed individuals, students are more likely to use both variants of AVs [Prop, Ver_LKOE, Ver_OE] (opposite effect for shared AV for Ver_LK). Individuals from households with more adults are less likely to use AVs [Prop, Ver_LKOE] (opposite effect for Ver_LK), as are individuals from households with more children aged less than 8 [Prop, Ver_LK, Ver_OE] (opposite effect for Private AV for Ver_LKOE), whereas those from households with more children aged between 8 and 17 are more likely to use AVs [Prop, Ver_LK, Ver_OE] (opposite effect for Ver_LKOE).
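The signs discussed above can also be read as odds. Assuming a multinomial logit (MNL) structure with Regular Car as the base, the log-odds of choosing an AV alternative over a regular car equal that alternative's utility, so switching a dummy variable from 0 to 1 multiplies those odds by exp(β), all else equal. The sketch checks this numerically using the Proposed-model constants and the “16 and 25” age coefficients from Table 10.1; the MNL form, the comparison against the omitted reference age group and the helper names are assumptions, and statistical significance is not considered.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: P(i) = exp(V_i) / sum_j exp(V_j)."""
    v_max = max(utilities.values())
    expo = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(expo.values())
    return {alt: e / total for alt, e in expo.items()}


def odds_vs_base(probabilities, alternative, base="Regular Car"):
    """Odds of choosing `alternative` rather than the base alternative."""
    return probabilities[alternative] / probabilities[base]


# Proposed-model coefficients from Table 10.1 (Regular Car is the base, so its entries are 0).
constants = {"Regular Car": 0.0, "PAV": 0.183, "SAV": -0.237}
age_16_25 = {"Regular Car": 0.0, "PAV": 0.566, "SAV": 0.327}

# Respondent in the omitted reference age group vs. a respondent aged 16-25, all else equal.
without_dummy = mnl_probabilities(constants)
with_dummy = mnl_probabilities({alt: constants[alt] + age_16_25[alt] for alt in constants})

for alt in ("PAV", "SAV"):
    odds_ratio = odds_vs_base(with_dummy, alt) / odds_vs_base(without_dummy, alt)
    # Under MNL this ratio equals exp(beta), whatever the other utility terms are.
    print(alt, round(odds_ratio, 3), round(math.exp(age_16_25[alt]), 3))
```

With these coefficients, belonging to the 16 to 25 group multiplies the odds of choosing a PAV over a regular car by roughly 1.76 and those of choosing an SAV by roughly 1.39.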

Table 10.1 Estimated Coefficients for Socio-Demographic Characteristics

Variables

Proposed

Ver_LK

Ver_LKOE

Ver_OE

PAV

SAV

PAV

SAV

PAV

SAV

PAV

SAV

Constant

0.183

-0.237

-0.829

-0.488

0.456

-0.522

0.495

-0.175


Male

0.035

0.042

0.013

-0.293

0.037

0.031

0.084

0.023

Age

16 and 25

0.566

0.327

0.23

-0.326

0.441

0.51

0.556

0.304

26 and 35

0.463

0.179

0.547

-0.189

0.279

0.572

0.499

0.039

36 and 45

0.343

0.059

0.223

-0.215

0.379

0.543

0.313

-0.111

46 and 55

0.292

0.198

0.242

-0.114

0.559

0.983

0.201

-0.03

Household

Income

0 - $24,999

-0.047

0.228

-0.045

-0.102

-0.041

0.362

-0.065

0.524

$25,000 and $49,999

-0.013

0.098

-0.129

-0.295

0.261

0.699

-0.152

-0.161

$50,000 and $74,999

0.084

0.147

-0.032

-0.462

0.477

0.836

-0.158

-0.053

$75,000 and $99,999

-0.082

0.223

0.093

0.131

0.468

0.944

-0.69

-0.332

Less than high school graduate

-0.806

-0.865

-0.558

-0.545

0.325

-0.105

-1.126

-0.992

High school graduate or GED

-0.359

-0.513

-0.234

-0.384

0.115

0.091

-0.351

-0.721

Some college or associate degree

-0.298

-0.48

0.276

-0.436

-0.346

-0.508

-0.369

-0.243

Bachelors' degree

-0.151

-0.033

0.145

0.028

0.093

0.319

-0.365

-0.104

White

-0.129

0.01

0.273

0.169

-0.325

-0.275

-0.286

0.076

Black or African American

-0.096

-0.267

0.41

0.031

-0.472

-0.474

-0.242

-0.327

American Indian or Alaska Native

0.264

-0.801

0.181

-1.159

-0.148

-0.232

0.398

-0.906

Asian

-0.154

-0.026

-0.329

0.297

-0.339

-1.135

-0.091

0.033

Native Hawaiian or other Pacific Islander

-0.293

-0.755

0.195

0.385

0.004

-0.523

-0.6

-1.306

Full-time

-0.285

-0.215

-0.294

0.165

-0.181

-0.46

-0.088

Part-time

-0.358

-0.292

-0.043

0.199

-0.352

-0.21

-0.457

-0.677

Number of adults

-0.122

-0.024

0.049

0.12

-0.198

-0.148

-0.17

0.014

Number of children aged between 8 and 17

0.022

0.092

0.055

0.099

-0.153

-0.097

0.101

0.178

Number of children aged less than 8

-0.11

-0.184

-0.165

-0.254

0.079

-0.005

-0.158

-0.28

Coefficients at 99% confidence level

Coefficients at 95% confidence level

Coefficients at 90% confidence level

Travel characteristics- The estimated coefficients are presented in Table 10.2. Those who travel more miles are more likely to choose AVs [Prop, Ver_LKOE, Ver_OE] (similar results were observed by Shabanpour et al. [68], but with the opposite effect for Ver_LK). Besides, the current mode used for commuting also influences the choice of mode for commute trips. Those walking/biking are more likely to choose shared AVs [Prop, Ver_LK, Ver_LKOE, Ver_OE], while those using motorbikes/mopeds are more likely to choose private AVs [Prop, Ver_LKOE, Ver_OE] (opposite effect for Private AV for Ver_LK and Shared AV for Ver_LKOE). Respondents using a car to commute are less likely to use AVs and more likely to stick to conventional cars, probably out of habit [Prop, Ver_LK, Ver_LKOE] (opposite effect for Ver_OE). Interestingly, those using car-sharing options are more likely to use Private AVs [Prop, Ver_LK, Ver_OE] (opposite effect for Private AV for Ver_LKOE), and those using taxi/ride-hailing services are more likely to use AVs (both shared and private) [Prop, Ver_LK, Ver_OE], whereas we observed the opposite effects for Ver_LKOE. Individuals not owning a car are less likely to use Private AVs and Shared AVs [Prop, Ver_LKOE] (opposite effects for Shared AV for Ver_LK and Ver_OE). Also, respondents travelling alone are less likely to use Shared AVs [Prop, Ver_LKOE] (opposite effect for Private AV for Ver_LK), and those travelling with others are more likely to use AVs in general [Prop, Ver_LK, Ver_LKOE, Ver_OE]. Interestingly, even people who consider leaving items in the car to be important are willing to use Shared AVs [Prop, Ver_LK, Ver_LKOE, Ver_OE]. Regarding familiarity with AV systems, individuals familiar with AVs are optimistic about using AVs (opposite effects for Shared AV for Ver_LKOE and Private AV for Ver_OE); however, individuals who have ridden AVs in the past prefer Private AVs [Prop, Ver_LK, Ver_LKOE, Ver_OE] (Zmud and Sener [208] observed similar results).

Table 10.2 Estimated Coefficients for Travel, Familiarity with AVs and SP

Variables

Proposed

Ver_LK

Ver_LKOE

Ver_OE

PAV

SAV

PAV

SAV

PAV

SAV

PAV

SAV

Total miles travelled is less than 35 miles

0.016

-0.2

0.183

-0.405

-0.038

-0.359

-0.034

0.064

Total miles travelled is between 35 and 70

miles

-0.102

-0.22

0.014

-0.268

-0.058

-0.143

-0.201

-0.167

Mode used for commute

trip

Walk

-0.04

0.161

0.02

0.149

0.001

0.544

-0.023

0.015

Bike

0.036

0.697

-0.334

0.775

-0.024

0.191

0.09

0.876

Motorcycle/moped

0.101

-0.077

-0.794

-1.458

0.133

0.015

0.223

-0.698

Car/SUV/Van/Pickup

-0.271

-0.117

-0.264

-0.602

-0.416

-0.288

0.147

0.129

Car-sharing

0.181

-0.318

0.217

-0.036

-0.331

-0.778

0.494

-0.045

Taxi/ride-hailing

0.145

0.185

0.391

0.07

-0.296

-0.035

0.106

0.447

Public transport

0.189

0.27

0.053

0.454

-0.251

-0.088

0.37

0.47

Does not own a car

-0.275

0.039

-0.697

-0.283

-0.159

0.453

-0.114

-0.449

Travels alone in the car

0.132

-0.214

-0.417

-0.572

0.069

-0.063

0.238

-0.112

Travels with others in the car

0.396

0.24

0.003

0.089

0.117

0.44

0.311

0.019

Leaving items in the car is important

0.044

0.274

0.093

0.229

0.419

0.486

-0.041

0.262

Familiarity with AV

0.021

0.218

0.087

0.602

0.057

-0.134

-0.144

0.239

Has ridden AV

0.347

-0.081

0.216

-0.702

0.642

0.412

0.204

-0.059

Purchase Cost (Regular Car)

0.848

0.17

0.751

-0.087

1.022

0.374

0.611

0.023

Purchase Cost (Private AV)

-1.126

-0.34

-0.93

-0.132

-1.264

-0.37

-0.955

-0.259

Membership Cost (Shared AV)

0.067

-0.26

0.08

-0.218

0.096

-0.293

0.072

-0.265

Travel Cost (Regular Car)

0.372

0.254

0.445

0.411

0.326

0.334

0.333

-0.067

Travel Cost (Private AV)

-0.361

-0.106

-0.206

-0.18

-0.428

-0.12

-0.403

0.06

Travel Cost (Shared AV)

0.012

-0.447

-0.035

-0.385

0.071

-0.461

-0.43

Parking Charge (Regular Car)

0.205

0.046

0.227

-0.12

0.189

-0.025

0.203

0.097

Parking Charge (Private AV)

-0.227

-0.066

-0.165

-0.021

-0.255

-0.062

-0.234

-0.016

Attitudinal_1

0.045

-1.065

-0.102

1.32

0.402

-0.871

0.785

-0.385

Attitudinal_2

2.124

1.306

-2.103

-1.1

-2.382

-2.002

0.827

1.11

Coefficients at 99% confidence level

Coefficients at 95% confidence level

Coefficients at 90% confidence level

The second experimental design consists of stated-preference choice scenarios developed using the values of the average cost of a car, the commute travel time and the parking costs reported by respondents. As discussed previously, we adopted the stated-preference experiment of Haboucha, Ishaq and Shiftan [40], and the estimated coefficients for the variables related to the experiment are consistent with their findings. However, unlike their approach, we compared the absolute values of the purchase cost for "Regular Car" and PAV. As observed by Haboucha, Ishaq and Shiftan [40], an increase in the purchase cost of regular cars [Prop, Ver_LKOE, Ver_OE] and of private AVs [Prop, Ver_LK, Ver_LKOE, Ver_OE] negatively affects the preference for the corresponding type, and a similar trend is observed for the subscription cost of Shared AVs [Prop, Ver_LK, Ver_LKOE, Ver_OE].

An increase in the travel cost of regular cars increases the preference for AVs (both private and shared) [Prop, Ver_LK, Ver_LKOE], whereas an increase in the travel cost of Private AVs decreases the likelihood of choosing either type of AV [Prop, Ver_LK, Ver_LKOE], and an increase in the travel cost of Shared AVs decreases the preference for AVs [Prop, Ver_LK, Ver_LKOE, Ver_OE]. Similarly, an increase in the parking charges for regular cars is positively correlated with the use of AVs [Prop, Ver_OE], and an increase in the parking charges for AVs has the opposite effect [Prop, Ver_LK, Ver_LKOE, Ver_OE]. The direction of the estimated coefficients for travel time and parking charges also aligns with the findings of Haboucha, Ishaq and Shiftan [40].
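To make the direction of these cross effects concrete, the sketch below computes multinomial logit (MNL) choice probabilities for the three alternatives. It assumes, for illustration only, that the regular car is the reference alternative with utility fixed at zero and that every travel-cost attribute enters the utilities of both AV alternatives with one coefficient per alternative; all coefficient values are hypothetical and are not the estimates reported in Table 10.2.

```python
import math

# Hypothetical coefficients for illustration only (not the estimates in Table 10.2).
# The regular car (mode 0) is the reference alternative, with utility fixed at zero;
# each travel-cost attribute enters the utilities of both AV alternatives.
ASC = {1: 0.2, 2: -0.1}  # 1 = Private AV, 2 = Shared AV
BETA = {
    "car_cost": {1: 0.3, 2: 0.3},    # dearer regular car: both AVs more attractive
    "pav_cost": {1: -0.4, 2: -0.3},  # dearer private AV: both AVs less attractive
    "sav_cost": {1: -0.1, 2: -0.4},  # dearer shared AV: both AVs less attractive
}


def choice_probabilities(car_cost, pav_cost, sav_cost):
    """Multinomial logit probabilities for [regular car, private AV, shared AV]."""
    costs = {"car_cost": car_cost, "pav_cost": pav_cost, "sav_cost": sav_cost}
    utilities = [0.0]
    for j in (1, 2):
        utilities.append(ASC[j] + sum(BETA[name][j] * value for name, value in costs.items()))
    top = max(utilities)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


base = choice_probabilities(car_cost=1.0, pav_cost=1.0, sav_cost=1.0)
dearer_car = choice_probabilities(car_cost=2.0, pav_cost=1.0, sav_cost=1.0)
dearer_pav = choice_probabilities(car_cost=1.0, pav_cost=2.0, sav_cost=1.0)
for label, probs in [("base", base), ("dearer car", dearer_car), ("dearer PAV", dearer_pav)]:
    print(label, [round(p, 3) for p in probs])
```

With these assumed values, raising the regular-car travel cost lowers the regular-car share and raises both AV shares, while raising the private-AV travel cost lowers both AV shares; the magnitudes depend entirely on the assumed coefficients.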

Attitudinal characteristics: In this research, attitudes were estimated as a latent variable, which was then included as an explanatory variable in the utility equation for mode choice. Hence, when assessing the effects of attitudes in Table 10.3 and Table 10.4, we consider the signs and magnitudes of the variables "Attitudinal_1" and "Attitudinal_2" from Table 10.2.
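Because an indicator affects mode choice only through the latent attitudes, its net effect on the utility of an alternative combines two sets of coefficients. The sketch below illustrates this under a simplifying assumption of ours, not necessarily the exact specification of the model: each of the two attitude dimensions is treated as a linear index of 0/1 indicator dummies (loadings Coe_1, Coe_2), and the two dimensions enter the utility of an AV alternative with coefficients that stand in for "Attitudinal_1" and "Attitudinal_2". All numbers are hypothetical.

```python
# Hypothetical values for illustration only (not taken from Tables 10.2-10.4).
CONSTANT = (0.4, -0.4)        # constant of each attitude dimension
LOADINGS = {                  # (Coe_1, Coe_2) of each 0/1 indicator
    "PEoU_14": (0.1, 0.3),
    "PEoU_15": (0.2, 0.3),
}
BETAS = (1.0, -0.5)           # stand-ins for Attitudinal_1 and Attitudinal_2


def attitude_scores(dummies):
    """Two-dimensional attitude of one respondent as a linear index of 0/1 dummies."""
    a1, a2 = CONSTANT
    for name, value in dummies.items():
        c1, c2 = LOADINGS[name]
        a1 += c1 * value
        a2 += c2 * value
    return a1, a2


def utility_from_attitudes(dummies):
    """Contribution of the two attitude dimensions to the utility of one alternative."""
    return sum(b * a for b, a in zip(BETAS, attitude_scores(dummies)))


def net_effect(indicator):
    """Change in utility when one indicator switches from 0 to 1."""
    c1, c2 = LOADINGS[indicator]
    return BETAS[0] * c1 + BETAS[1] * c2


# Both loadings of PEoU_14 are positive, yet its net effect is negative here,
# because the second attitude dimension enters the utility with a negative sign.
print(net_effect("PEoU_14"), net_effect("PEoU_15"))
```

This is why the signs in Tables 10.3 and 10.4 cannot be read on their own: the same loading can raise or lower the utility depending on the sign of the corresponding attitude coefficient.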

The notion that it is easy to learn to use AVs is positively associated with the use of AVs (private and shared), as is the perception that AVs are easy to use. Positive responses to "it will be easy to get AVs to do what I want them to do" and "easy to become skilful at using AVs" are, however, positively associated mainly with the use of Shared AVs. For "Perceived Usefulness", we find a positive correlation between positive attitudes and the use of Private AVs for all the questions. For Shared AVs, however, only usefulness in meeting travel needs, in performing other tasks during travel and when impaired were found to have an influence. The coefficients for the various Likert scale responses are presented below in Table 10.3.

Table 10.3 Estimated Coefficients for Likert Scale Responses

Indicator | Ver_LK Prop Coe_1 | Ver_LK Prop Coe_2 | Ver_LK Ind Coe_1 | Ver_LK Ind Coe_2 | Ver_LKOE Prop Coe_1 | Ver_LKOE Prop Coe_2 | Ver_LKOE Ind Coe_1 | Ver_LKOE Ind Coe_2
Constant_1 | 0.423 | -0.389 | 0.045 | 0.542 | 0.682 | -0.771 | 0.212 | -0.232
Learning to use Autonomous Vehicles will be easy for me (PEoU_1)
PEoU_11 | 0.259 | -0.026 | -0.31 | 0.499 | -0.022 | 0.218 | 0.126 | 0.498
PEoU_12 | 0.483 | 0.115 | -0.306 | 0.052 | 0.191 | 0.134 | 0.189 | 0.068
PEoU_13 | -0.027 | 0.218 | 0.118 | -0.053 | 0.044 | 0.087 | 0.026 | 0.05
PEoU_14 | 0.105 | 0.311 | 0.073 | -0.165 | -0.037 | 0.312 | -0.031 | -0.127
PEoU_15 | 0.038 | 0.245 | 0.104 | -0.05 | -0.281 | 0.39 | -0.115 | -0.35
I will find it easy to get Autonomous Vehicles to do what I want them to do (PEoU_2)
PEoU_21 | 0.072 | -0.013 | -0.497 | 0.059 | -0.118 | 0.05 | 0.046 | 0.25
PEoU_22 | -0.192 | -0.276 | 0.148 | 0.049 | -0.434 | -0.212 | -0.372 | 0.264
PEoU_23 | -0.087 | -0.286 | 0.146 | 0.11 | -0.073 | -0.151 | 0.004 | 0.293
PEoU_24 | -0.319 | -0.034 | 0.382 | -0.167 | -0.294 | -0.021 | -0.135 | 0.078
PEoU_25 | -0.144 | -0.668 | 0.246 | 0.502 | 0.233 | -0.148 | 0.259 | 0.35
It will be easy for me to become skilful at using Autonomous Vehicles (PEoU_3)
PEoU_31 | -0.51 | -0.365 | 0.171 | 0.212 | -0.291 | 0.252 | -0.42 | -0.024
PEoU_32 | -0.798 | -0.551 | 0.614 | 0.015 | 0.287 | -0.389 | 0.096 | 0.179
PEoU_33 | -0.038 | -0.413 | -0.14 | -0.162 | 0.39 | -0.282 | 0.166 | 0.034
PEoU_34 | -0.059 | -0.414 | -0.139 | -0.122 | 0.378 | -0.341 | 0.169 | 0.128
PEoU_35 | 0.205 | -0.128 | -0.443 | -0.432 | -0.114 | -0.371 | -0.135 | 0.003
I will find Autonomous Vehicles easy for me (PEoU_4)
PEoU_41 | -0.223 | 0.544 | 0.67 | 0.176 | 0.013 | -0.084 | -0.141 | 0.377
PEoU_42 | 0.789 | 0.044 | -0.531 | 0.098 | -0.164 | -0.06 | -0.001 | -0.082
PEoU_43 | 0.24 | -0.109 | -0.079 | 0.228 | -0.268 | -0.083 | -0.186 | -0.041
PEoU_44 | 0.089 | 0.115 | 0.079 | 0.017 | 0.037 | -0.243 | 0.027 | 0.184
PEoU_45 | 0.511 | 0.413 | -0.27 | -0.418 | 0.95 | -0.02 | 0.772 | 0.296
Using Autonomous Vehicles will be useful in meeting my travel needs (PU_1)
PU_11 | -0.023 | -0.54 | 0.111 | 1.031 | 0.182 | -0.248 | 0.072 | 1.298
PU_12 | 0.433 | -0.133 | -0.304 | 0.457 | 0.361 | -0.099 | 0.311 | 1.027
PU_13 | 0.182 | 0.295 | -0.075 | -0.114 | -0.294 | 0.67 | -0.178 | -0.085
PU_14 | 0.311 | 0.757 | -0.161 | -0.62 | -0.384 | 1.27 | -0.063 | -0.66
PU_15 | 0.296 | 0.804 | -0.161 | -0.65 | -0.713 | 1.227 | -0.325 | -0.701
Autonomous Vehicles will let us do other tasks such as eating, watching a movie, be on a cell phone during my trip (PU_2)
PU_21 | -0.4 | 0.085 | 0.307 | -0.188 | 0.052 | 0.265 | 0.326 | 0.009
PU_22 | -0.23 | -0.307 | 0.096 | 0.168 | -0.374 | 0.249 | -0.231 | -0.314
PU_23 | 0.217 | -0.214 | -0.405 | 0.009 | -0.186 | 0.136 | 0.021 | -0.059
PU_24 | 0.078 | -0.158 | -0.231 | -0.014 | -0.246 | 0.148 | -0.04 | -0.129
PU_25 | -0.042 | -0.293 | -0.035 | 0.084 | -0.338 | 0.191 | -0.146 | -0.131
Using Autonomous Vehicles will decrease my accident risk (PU_3)
PU_31 | 0.432 | 0.142 | -0.717 | 0.274 | -0.047 | -0.174 | -0.098 | 0.685
PU_32 | -0.196 | 0.345 | 0.033 | 0.032 | -0.203 | 0.085 | -0.28 | 0.486
PU_33 | 0.079 | 0.479 | -0.197 | -0.132 | 0.059 | 0.363 | 0.052 | 0.248
PU_34 | -0.313 | 0.623 | 0.265 | -0.302 | -0.326 | 0.891 | -0.174 | -0.451
PU_35 | -0.287 | 0.751 | 0.233 | -0.386 | -0.084 | 0.755 | 0.153 | -0.216
Using Autonomous Vehicles will relieve my stress of driving (PU_4)
PU_41 | 0.255 | -0.213 | -0.347 | 0.29 | 0.163 | -0.311 | -0.041 | 0.563
PU_42 | -0.232 | -0.363 | 0.007 | 0.528 | 0.103 | -0.166 | -0.038 | 0.459
PU_43 | -0.216 | 0.063 | 0.081 | 0.059 | -0.086 | 0.155 | 0.048 | 0.037
PU_44 | -0.515 | 0.386 | 0.337 | -0.254 | -0.08 | 0.007 | -0.022 | 0.177
PU_45 | -0.706 | 0.266 | 0.658 | -0.235 | -0.136 | 0.429 | 0.103 | -0.447
I find Autonomous Vehicles to be useful when I’m impaired (PU_5)
PU_51 | -0.015 | -0.056 | -0.185 | -0.022 | -0.085 | -0.124 | 0.094 | 0.388
PU_52 | 0.173 | -0.081 | -0.451 | -0.099 | 0.106 | -0.047 | 0.239 | 0.021
PU_53 | -0.112 | -0.212 | -0.015 | 0.104 | -0.091 | 0.089 | 0.152 | -0.119
PU_54 | -0.321 | 0.168 | 0.118 | -0.358 | -0.209 | 0.155 | 0.031 | -0.302
PU_55 | -0.673 | 0.352 | 0.483 | -0.512 | -0.6 | 0.24 | -0.318 | -0.455

Coefficients at 99% confidence level

Coefficients at 95% confidence level

Coefficients at 90% confidence level

Attitudes were measured using the open-ended responses collected from the respondents answering Ver_OE of the questionnaire (refer to Table 10.4 for the estimated coefficients). As discussed previously in Section 4.5.2.2, we extracted four topics for the open-ended question related to the "Perceived Ease of Use of AVs" and seven topics for the "Perceived Usefulness of AVs." As expected, the ease of use of AVs (To_L11) is likely to influence the choice of Private AVs. Topics that discuss ease of operation (To_L13), navigation (To_L14) and the need for human presence (To_L12) are related more to the use of Shared AVs. The ability to multi-task (work) (To_L22) is associated with the use of Private AVs; however, the need for additional attention, which may distract from work (To_L26), influences the choice of Shared AVs negatively but has mixed effects for Private AVs (the attitude has two dimensions). The perceptions that AVs might save time and make travel more environmentally friendly (To_L21), and that they make travel safer and mitigate congestion (To_L23), are positively related to the use of both types of AVs. AVs ensuring mobility for the disabled (To_L24) is linked positively with the use of Shared AVs. Other aspects of AVs' usefulness discussed by the respondents include making parking easier (To_L27) and the need for human control (To_L25); individuals discussing these are more likely to prefer Shared AVs.

Table 10.4 Estimated Coefficients for the Topics

Variable | Ind Coeff_1 | Ind Coeff_2 | Prop Coeff_1 | Prop Coeff_2
Constant_1 | 0.005 | 0.145 | 0.048 | -0.052
To_L11 | 0.038 | 0.055 | 0.086 | 0.063
To_L12 | -0.071 | -0.083 | -0.198 | -0.087
To_L13 | -0.02 | 0.127 | 0.12 | 0.28
To_L14 | -0.018 | -0.059 | -0.049 | -0.06
To_L21 | -0.07 | 0.047 | -0.019 | 0.148
To_L22 | 0.052 | 0.263 | 0.351 | 0.318
To_L23 | -0.075 | 0.155 | 0.081 | 0.321
To_L24 | -0.005 | -0.046 | -0.059
To_L25 | -0.133 | 0.027 | -0.103 | 0.15
To_L26 | 0.036 | -0.18 | -0.155 | -0.342
To_L27 | -0.039 | 0.028 | -0.002 | 0.059

Coefficients at 99% confidence level

Coefficients at 95% confidence level

Coefficients at 90% confidence level


APPENDIX F

11 RESULTS FOR THE PROPOSED FRAMEWORK

We present the results of the mapping (average of the generated values) between the Likert scale responses and the open-ended responses in Table 11.1. Rows 4-14 and rows 17-27 give the results for Ver_LK and Ver_LKOE of the questionnaire, respectively. We use the following abbreviations for the levels of the Likert scale responses: SD, Strongly Disagree; Disag, Disagree; Neut, Neutral; Agree, Agree; SA, Strongly Agree. Table 11.2 presents the mapping between the observed topic proportions for the open-ended questions and the generated averages of the Likert scale responses for Ver_LK and Ver_LKOE.

For the Likert scale responses, one could directly map the observed and the generated averages for Ver_LK and Ver_LKOE of the questionnaire. However, for the topic proportions of the open-ended questions, readers should not expect a direct correspondence between Likert scale questions and topics in the same row (for instance, there is no direct correspondence between AV_LeaEa and L11 or between AV_WrkEa and L12).
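For reading the observed columns, note that the observed Likert values of each item sum to 100, i.e. they are percentages of respondents per level. The sketch below shows one straightforward way to obtain such percentages, and the average topic proportion (in %) over a document-topic matrix. The response coding (1 = SD, ..., 5 = SA) and the toy inputs are assumptions for illustration; the generated averages of the proposed model are not reproduced here.

```python
from collections import Counter

LEVELS = ["SD", "Disag", "Neut", "Agree", "SA"]  # assumed coding: 1 = SD ... 5 = SA


def level_percentages(responses):
    """Percentage of respondents choosing each Likert level for one item."""
    counts = Counter(responses)
    n = len(responses)
    return {label: 100.0 * counts.get(code, 0) / n
            for code, label in enumerate(LEVELS, start=1)}


def mean_topic_proportions(theta):
    """Average proportion (in %) of each topic over a D x K document-topic matrix."""
    d = len(theta)
    k = len(theta[0])
    return [100.0 * sum(row[t] for row in theta) / d for t in range(k)]


responses = [4, 4, 3, 5, 2, 4, 3, 1, 4, 3]  # hypothetical answers to one item
print(level_percentages(responses))
print(mean_topic_proportions([[0.5, 0.5], [1.0, 0.0]]))
```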


Table 11.1 Mapping of the Likert Scale Responses for Ver_LK and Ver_LKOE

Attitudes estimated for Ver_LK using the Proposed Model
Column groups: Item | Ver_LK (Observed Averages) | Ver_LKOE (Generated Averages) | Ver_OE (Generated Topic Proportions)
Item | SD | Disag | Neut | Agree | SA | SD | Disag | Neut | Agree | SA | Topic | Prop (%) | Word_1 | Word_2 | Word_3 | Word_4 | Word_5
AV_LeaEa | 5.79 | 10.04 | 35.68 | 35.67 | 12.82 | 10.63 | 23.51 | 7.71 | 15.00 | 26.13 | L11 | 13.95 | use | easi | technolog | work | get
AV_WrkEa | 5.27 | 13.07 | 43.59 | 30.30 | 7.77 | 10.94 | 16.62 | 16.90 | 10.85 | 27.66 | L12 | 27.47 | drive | road | human | mani | accid
AV_SklEa | 4.60 | 10.07 | 32.08 | 41.59 | 11.66 | 9.28 | 14.13 | 12.91 | 12.16 | 34.50 | L13 | 49.98 | drive | control | driver | make | easier
AV_UseEa | 4.40 | 10.32 | 36.27 | 39.22 | 9.79 | 12.49 | 8.51 | 11.68 | 36.37 | 13.92 | L14 | 9.91 | oper | everyth | assum | user
AV_TrNed | 7.00 | 12.04 | 33.63 | 34.23 | 13.09 | 6.69 | 29.73 | 6.38 | 22.89 | 17.28 | L21 | 12.44 | time | better | environ | make | save
AV_OthAc | 11.57 | 15.57 | 26.08 | 34.87 | 11.91 | 42.92 | 13.30 | 7.44 | 8.14 | 11.18 | L22 | 21.35 | drive | driver | thing | work | make
AV_DeAcc | 12.42 | 22.34 | 37.29 | 21.56 | 6.39 | 9.93 | 21.87 | 16.37 | 21.41 | 13.39 | L23 | 17.63 | accid | human | traffic | reduc | help
AV_ReStr | 12.74 | 22.66 | 27.70 | 27.48 | 9.42 | 13.69 | 12.13 | 8.30 | 27.55 | 21.31 | L24 | 3.53 | drive | use | get | help | disabl
AV_UsImp | 11.34 | 12.87 | 26.56 | 35.82 | 13.41 | 37.79 | 2.51 | 12.51 | 17.15 | 13.02 | L25 | 22.78 | use | drive | need | technolog | situat
| | | | | | | | | | | L26 | 15.38 | take | attent | pay | use
| | | | | | | | | | | L27 | 6.89 | driver | help | transport | safeti | safer
Attitudes estimated for Ver_LKOE using the Proposed Model
Column groups: Item | Ver_LKOE (Observed Averages) | Ver_LK (Generated Averages) | Ver_OE (Generated Topic Proportions)
AV_LeaEa | 6.16 | 10.01 | 29.63 | 37.16 | 17.03 | 12.69 | 16.38 | 16.10 | 18.80 | 16.04 | L11 | 12.34 | use | easi | technolog | work | get
AV_WrkEa | 6.44 | 12.91 | 40.49 | 31.52 | 8.64 | 8.30 | 10.17 | 7.44 | 32.63 | 21.46 | L12 | 24.90 | drive | road | human | mani | accid
AV_SklEa | 5.56 | 8.24 | 27.23 | 42.34 | 16.62 | 7.40 | 12.14 | 22.05 | 21.06 | 17.34 | L13 | 52.40 | drive | control | driver | make | easier
AV_UseEa | 5.74 | 10.58 | 29.83 | 39.70 | 14.15 | 31.25 | 16.03 | 9.56 | 5.17 | 17.99 | L14 | 10.36 | oper | everyth | assum | user
AV_TrNed | 10.20 | 11.77 | 25.67 | 33.87 | 18.49 | 16.78 | 36.30 | 5.99 | 9.92 | 11.00 | L21 | 11.89 | time | better | environ | make | save
AV_OthAc | 11.12 | 13.64 | 21.60 | 33.72 | 19.91 | 19.46 | 17.57 | 16.94 | 10.18 | 15.85 | L22 | 21.50 | drive | driver | thing | work | make
AV_DeAcc | 14.47 | 17.13 | 29.96 | 25.54 | 12.90 | 24.32 | 8.91 | 14.99 | 15.07 | 16.71 | L23 | 18.40 | accid | human | traffic | reduc | help
AV_ReStr | 14.53 | 18.34 | 22.15 | 31.02 | 13.95 | 10.32 | 32.76 | 4.55 | 18.10 | 14.27 | L24 | 3.63 | drive | use | get | help | disabl
AV_UsImp | 12.13 | 12.58 | 19.50 | 32.96 | 22.83 | 9.84 | 3.84 | 41.61 | 7.99 | 16.72 | L25 | 21.12 | use | drive | need | technolog | situat
| | | | | | | | | | | L26 | 16.84 | take | attent | pay | use
| | | | | | | | | | | L27 | 6.62 | driver | help | transport | safeti | safer


Table 11.2 Mapping of the Likert Scale Responses for the Extracted Topics from Open-ended Responses

Attitudes estimated for Ver_OE using the Proposed Model
Column groups: Ver_OE (Observed Topic Proportions) | Item | Ver_LK (Generated Averages) | Ver_LKOE (Generated Averages)
Topic | Prop (%) | Word_1 | Word_2 | Word_3 | Word_4 | Word_5 | Item | SD | Disag | Neut | Agree | SA | SD | Disag | Neut | Agree | SA
L11 | 9.68 | use | easi | technolog | work | get | AV_LeaEa | 13.28 | 18.58 | 16.01 | 19.05 | 16.05 | 10.27 | 23.93 | 7.74 | 14.52 | 26.52
L12 | 41.98 | drive | road | human | mani | accid | AV_WrkEa | 8.69 | 12.00 | 7.27 | 34.52 | 20.49 | 10.94 | 16.66 | 16.58 | 11.37 | 27.42
L13 | 37.60 | drive | control | driver | make | easier | AV_SklEa | 9.15 | 14.84 | 21.40 | 20.44 | 17.15 | 9.20 | 13.61 | 13.22 | 11.96 | 35.00
L14 | 10.74 | oper | everyth | assum | user | | AV_UseEa | 30.69 | 17.76 | 9.52 | 5.39 | 19.62 | 12.44 | 8.51 | 11.70 | 36.17 | 14.17
L21 | 3.81 | time | better | environ | make | save | AV_TrNed | 16.49 | 37.79 | 6.79 | 10.63 | 11.28 | 6.85 | 29.86 | 6.20 | 22.07 | 18.01
L22 | 15.70 | drive | driver | thing | work | make | AV_OthAc | 21.36 | 19.38 | 16.80 | 9.80 | 15.65 | 40.71 | 13.89 | 7.60 | 8.72 | 12.06
L23 | 21.59 | accid | human | traffic | reduc | help | AV_DeAcc | 26.95 | 8.85 | 16.28 | 14.74 | 16.15 | 9.94 | 22.12 | 16.34 | 21.79 | 12.79
L24 | 8.57 | drive | use | get | help | disabl | AV_ReStr | 9.84 | 34.06 | 5.36 | 17.39 | 16.32 | 13.60 | 12.28 | 8.33 | 28.00 | 20.78
L25 | 12.50 | use | drive | need | technolog | situat | AV_UsImp | 10.10 | 4.35 | 42.96 | 8.26 | 17.31 | 36.60 | 2.75 | 12.13 | 16.79 | 14.70
L26 | 9.31 | take | attent | pay | use
L27 | 28.52 | driver | help | transport | safeti | safer


APPENDIX G

12 PYTHON CODE FOR TOPIC MODELS

Topic Model Analysis

The code is written to extract topics from text data. It supports the following analyses:

1. Latent Dirichlet Allocation (LDA)

2. Supervised Latent Dirichlet Allocation (sLDA)

Latent Dirichlet Allocation can be performed using Gensim or Tomotopy; supervised LDA can be performed using Tomotopy. The dependent variable can be linear or binary. The results of the Topic Models are visualised with pyLDAvis, which helps in understanding the topics and the inter-topic distances.

Importing the Libraries

### ************************** Importing Packages ************************ ###
import os
import os.path
import re                          # regular expressions
import sys
import pprint                      # better printing
import warnings
import numpy as np                 # scientific computing
import pandas as pd                # data structures and computing
# Gensim
import gensim
import gensim.corpora as corpora
from gensim.utils import simple_preprocess
from gensim.models import CoherenceModel
# Stemming and tokenisation (NLTK)
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize
# Plotting tools
import pyLDAvis                    # interactive Topic Model visualisation
import matplotlib.pyplot as plt
# Library for Topic Models (LDA and sLDA)
import tomotopy as tp
# fix random generator seed (for reproducibility of results)
np.random.seed(42)
warnings.filterwarnings("ignore", category=DeprecationWarning)

Creating the list of Stop Words

# NLTK Stop words (the corpus may need to be downloaded once with nltk.download('stopwords'))
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
stop_words.extend(['also', 'back', 'cant', 'come', 'could', 'done', 'dont', 'due', 'els', 'etc', 'hope', 'howev', 'know',
                   'let', 'like', 'may', 'mayb', 'much', 'must', 'new', 'non', 'one', 'other', 'plu', 'pretti', 'said',
                   'say', 'see', 'sinc', 'someon', 'someoth', 'therefor', 'today', 'want', 'well', 'would', 'ye', 'car',
                   'cars', 'think', 'autonomous', 'vehicle', 'vehicles', 'people', 'seem', 'seems', 'really', 'still',
                   'however', 'believe', 'right', 'truly', 'automatic', 'sound', 'sounds', 'general', 'become', 'total',
                   'totally', 'tell', 'something', 'anything', 'person', 'phone'])

Importing the Dataset

The dataset preparation is important here. For plain LDA, only the text data is required; supervised LDA additionally requires the response (dependent) variable for each document.

### ************************** Importing Datasets ************************ ###
directo = "<folder_path>"
df = pd.read_excel(os.path.join(directo, "<filename>"))
df.head()
# convert the text field (open-ended responses) and the response variable into lists;
# missing responses become empty strings so that the regular expressions below do not fail
data = df.AVO_EUsT.fillna('').astype(str).tolist()
resp = df.Resp.values.tolist()
data[:5]
resp[:5]

Cleaning the Dataset

In this section of the code, the following data cleaning steps are applied:

1. Removing e-mail addresses and newline characters

2. Replacing phrases with meaningful words

3. Removing stop words from the dataset

4. Forming bigrams and trigrams

5. Stemming

### ************************** Datasets Cleaning ************************* ###

E-mail id and New line characters

# Remove e-mail addresses
data = [re.sub(r'\S*@\S*\s?', '', sent) for sent in data]
# Replace new line characters with spaces
data = [re.sub(r'\n', ' ', sent) for sent in data]
pprint.pprint(data[:5])

Replacing Phrases with Meaningful Words

data_1 = []
p1 = re.compile(r"(do\s*not\s*trust|don't\s*trust|don't\s*fully\s*trust|would\s*not\s*trust|never\s*trust)")
# only the last alternative of p2 is legible in the original listing
p2 = re.compile(r"(don't\s*feel\s*it's\s*safe)")
p3 = re.compile(r"not\s*feel")
p4 = re.compile(r"don't\s*think")
for item in data:
    data_1a = p1.sub("no_trust", item)
    data_1b = p2.sub("unsafe", data_1a)
    data_1c = p3.sub("not_feel", data_1b)
    data_1d = p4.sub("dont_think", data_1c)
    data_1.append(data_1d)
data = data_1
print(data[:5])
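The same four substitutions can also be written as an ordered list of (pattern, replacement) pairs, which makes the order of application explicit and new phrase rules easy to add. This is a sketch, not the original code: since the original listing preserves only the last alternative of the second pattern, only that alternative is included, and the example sentences are invented.

```python
import re

# Ordered (pattern, replacement) rules; earlier rules are applied first.
PHRASE_RULES = [
    (re.compile(r"(do\s*not\s*trust|don't\s*trust|don't\s*fully\s*trust"
                r"|would\s*not\s*trust|never\s*trust)"), "no_trust"),
    (re.compile(r"don't\s*feel\s*it's\s*safe"), "unsafe"),  # only the legible alternative
    (re.compile(r"not\s*feel"), "not_feel"),
    (re.compile(r"don't\s*think"), "dont_think"),
]


def replace_phrases(text):
    """Apply every rule in order and return the rewritten text."""
    for pattern, replacement in PHRASE_RULES:
        text = pattern.sub(replacement, text)
    return text


examples = [
    "I don't trust a car without a driver",
    "I don't feel it's safe yet",
    "I do not feel comfortable",
    "I don't think it will work",
]
for sentence in examples:
    print(replace_phrases(sentence))
```

Because the patterns are case-sensitive and run before lower-casing, a sentence starting with "Don't" is left unchanged; whether to add re.IGNORECASE is a design choice that would change the resulting tokens.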

Removing StopWords

Function to remove StopWords

def remove_stopwords(texts):
    """
    objective:
        function to remove stopwords from the paragraph/sentence
        uses gensim's simple_preprocess for tokenisation
    input:
        paragraph/sentences
    output:
        wordlist after the stopwords are removed
    """
    return [[word for word in simple_preprocess(str(doc)) if word not in stop_words]
            for doc in texts]

data_words_nostops = remove_stopwords(data)
data_words_nostops[:5]

Forming Bigrams and Trigrams

def make_bigrams(texts):
    """
    objective:
        joins frequently co-occurring word pairs in the processed text
        (after preprocessing and stop word removal)
    input:
        preprocessed text
    output:
        text with bigrams
    """
    return [bigram_mod[text] for text in texts]

def make_trigrams(texts):
    """
    objective:
        generate trigrams for the text
    input:
        text with bigrams
    output:
        text with trigrams
    """
    return [trigram_mod[bigram_mod[text]] for text in texts]

# Build the bigram and trigram models - calibration dataset
# (Phrases expects tokenised documents, so the token lists are used instead of the raw strings)
bigram = gensim.models.phrases.Phrases(data_words_nostops, min_count=5, threshold=100)
trigram = gensim.models.phrases.Phrases(bigram[data_words_nostops], threshold=100)
# Freeze the models for faster application - calibration dataset
bigram_mod = gensim.models.phrases.Phraser(bigram)
trigram_mod = gensim.models.phrases.Phraser(trigram)
data_words_bigrams = make_bigrams(data_words_nostops)
data_words_bigrams[:5]
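The threshold=100 setting is easier to interpret with the scoring rule in mind. Gensim's default phrase scorer follows the formula of Mikolov et al. (2013), score(a, b) = (count(a b) - min_count) / (count(a) * count(b)) * vocabulary size, and a pair is joined when its score exceeds the threshold. The sketch below computes this score in plain Python; taking the vocabulary size as the number of distinct unigrams plus bigrams is our approximation of how Gensim counts it, so the values may differ somewhat from Gensim's.

```python
from collections import Counter


def phrase_scores(docs, min_count=5):
    """Score every adjacent word pair with the Mikolov et al. (2013) phrase formula."""
    unigrams = Counter(word for doc in docs for word in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    vocab_size = len(unigrams) + len(bigrams)  # approximation of Gensim's vocabulary size
    return {
        (a, b): (n_ab - min_count) / (unigrams[a] * unigrams[b]) * vocab_size
        for (a, b), n_ab in bigrams.items()
    }


def detected_phrases(docs, min_count=5, threshold=100.0):
    """Word pairs whose score exceeds the threshold (these would be joined as a_b)."""
    return sorted(pair for pair, score in phrase_scores(docs, min_count).items()
                  if score > threshold)


toy_docs = [["self", "driving", "taxi"]] * 6 + [["driving", "lesson"]] * 2
scores = phrase_scores(toy_docs)
print(scores)
print(detected_phrases(toy_docs))                 # empty at threshold=100
print(detected_phrases(toy_docs, threshold=0.1))
```

Printing the scores for the actual responses is a quick way to check how many pairs a given threshold would merge before fixing it.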

Stemming

# Porter stemming (the result keeps the name data_lemmatized used in the rest of the code)
ps = PorterStemmer()
data_lemmatized = []
for texts in data_words_bigrams:
    data_lemmatized.append([ps.stem(doc) for doc in texts])
data_lemmatized[:5]

Writing the Files to the dataset

df['Cleaned_Data'] = data_lemmatized
df.head()
df.to_csv(os.path.join(directo, "Output_Q1_Words.csv"))

LDA Model

# Defining the LDA Function
def lda_model(input_list, save_path):
    """
    desc:
        the function estimates the LDA model and outputs the estimated topics
    input:
        list with documents as responses
        path where the trained model is saved
    output:
        prints the topics,
        words and their corresponding proportions
    """
    mdl = tp.LDAModel(tw=tp.TermWeight.ONE,  # Term weighting
                      min_cf=3,              # Minimum frequency of words
                      rm_top=0,              # Number of top frequency words to be removed
                      k=4,                   # Number of topics
                      seed=42)
    for line in input_list:
        docu = " ".join(line).strip().split()
        mdl.add_doc(docu)
    mdl.burn_in = 10000
    mdl.train(10000)
    print('Num docs: ', len(mdl.docs), 'Vocab size: ', mdl.num_vocabs, 'Num words: ', mdl.num_words)
    print('Removed words: ', mdl.removed_top_words)
    print('Training...', file=sys.stderr, flush=True)
    for i in range(0, 50000, 10):
        mdl.train(100)
        print('Iteration: {}\tLog-likelihood: {}'.format(i, mdl.ll_per_word))
    print('Saving...', file=sys.stderr, flush=True)
    mdl.save(save_path, True)
    for k in range(mdl.k):
        print('Topic #{}'.format(k))
        for word, prob in mdl.get_topic_words(k):
            print('\t', word, prob, sep='\t')
    return mdl

Estimating the Topic Model

print('Running LDA')
# the trained model gets its own name so that the function lda_model is not overwritten
lda_trained = lda_model(data_lemmatized, 'test.lda_4_T.bin')
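Tomotopy trains LDA by collapsed Gibbs sampling. For readers who want to see what the burn-in and training iterations above are doing, the following is a compact, pure-Python sketch of that sampler. It is not Tomotopy's implementation: it ignores term weighting and word filtering (min_cf, rm_top), uses illustrative values for the Dirichlet priors alpha and eta, and the toy documents are invented.

```python
import random


def gibbs_lda(docs, k, alpha=0.1, eta=0.01, iters=200, seed=42):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (vocab, phi, theta), where phi is the
    K x W topic-word matrix and theta the D x K document-topic matrix.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    n_dk = [[0] * k for _ in docs]            # tokens of document d assigned to topic t
    n_kw = [[0] * n_words for _ in range(k)]  # tokens of word w assigned to topic t
    n_k = [0] * k                             # tokens assigned to topic t
    z = []                                    # current topic of every token

    for d, doc in enumerate(docs):            # random initial assignment
        z_d = []
        for w in doc:
            t = rng.randrange(k)
            z_d.append(t)
            n_dk[d][t] += 1
            n_kw[t][index[w]] += 1
            n_k[t] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = index[w], z[d][i]
                n_dk[d][t] -= 1               # remove the token's current assignment
                n_kw[t][v] -= 1
                n_k[t] -= 1
                # full conditional p(z = j | all other assignments), up to a constant
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][v] + eta) / (n_k[j] + n_words * eta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][v] += 1
                n_k[t] += 1

    phi = [[(n_kw[t][v] + eta) / (n_k[t] + n_words * eta) for v in range(n_words)]
           for t in range(k)]
    theta = [[(n_dk[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


toy_docs = [["safe", "accid", "reduc"], ["park", "easi", "park"],
            ["safe", "reduc", "traffic"], ["easi", "park", "oper"]]
vocab, phi, theta = gibbs_lda(toy_docs, k=2)
for t, row in enumerate(phi):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print("Topic #{}".format(t), [word for _, word in top])
```

The phi and theta returned here have the same shapes as the topic-word and document-topic matrices passed to pyLDAvis later in this appendix.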

Supervised LDA

def slda_model(documents, dep_var, save_path):
    """
    desc:
        the function estimates the sLDA model and outputs the estimated topics
    input:
        list with documents as responses
        dependent variable
        path where the trained model is saved
    output:
        prints the topics,
        words and their corresponding proportions
    """
    smdl = tp.SLDAModel(tw=tp.TermWeight.ONE,  # Term weighting
                        min_cf=3,              # Minimum frequency of words
                        rm_top=0,              # Number of top frequency words to be removed
                        k=4,                   # Number of topics
                        vars=['b'],            # Type of each response variable ('b' binary, 'l' linear)
                        seed=42)
    for row, pred in zip(documents, dep_var):
        docu = " ".join(row).strip().split()
        smdl.add_doc(words=docu, y=[float(pred)])
    smdl.burn_in = 10000
    smdl.train(10000)
    # Printing the output statistics
    print('Num docs: ', len(smdl.docs), 'Vocab size: ', smdl.num_vocabs, 'Num words: ', smdl.num_words)
    print('Removed top words: ', smdl.removed_top_words)
    print('Training...', file=sys.stderr, flush=True)
    for i in range(0, 50000, 10):
        smdl.train(100)
        print('Iteration: {}\tLog-likelihood: {}'.format(i, smdl.ll_per_word))
    print('Saving...', file=sys.stderr, flush=True)
    smdl.save(save_path, True)
    for k in range(smdl.k):
        print('Topic #{}'.format(k))
        for word, prob in smdl.get_topic_words(k):
            print('\t', word, prob, sep='\t')
    return smdl

print('Running Supervised LDA')
# the trained model gets its own name so that the function slda_model is not overwritten
slda_trained = slda_model(data_lemmatized, resp, 'test.slda_4_T.bin')

Visualising the Results of LDA

pyLDAvis has no module for plotting Topic Models estimated with Tomotopy directly. It can, however, produce the plots once the following parameters have been computed for each Topic Model:

1. phi: probability of each word (W) in each topic (K); a K x W matrix

2. theta: probability mass function over the K topics for each document in the corpus (D); a D x K matrix

3. n(d): number of tokens in each document

4. vocab: vector of the terms in the vocabulary, in the same order as in phi

5. M(w): frequency of term w across the entire corpus
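Since the five inputs are assembled by hand, it helps to check that they are mutually consistent before plotting. pyLDAvis.prepare performs its own validation; the sketch below runs comparable checks earlier with explicit messages (the exact checks pyLDAvis applies may differ, and the tolerance is an assumption). The toy matrices are invented.

```python
import math


def check_ldavis_inputs(topic_term_dists, doc_topic_dists, doc_lengths,
                        vocab, term_frequency, tol=1e-3):
    """Check the five pyLDAvis inputs for consistency; return (K, W, D) if they pass."""
    k, w, d = len(topic_term_dists), len(vocab), len(doc_topic_dists)
    if any(len(row) != w for row in topic_term_dists):
        raise ValueError("phi must be K x W, with W = len(vocab)")
    if any(len(row) != k for row in doc_topic_dists):
        raise ValueError("theta must be D x K")
    if len(doc_lengths) != d:
        raise ValueError("doc_lengths needs one entry per document")
    if len(term_frequency) != w:
        raise ValueError("term_frequency must follow the order of vocab")
    for name, rows in (("phi", topic_term_dists), ("theta", doc_topic_dists)):
        for i, row in enumerate(rows):
            total = sum(row)
            if not math.isclose(total, 1.0, abs_tol=tol):
                raise ValueError(f"row {i} of {name} sums to {total:.4f}, not 1")
    return k, w, d


phi = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]   # K = 2 topics, W = 3 terms
theta = [[0.9, 0.1], [0.2, 0.8]]           # D = 2 documents
print(check_ldavis_inputs(phi, theta, [4, 5], ["easi", "park", "safe"], [3, 2, 4]))
```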

Computing the value of "Phi" for the Model

def compute_phi(model):
    """
    desc:
        this function computes the value of phi for visualising the results of Topic Model
        probabilities of each word for a given topic
    input:
        the Topic Model
    output:
        vocabulary (sorted alphabetically) and
        K x W matrix
        K = number of topics
        W = number of words
    """
    # (word, probability) pairs covering the whole vocabulary of each topic
    mat_phi1 = [model.get_topic_words(i, model.num_vocabs) for i in range(model.k)]
    # alphabetically sorted vocabulary (first row of mat_phi2)
    list_words = sorted(text[0] for text in mat_phi1[0])
    # rows 1..K: probability of every word (in the order of list_words) in topics 1..K
    mat_phi2 = [list_words]
    for topic_words in mat_phi1:
        word_prob = dict(topic_words)
        mat_phi2.append([word_prob.get(word, 0.0) for word in list_words])
    # write the sLDA and LDA results to separate files
    # (SLDAModel is checked first, in case it is a subclass of LDAModel)
    if isinstance(model, tp.SLDAModel):
        out_file = os.path.join(directo, 'topic_word_prob_slda_4_T.csv')
    else:
        out_file = os.path.join(directo, 'topic_word_prob_lda_4_T.csv')
    with open(out_file, 'w') as f:
        for item in mat_phi2:
            for items in item:
                f.write("%s, " % items)
            f.write("\n")
    return mat_phi2[0], mat_phi2[1:]

Computing the value of "Theta" for the Model

For LDA Model

def compute_theta_lda(model, data):
    """
    desc:
        this function computes the value of theta for visualising the results of Topic Model
        probability mass function over "K" topics for all documents (D) in the corpus
    input:
        the Topic Model
        dataset
    output:
        D x K matrix
        D = number of documents
        K = number of topics
    """
    mat_theta = []
    for line in data:
        docu = " ".join(line).strip().split()
        # infer() returns (topic distribution, log-likelihood) for a single document
        theta_val = model.infer(doc=model.make_doc(docu),
                                iter=100,
                                workers=0,
                                together=False)
        mat_theta.append(theta_val[0])
    with open(os.path.join(directo, 'topic_probabilities_lda_4_T.csv'), 'w') as f:
        for item in mat_theta:
            for items in item:
                f.write("%s, " % items)
            f.write("\n")
    return mat_theta

For sLDA Model

def compute_theta_slda(model, data, dep_var):
    """
    desc:
        this function computes the value of theta for visualising the results of Topic Model
        probability mass function over "K" topics for all documents (D) in the corpus
    input:
        the Topic Model
        dataset
        dependent variable
    output:
        D x K matrix
        D = number of documents
        K = number of topics
    """
    mat_theta = []
    for line, dep in zip(data, dep_var):
        docu = " ".join(line).strip().split()
        # infer() returns (topic distribution, log-likelihood) for a single document
        theta_val = model.infer(doc=model.make_doc(words=docu, y=[float(dep)]),
                                iter=100,
                                workers=0,
                                together=False)
        mat_theta.append(theta_val[0])
    with open(os.path.join(directo, 'topic_probabilities_slda_4_T.csv'), 'w') as f:
        for item in mat_theta:
            for items in item:
                f.write("%s, " % items)
            f.write("\n")
    return mat_theta

Number of Tokens per document

def num_token(data):
    """
    desc:
        this function computes the number of tokens per document for the entire corpus
    input:
        dataset
    output:
        D x 1 vector
        entry d = number of tokens in document d
    """
    return [len(text) for text in data]

Frequency of Words in the Corpus

from collections import Counter

def freq_words(vocabs, data):
    """
    desc:
        this function computes the frequency of words in the entire corpus
    input:
        list of words
        dataset
    output:
        W x 1 vector
        entry w = frequency of word w in the corpus (in the order of vocabs)
    """
    # count every token once, then look up the vocabulary in order
    counts = Counter(word for line in data for word in line)
    return [counts[word] for word in vocabs]

Visualising the Results of LDA Model

Computing the Parameters for Visualising LDA Model

# Loading the LDA model
lda_loaded = tp.LDAModel.load('test.lda_4_T.bin')
lvocab, lphi_val = compute_phi(lda_loaded)
ltheta_val = compute_theta_lda(lda_loaded, data_lemmatized)
lnum_token = num_token(data_lemmatized)
lfreq_terms = freq_words(lvocab, data_lemmatized)

Plotting in pyLDAvis (LDA)

# Visualising the Results
pyLDAvis.enable_notebook()
data_lda = {'topic_term_dists': lphi_val,
            'doc_topic_dists': ltheta_val,
            'doc_lengths': lnum_token,
            'vocab': lvocab,
            'term_frequency': lfreq_terms}
print('Topic-Term shape: %s' % str(np.array(data_lda['topic_term_dists']).shape))
print('Doc-Topic shape: %s' % str(np.array(data_lda['doc_topic_dists']).shape))
vis_lda = pyLDAvis.prepare(**data_lda)
pyLDAvis.display(vis_lda)

Visualising the Results of Supervised LDA Model

Computing the Parameters for Visualising Supervised LDA Model

# Loading the sLDA model
slda_loaded = tp.SLDAModel.load('test.slda_4_T.bin')
svocab, sphi_val = compute_phi(slda_loaded)
stheta_val = compute_theta_slda(slda_loaded, data_lemmatized, resp)
snum_token = num_token(data_lemmatized)
sfreq_terms = freq_words(svocab, data_lemmatized)

Plotting in pyLDAvis (sLDA)

# Visualising the Results
pyLDAvis.enable_notebook()
data_slda = {'topic_term_dists': sphi_val,
             'doc_topic_dists': stheta_val,
             'doc_lengths': snum_token,
             'vocab': svocab,
             'term_frequency': sfreq_terms}
print('Topic-Term shape: %s' % str(np.array(data_slda['topic_term_dists']).shape))
print('Doc-Topic shape: %s' % str(np.array(data_slda['doc_topic_dists']).shape))
vis_slda = pyLDAvis.prepare(**data_slda)
pyLDAvis.display(vis_slda)

Computing Scores for use in Estimation

In this portion of the code, a score is computed for each topic in every document of the corpus. The score for a topic is the sum of the probabilities, under that topic, of the words used in the document (a repeated word is counted each time it occurs); a fifth value holds the sum over the four topics.

def compute_scores(list_dataset, list_word_prob):
    """
    desc:
        this function takes the cleaned dataset and the list of word probabilities per topic
        and computes the scores
    input:
        cleaned dataset as a list
        word probabilities as a dataframe (columns Word, Prob_1 ... Prob_4)
    output:
        scores for each document in the corpus: [topic 1, topic 2, topic 3, topic 4, total]
    """
    n = len(list_dataset)
    prob_list = [[0 for i in range(5)] for i in range(n)]
    for index, document in enumerate(list_dataset):
        # remember to change the number of variables based on the number of topics
        probab_1 = 0
        probab_2 = 0
        probab_3 = 0
        probab_4 = 0
        for word in document:
            for index1, row in list_word_prob.iterrows():
                item = row['Word']
                prob1 = row['Prob_1']
                prob2 = row['Prob_2']
                prob3 = row['Prob_3']
                prob4 = row['Prob_4']
                if word == item:
                    probab_1 += prob1
                    probab_2 += prob2
                    probab_3 += prob3
                    probab_4 += prob4
        prob_list[index][0] = probab_1
        prob_list[index][1] = probab_2
        prob_list[index][2] = probab_3
        prob_list[index][3] = probab_4
        prob_list[index][4] = probab_1 + probab_2 + probab_3 + probab_4
    return prob_list

Computing the Scores for LDA

lda_dist = pd.read_csv(os.path.join(directo, "topic_word_prob_lda_4_T.csv"), header=None)
lda_distT = lda_dist.T
lda_distT.columns = ['Word', 'Prob_1', 'Prob_2', 'Prob_3', 'Prob_4']
lda_distT['Word'] = lda_distT['Word'].str.strip()
for col in ['Prob_1', 'Prob_2', 'Prob_3', 'Prob_4']:
    lda_distT[col] = pd.to_numeric(lda_distT[col], errors='coerce')
probab_lda = compute_scores(data_lemmatized, lda_distT)
df['probab_lda'] = probab_lda

Computing the Scores for sLDA

slda_dist = pd.read_csv(os.path.join(directo, "topic_word_prob_slda_4_T.csv"), header=None)
slda_distT = slda_dist.T
slda_distT.columns = ['Word', 'Prob_1', 'Prob_2', 'Prob_3', 'Prob_4']
slda_distT['Word'] = slda_distT['Word'].str.strip()
for col in ['Prob_1', 'Prob_2', 'Prob_3', 'Prob_4']:
    slda_distT[col] = pd.to_numeric(slda_distT[col], errors='coerce')
probab_slda = compute_scores(data_lemmatized, slda_distT)
df['probab_slda'] = probab_slda
df.to_csv(os.path.join(directo, "Open_Ended_Q1_Scores_4_Topic.csv"))
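The compute_scores function above is written for exactly four topics and scans the whole word-probability table for every word. The sketch below performs the same computation for any number of topics with a dictionary lookup; it gives the same scores when each word appears once in the table. The input word_probs is a hypothetical mapping from each word to its list of per-topic probabilities, which could be built from the transposed table, for example with dict(zip(lda_distT['Word'], lda_distT[['Prob_1', 'Prob_2', 'Prob_3', 'Prob_4']].values.tolist())).

```python
def topic_scores(documents, word_probs):
    """Per-document topic scores for any number of topics.

    documents: list of token lists; word_probs: word -> list of K topic probabilities.
    Returns one row per document: [score_1, ..., score_K, total].
    """
    k = len(next(iter(word_probs.values())))
    rows = []
    for doc in documents:
        scores = [0.0] * k
        for word in doc:
            probs = word_probs.get(word)
            if probs is None:          # word not in the model vocabulary
                continue
            for t in range(k):
                scores[t] += probs[t]
        rows.append(scores + [sum(scores)])
    return rows


word_probs = {"safe": [0.5, 0.1], "drive": [0.25, 0.5]}   # hypothetical, K = 2
print(topic_scores([["safe", "drive", "safe"], ["unknown"]], word_probs))
```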


APPENDIX H

13 PYTHON CODE FOR THE FRAMEWORK TO MEASURE ATTITUDES

Contents

1. Problem Description

2. Data Preparation

3. Probabilistic Graphical Model and the Generative Process

4. Proposed Model

Problem Description

We use data on the mode choice for commute trips by students and workers from the USA, collected with a stated-preference (SP) survey. The questionnaire collected information on:

1. Socio-demographic characteristics

2. Travel characteristics

3. Familiarity with Autonomous Vehicles

4. Attitudes

5. SP attributes

The attitudes were measured using 5-point Likert scales. For some attitudes, open-ended questions were also presented to the respondents.

Importing the Libraries

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import torch
import pyro
import pyro.distributions as dist
from pyro.infer.autoguide import AutoDiagonalNormal, AutoMultivariateNormal
from pyro.infer import MCMC, NUTS, HMC, SVI, Trace_ELBO
from pyro.optim import Adam, ClippedAdam
# CUDA GPU resources: use the first GPU and create new tensors on it by default
torch.cuda.set_device(0)
torch.set_default_tensor_type("torch.cuda.FloatTensor")
# fix the random generator seeds of Python, NumPy and PyTorch (for reproducibility of results)
pyro.set_rng_seed(42)
# matplotlib style options
plt.style.use('ggplot')
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)
# Reading files from the local drive
dfv1 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_1_Training_LDA.csv')
dfv2 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_2_Training_LDA.csv')
dfv3 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_3_Training_LDA.csv')

The mode is encoded as an integer from 0 to 2:

0 - Regular Car

1 - Private Autonomous Vehicle

2 - Shared Autonomous Vehicle
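For tables rather than histograms, the integer codes can be mapped back to their labels. Below is a small sketch. The helper `mode_counts` and the `MODE_LABELS` dictionary are illustrative and are not taken from the thesis code.

```python
import pandas as pd

# Mode codes as listed above
MODE_LABELS = {
    0: "Regular Car",
    1: "Private Autonomous Vehicle",
    2: "Shared Autonomous Vehicle",
}


def mode_counts(choices):
    """Count how often each mode was chosen, keeping modes that never occur."""
    labels = pd.Series(choices).map(MODE_LABELS)
    return labels.value_counts().reindex(list(MODE_LABELS.values()), fill_value=0)


print(mode_counts([0, 0, 1, 2, 2, 2]))
```

In the notebook it could be called as `mode_counts(dfv1['Choice'])`.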


Frequency Distribution of Mode Choice

def desc_stats(data, title):
    """Print the dataset size and plot the frequency of each chosen mode."""
    print("Dataset size: ", len(data))
    data['Choice'].hist();
    plt.title(title)
    plt.xlabel('Mode ID (0 - Regular Car, 1 - Private Autonomous Vehicle, 2 - Shared Autonomous Vehicle)')
    plt.ylabel('Frequency')
    plt.xticks([0, 1, 2]);
    return

desc_stats(dfv1, "Mode Choice (Version_1)");
dfv1.describe()

desc_stats(dfv2, "Mode Choice (Version_2)");
dfv2.describe()

desc_stats(dfv3, "Mode Choice (Version_3)");
dfv3.describe()

# Concatenate the three datasets
df = pd.concat([dfv1, dfv2, dfv3], ignore_index=True)
desc_stats(df, "Mode Choice (Combined)");
df.describe()

Data Preparation

For the Likert scale questions, we created dummy variables; for the open-ended questions, we extracted variables using Latent Dirichlet Allocation (LDA) (Appendix ).
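In the function that follows, `pd.get_dummies(data[x])` creates columns only for the levels that actually occur in `data[x]`. The function is also applied separately to each test file, so a level missing from one file could change the number of Likert columns. The sketch below fixes the levels in advance. It assumes the items are coded 1 to 5, and `likert_dummies` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

LIKERT_LEVELS = [1, 2, 3, 4, 5]  # assumption: items are coded 1 to 5


def likert_dummies(data, columns, levels=LIKERT_LEVELS):
    """One dummy column per level and item, even if a level is not observed."""
    blocks = []
    for col in columns:
        coded = pd.Categorical(data[col], categories=levels)
        blocks.append(pd.get_dummies(coded).to_numpy(dtype="float32"))
    return np.concatenate(blocks, axis=1)


toy = pd.DataFrame({"AV_LeaEa": [1, 5, 5], "AV_WrkEa": [2, 2, 3]})
X_PEU_toy = likert_dummies(toy, ["AV_LeaEa", "AV_WrkEa"])
print(X_PEU_toy.shape)
```

Fixing the categories keeps the column layout identical across the training and test files, so the estimated `gamma` coefficients always line up with the same Likert levels.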

def data_processing(data):
    """
    inp:
        dataframe to be processed
    desc:
        create dummy variables for the discrete variables
        standardise continuous variables
    out:
        y (chosen mode), X_lk (Likert dummies), X_oe (topic variables),
        X_sd (other explanatory variables), bern (questionnaire version)
    """
    # Standardise the continuous input features (note: modifies `data` in place)
    X_mean1 = data.iloc[:, [41, 42]].mean(axis=0)
    X_std1 = data.iloc[:, [41, 42]].std(axis=0)
    data.iloc[:, [41, 42]] = (data.iloc[:, [41, 42]] - X_mean1) / X_std1
    X_mean2 = data.iloc[:, 78:120].mean(axis=0)
    X_std2 = data.iloc[:, 78:120].std(axis=0)
    data.iloc[:, 78:120] = (data.iloc[:, 78:120] - X_mean2) / X_std2

    # Convert the ordered attitudinal responses into dummy variables
    peu_names = ["AV_LeaEa", "AV_WrkEa", "AV_SklEa", "AV_UseEa"]  # Perceived ease of use
    pu_names = ["AV_TrNed", "AV_OthAc", "AV_DeAcc", "AV_ReStr", "AV_UsImp"]  # Perceived usefulness
    psr_names = ["AV_WoSaf", "AV_MaFal"]  # Perceived safety risk
    ppr_names = ["AV_PeInf", "AV_UsInf", "AV_ShInf"]  # Perceived privacy risk
    tr_names = ["AV_Depe", "AV_Reli", "AV_Trus"]  # Trust
    at_names = ["AV_GIde", "AV_WIde", "AV_Plea"]  # Attitudes
    X_PEU = np.concatenate([pd.get_dummies(data[x]) for x in peu_names], axis=1).astype("float32")
    X_PU = np.concatenate([pd.get_dummies(data[x]) for x in pu_names], axis=1).astype("float32")
    X_PSR = np.concatenate([pd.get_dummies(data[x]) for x in psr_names], axis=1).astype("float32")
    X_PPR = np.concatenate([pd.get_dummies(data[x]) for x in ppr_names], axis=1).astype("float32")
    X_TR = np.concatenate([pd.get_dummies(data[x]) for x in tr_names], axis=1).astype("float32")
    X_AT = np.concatenate([pd.get_dummies(data[x]) for x in at_names], axis=1).astype("float32")

    # Group the independent variables into different sets
    mat = data.values
    X_SD = mat[:, [2,3,4,5,6,8,9,10,11,13,14,15,16,18,19,20,21,22,24,25,27,28,29]].astype("float32")  # Socio-demographic variables
    X_TC = mat[:, [30,31,33,34,35,36,37,38,39,46,47,48,49]].astype("float32")  # Travel characteristics
    X_FAV = mat[:, [56,57]].astype("float32")  # Familiarity with Autonomous Vehicles
    X_SP = mat[:, 112:120].astype("float32")  # SP attribute variables
    X_TPEU = mat[:, 78:82].astype("float32")  # Topics for Perceived Ease of Use
    X_TPU = mat[:, 82:89].astype("float32")  # Topics for Usefulness
    X_TPSR = mat[:, 89:95].astype("float32")  # Topics for Perceived Safety Risk
    X_TPPR = mat[:, 95:101].astype("float32")  # Topics for Perceived Privacy Risk
    X_TTR = mat[:, 101:106].astype("float32")  # Topics for Trust
    X_TAT = mat[:, 106:112].astype("float32")  # Topics for Attitudes
    bern = mat[:, -2].astype("int")  # Question type (questionnaire version: 1, 2 or 3)
    y = mat[:, -1].astype("int")  # Chosen mode

    # Concatenate the variables
    X_lk = np.concatenate([X_PEU, X_PU], axis=1)
    X_oe = np.concatenate([X_TPEU, X_TPU], axis=1)
    X_sd = np.concatenate([X_SD, X_TC, X_FAV, X_SP], axis=1)
    return y, X_lk, X_oe, X_sd, bern

y, X_lk, X_oe, X_sd, bern = data_processing(df)  # Processed data

# One (N, 2) indicator mask per questionnaire version
bern1 = 1*(np.array([bern, bern]).transpose() == 1)
bern2 = 1*(np.array([bern, bern]).transpose() == 2)
bern3 = 1*(np.array([bern, bern]).transpose() == 3)
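The three masks repeat the version indicator over the K = 2 attitude columns. As a result, `bern1 * y_att_ls1 + bern2 * y_att_ls2 + bern3 * y_att_oe` in the model keeps one attitude vector per respondent. Below is a NumPy sketch of the same selection using broadcasting, which does not hard-code K. The function name is hypothetical.

```python
import numpy as np


def select_attitudes(version, att_ls1, att_ls2, att_oe):
    """Keep, for each respondent, the attitude vector of the version they answered."""
    v = np.asarray(version)[:, None]  # shape (N, 1), broadcasts over the K columns
    return (v == 1) * att_ls1 + (v == 2) * att_ls2 + (v == 3) * att_oe


rng = np.random.default_rng(0)
ls1, ls2, oe = (rng.normal(size=(3, 2)) for _ in range(3))
print(select_attitudes([1, 2, 3], ls1, ls2, oe))
```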

Proposed Model

Pyro Model for the Combined Version

def model(X_ls, X_oe, X_sd, n_cat, bern1, bern2, bern3, K, obs=None):
    # Coefficients for the LS model (Version_1)
    alpha_ls1 = pyro.sample("alpha_ls1", dist.Normal(torch.zeros(K), torch.ones(K)))
    gamma_ls1 = pyro.sample("gamma_ls1", dist.Normal(torch.zeros(X_ls.shape[1], K),
                                                     torch.ones(X_ls.shape[1], K)))

    # Coefficients for the LS model (Version_2)
    alpha_ls2 = pyro.sample("alpha_ls2", dist.Normal(torch.zeros(K), torch.ones(K)))
    gamma_ls2 = pyro.sample("gamma_ls2", dist.Normal(torch.zeros(X_ls.shape[1], K),
                                                     torch.ones(X_ls.shape[1], K)))

    # Coefficients for the OE model (Version_3)
    alpha_oe = pyro.sample("alpha_oe", dist.Normal(torch.zeros(K), torch.ones(K)))
    gamma_oe = pyro.sample("gamma_oe", dist.Normal(torch.zeros(X_oe.shape[1], K),
                                                   torch.ones(X_oe.shape[1], K)))

    # Latent attitudes (K dimensions) of every respondent under each version
    with pyro.plate("data", X_ls.shape[0], use_cuda=True):
        y_att_ls1 = pyro.sample("y_att_ls1", dist.MultivariateNormal(
            alpha_ls1 + torch.matmul(X_ls, gamma_ls1), torch.eye(K)), obs=None)
        y_att_ls2 = pyro.sample("y_att_ls2", dist.MultivariateNormal(
            alpha_ls2 + torch.matmul(X_ls, gamma_ls2), torch.eye(K)), obs=None)
        y_att_oe = pyro.sample("y_att_oe", dist.MultivariateNormal(
            alpha_oe + torch.matmul(X_oe, gamma_oe), torch.eye(K)), obs=None)

    # Keep the attitudes of the version each respondent answered, then standardise them
    y_att = bern1 * y_att_ls1 + bern2 * y_att_ls2 + bern3 * y_att_oe
    y_att = (y_att - torch.mean(y_att, dim=0)) / torch.std(y_att, dim=0)

    # Explanatory variables of the choice model: X_sd plus the two attitudes
    X_Data = torch.zeros(X_sd.shape[0], X_sd.shape[1] + 2)
    X_Data[:, 0:-2] = X_sd
    X_Data[:, -2] = y_att[:, 0]
    X_Data[:, -1] = y_att[:, 1]

    # Coefficients for the classification model (mode 0, regular car, is the reference)
    alpha = torch.zeros(1, n_cat)
    alpha_1 = pyro.sample("alpha_1", dist.Normal(torch.zeros(n_cat-1), torch.ones(n_cat-1)))
    alpha[:, 1:] = alpha_1
    beta = torch.zeros(X_Data.shape[1], n_cat)
    beta_1 = pyro.sample("beta_1", dist.Normal(torch.zeros(X_Data.shape[1], n_cat-1),
                                               torch.ones(X_Data.shape[1], n_cat-1)))
    beta[:, 1:] = beta_1

    with pyro.plate("data_final", X_Data.shape[0], use_cuda=True):
        y = pyro.sample("y", dist.Categorical(logits=alpha + torch.matmul(X_Data, beta)), obs=obs)
    return y
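The Contents list a section on the probabilistic graphical model and the generative process. Read directly from `model` above, the generative process for respondent n who answered version v_n in {1, 2, 3} can be written as follows. This is our transcription of the code, and the notation is not taken from the thesis. All coefficients have independent standard normal priors. Here x^lk_n are the Likert dummies, x^oe_n are the topic variables, and x^sd_n are the remaining explanatory variables.

```latex
\begin{aligned}
\mathbf{z}^{(1)}_n &\sim \mathcal{N}\big(\boldsymbol{\alpha}_{ls1} + \boldsymbol{\Gamma}_{ls1}^{\top}\mathbf{x}^{lk}_n,\ \mathbf{I}_K\big), \qquad
\mathbf{z}^{(2)}_n \sim \mathcal{N}\big(\boldsymbol{\alpha}_{ls2} + \boldsymbol{\Gamma}_{ls2}^{\top}\mathbf{x}^{lk}_n,\ \mathbf{I}_K\big), \\
\mathbf{z}^{(3)}_n &\sim \mathcal{N}\big(\boldsymbol{\alpha}_{oe} + \boldsymbol{\Gamma}_{oe}^{\top}\mathbf{x}^{oe}_n,\ \mathbf{I}_K\big), \\
\mathbf{z}_n &= \sum_{v=1}^{3} \mathbb{1}[v_n = v]\,\mathbf{z}^{(v)}_n, \qquad
\tilde{z}_{nk} = \frac{z_{nk} - \bar{z}_k}{s_k}, \\
P(y_n = j) &= \frac{\exp\big(\alpha_j + \boldsymbol{\beta}_j^{\top}[\mathbf{x}^{sd}_n;\ \tilde{\mathbf{z}}_n]\big)}
{\sum_{l=0}^{2}\exp\big(\alpha_l + \boldsymbol{\beta}_l^{\top}[\mathbf{x}^{sd}_n;\ \tilde{\mathbf{z}}_n]\big)},
\qquad \alpha_0 = 0,\ \boldsymbol{\beta}_0 = \mathbf{0}.
\end{aligned}
```

Here \bar{z}_k and s_k are the mean and standard deviation of attitude k over all respondents (`torch.mean` and `torch.std` with `dim=0`), and \boldsymbol{\beta}_j is column j of `beta`.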

Preparing the tensors for the model (Proposed_Model)

n_cat = 3  # number of modes
K = 2  # number of latent attitude dimensions
X_lk = torch.from_numpy(X_lk).float().cuda()
X_oe = torch.from_numpy(X_oe).float().cuda()
X_sd = torch.from_numpy(X_sd).float().cuda()
bern1 = torch.from_numpy(bern1).float().cuda()
bern2 = torch.from_numpy(bern2).float().cuda()
bern3 = torch.from_numpy(bern3).float().cuda()
y = torch.from_numpy(y).long().cuda()  # integer category labels for dist.Categorical

Inference using SVI

%%time
# Define the guide (variational distribution)
guide = AutoDiagonalNormal(model)
# Reset parameter values
pyro.clear_param_store()
# Number of optimisation steps
n_steps = 4000
# Set up the optimiser
adam_params = {"lr": 0.01}
optimizer = ClippedAdam(adam_params)
# Set up the inference algorithm
elbo = Trace_ELBO(num_particles=3)
svi = SVI(model, guide, optimizer, loss=elbo)
# Gradient steps; svi.step returns the loss, i.e. an estimate of the negative ELBO
for step in range(n_steps):
    loss = svi.step(X_lk, X_oe, X_sd, n_cat, bern1, bern2, bern3, K, y)
    if step % 500 == 0:
        print("[%d] Loss (-ELBO): %.1f" % (step, loss))

Upon convergence, we can use the Predictive class to draw samples from the posterior:

from pyro.infer import Predictive

def summary(samples):
    """Posterior mean, standard deviation and 5%/95% quantiles of each sampled site."""
    site_stats = {}
    for k, v in samples.items():
        site_stats[k] = {
            "mean": torch.mean(v, 0),
            "std": torch.std(v, 0),
            "5%": v.kthvalue(int(len(v) * 0.05), dim=0)[0],
            "95%": v.kthvalue(int(len(v) * 0.95), dim=0)[0],
        }
    return site_stats

predictive = Predictive(model, guide=guide, num_samples=2000,
                        return_sites=("alpha_ls1", "gamma_ls1", "alpha_ls2", "gamma_ls2", "alpha_oe", "gamma_oe",
                                      "y_att_ls1", "y_att_ls2", "y_att_oe", "y_att", "alpha_1", "beta_1"))
samples = predictive(X_lk, X_oe, X_sd, n_cat, bern1, bern2, bern3, K, y)
pred_summary = summary(samples)
pred_summary.items()

predictions = pd.DataFrame({
    "alpha_ls1": pred_summary["alpha_ls1"],
    "gamma_ls1": pred_summary["gamma_ls1"],
    "alpha_ls2": pred_summary["alpha_ls2"],
    "gamma_ls2": pred_summary["gamma_ls2"],
    "alpha_oe": pred_summary["alpha_oe"],
    "gamma_oe": pred_summary["gamma_oe"],
    "y_att_ls1": pred_summary["y_att_ls1"],
    "y_att_ls2": pred_summary["y_att_ls2"],
    "y_att_oe": pred_summary["y_att_oe"],
    "alpha_1": pred_summary["alpha_1"],
    "beta_1": pred_summary["beta_1"]
})
predictions.head()
predictions.to_csv('coeff.csv')

We now use the posterior means of the coefficients to predict the choices in the test sets and compute the corresponding accuracy:

Prediction Accuracy

Reading the files

dft1 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_1_Test_LDA.csv')
dft2 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_2_Test_LDA.csv')
dft3 = pd.read_csv('//mnt//md0//data_vishnu//Pyro_Code//Revision_1_LDA//Version_3_Test_LDA.csv')
y_v1, X_lk_v1, X_oe_v1, X_sd_v1, bern_v1 = data_processing(dft1)
y_v2, X_lk_v2, X_oe_v2, X_sd_v2, bern_v2 = data_processing(dft2)
y_v3, X_lk_v3, X_oe_v3, X_sd_v3, bern_v3 = data_processing(dft3)

# Posterior means of the coefficients for the LS model - Ver_LK
alpha_ls1 = samples["alpha_ls1"].cpu()
gamma_ls1 = samples["gamma_ls1"].cpu()
alpha_ls1_hat = np.array([np.mean(b, axis=0) for b in alpha_ls1.detach().numpy().T])
gamma_ls1_hat = np.array([np.mean(b, axis=1) for b in gamma_ls1.detach().numpy().T]).T

# Posterior means of the coefficients for the LS model - Ver_LKOE
alpha_ls2 = samples["alpha_ls2"].cpu()
gamma_ls2 = samples["gamma_ls2"].cpu()
alpha_ls2_hat = np.array([np.mean(b, axis=0) for b in alpha_ls2.detach().numpy().T])
gamma_ls2_hat = np.array([np.mean(b, axis=1) for b in gamma_ls2.detach().numpy().T]).T

# Posterior means of the coefficients for the OE model - Ver_OE
alpha_oe = samples["alpha_oe"].cpu()
gamma_oe = samples["gamma_oe"].cpu()
alpha_oe_hat = np.array([np.mean(b, axis=0) for b in alpha_oe.detach().numpy().T])
gamma_oe_hat = np.array([np.mean(b, axis=1) for b in gamma_oe.detach().numpy().T]).T

# Posterior means of the coefficients for the choice model
alpha_1 = samples["alpha_1"].cpu()
beta_1 = samples["beta_1"].cpu()
alpha_1_hat = np.array([np.mean(b) for b in alpha_1.detach().numpy().T])
beta_1_hat = np.array([np.mean(b, axis=1) for b in beta_1.detach().numpy().T])

# Posterior means of the latent attitudes of the training respondents
y_att_ls1 = samples["y_att_ls1"].cpu()
y_att_ls2 = samples["y_att_ls2"].cpu()
y_att_oe = samples["y_att_oe"].cpu()
y_att_ls1_hat = np.array([np.mean(b, axis=1) for b in y_att_ls1.detach().numpy().T]).T
y_att_ls2_hat = np.array([np.mean(b, axis=1) for b in y_att_ls2.detach().numpy().T]).T
y_att_oe_hat = np.array([np.mean(b, axis=1) for b in y_att_oe.detach().numpy().T]).T

bern1 = bern1.cpu()
bern2 = bern2.cpu()
bern3 = bern3.cpu()
# Keep the attitudes that match each respondent's questionnaire version
y_att_hat = (bern1.detach().numpy() * y_att_ls1_hat
             + bern2.detach().numpy() * y_att_ls2_hat
             + bern3.detach().numpy() * y_att_oe_hat)
np.savetxt("y_att_pred.csv", y_att_hat, delimiter=',')
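The list comprehensions above take posterior means over the 2000 draws. Assume the draws returned by `Predictive` carry the sample dimension first (e.g. `(2000, D, K)` for `gamma_ls1`). Under that assumption, each comprehension reduces to a mean over axis 0. For `beta_1_hat`, which is built without the final `.T`, the result is the transpose of that mean. Below is a sketch with a hypothetical helper `posterior_mean`, applied here to random arrays rather than to the thesis samples.

```python
import numpy as np


def posterior_mean(samples):
    """Average Predictive draws over the leading (sample) axis."""
    return np.asarray(samples).mean(axis=0)


rng = np.random.default_rng(0)
gamma_draws = rng.normal(size=(50, 7, 2))  # (num_samples, D, K), like gamma_ls1
legacy = np.array([np.mean(b, axis=1) for b in gamma_draws.T]).T
print(np.allclose(posterior_mean(gamma_draws), legacy))
```

On the thesis objects it would be applied to the detached CPU arrays, e.g. `posterior_mean(gamma_ls1.detach().numpy())`.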

Accuracy for Version_1 (Test Set)

y_hat_v1 = [None]*len(y_v1)
X_Data = np.zeros((X_sd_v1.shape[0], X_sd_v1.shape[1] + 2))
X_Data[:, 0:-2] = X_sd_v1
# Choice-model coefficients, with mode 0 (regular car) as the reference
alpha_hat = np.zeros(n_cat)
alpha_hat[1:] = alpha_1_hat
beta_hat = np.zeros((n_cat, X_Data.shape[1]))
beta_hat[1:, :] = beta_1_hat
y_att_v1 = [[None]*2]*len(y_v1)
for i in range(len(y_v1)):
    # Latent attitudes from the Likert responses, then the utility of each mode
    y_att_v1[i] = alpha_ls1_hat + np.dot(X_lk_v1[i], gamma_ls1_hat)
    X_Data[i, -2] = y_att_v1[i][0]
    X_Data[i, -1] = y_att_v1[i][1]
    y_hat_v1[i] = alpha_hat + np.dot(beta_hat, X_Data[i])

# Write the utilities to a CSV file
with open('utilities_comb_lk.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat_v1)

y_hat_v1 = np.argmax(y_hat_v1, axis=1)
print("predictions:", y_hat_v1)
print("true values:", y_v1)
print(np.unique(y_hat_v1))
print(np.unique(y_v1))
# Evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat_v1 == y_v1) / len(y_v1))
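The loop above computes the utilities one respondent at a time. Below is the same computation written with matrix products. It is a sketch: the function names `predict_choices` and `accuracy` are hypothetical, and it assumes the attitudes are appended after the other explanatory variables, as in `X_Data`.

```python
import numpy as np


def predict_choices(X_sd, X_att, alpha_att, gamma_att, alpha_1, beta_1):
    """Vectorised version of the per-respondent loop.

    alpha_att (K,), gamma_att (D, K): attitude equation of one version
    alpha_1 (C-1,), beta_1 (C-1, P): choice model; category 0 is the reference
    """
    z = alpha_att + X_att @ gamma_att  # (N, K) latent attitudes
    X = np.hstack([X_sd, z])  # (N, P) explanatory variables
    n_cat = len(alpha_1) + 1
    alpha = np.zeros(n_cat)
    alpha[1:] = alpha_1
    beta = np.zeros((n_cat, X.shape[1]))
    beta[1:, :] = beta_1
    utilities = alpha + X @ beta.T  # (N, n_cat)
    return utilities, utilities.argmax(axis=1)


def accuracy(y_true, y_pred):
    """Share of respondents whose predicted mode equals the observed mode."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

For Version_1 this would read `U, y_hat = predict_choices(X_sd_v1, X_lk_v1, alpha_ls1_hat, gamma_ls1_hat, alpha_1_hat, beta_1_hat)` followed by `accuracy(y_v1, y_hat)`.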

Accuracy for Version_2 (Test Set)

y_hat_v2 = [None]*len(y_v2)
X_Data = np.zeros((X_sd_v2.shape[0], X_sd_v2.shape[1] + 2))
X_Data[:, 0:-2] = X_sd_v2
y_att_v2 = [[None]*2]*len(y_v2)
for i in range(len(y_v2)):
    # Latent attitudes from the Likert responses, then the utility of each mode
    y_att_v2[i] = alpha_ls2_hat + np.dot(X_lk_v2[i], gamma_ls2_hat)
    X_Data[i, -2] = y_att_v2[i][0]
    X_Data[i, -1] = y_att_v2[i][1]
    y_hat_v2[i] = alpha_hat + np.dot(beta_hat, X_Data[i])

# Write the utilities to a CSV file
with open('utilities_comb_lkoe.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat_v2)

y_hat_v2 = np.argmax(y_hat_v2, axis=1)
print("predictions:", y_hat_v2)
print("true values:", y_v2)
print(np.unique(y_hat_v2))
print(np.unique(y_v2))
# Evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat_v2 == y_v2) / len(y_v2))

Accuracy for Version_3 (Test Set)

y_hat_v3 = [None]*len(y_v3)
X_Data = np.zeros((X_sd_v3.shape[0], X_sd_v3.shape[1] + 2))
X_Data[:, 0:-2] = X_sd_v3
y_att_v3 = [[None]*2]*len(y_v3)
for i in range(len(y_v3)):
    # Latent attitudes from the topic variables, then the utility of each mode
    y_att_v3[i] = alpha_oe_hat + np.dot(X_oe_v3[i], gamma_oe_hat)
    X_Data[i, -2] = y_att_v3[i][0]
    X_Data[i, -1] = y_att_v3[i][1]
    y_hat_v3[i] = alpha_hat + np.dot(beta_hat, X_Data[i])

# Write the utilities to a CSV file
with open('utilities_comb_oe.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat_v3)

y_hat_v3 = np.argmax(y_hat_v3, axis=1)
print("predictions:", y_hat_v3)
print("true values:", y_v3)
print(np.unique(y_hat_v3))
print(np.unique(y_v3))
# Evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat_v3 == y_v3) / len(y_v3))
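The `np.unique` printouts above only show which modes are ever predicted. A confusion matrix also shows which observed modes are mistaken for which. Below is a small NumPy sketch, not part of the thesis code, that could be applied to pairs such as `y_v1` and `y_hat_v1`.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_cat=3):
    """Rows: observed mode, columns: predicted mode."""
    cm = np.zeros((n_cat, n_cat), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


cm = confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 0])
print(cm)
print("Accuracy:", np.trace(cm) / cm.sum())
```

The diagonal holds the correct predictions, so the trace divided by the total reproduces the accuracy printed above.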

Accuracy for Version_1 (Training Set)

y, X_lk, X_oe, X_sd, bern = data_processing(dfv1)
y_hat = [None]*len(y)
X_tr_Data = np.zeros((X_sd.shape[0], X_sd.shape[1] + 2))
X_tr_Data[:, 0:-2] = X_sd
alpha_tr_hat = np.zeros(n_cat)
alpha_tr_hat[1:] = alpha_1_hat
beta_tr_hat = np.zeros((n_cat, X_tr_Data.shape[1]))
beta_tr_hat[1:, :] = beta_1_hat
y_att_tr = [[None]*2]*len(y)
for i in range(len(y)):
    y_att_tr[i] = alpha_ls1_hat + np.dot(X_lk[i], gamma_ls1_hat)
    X_tr_Data[i, -2] = y_att_tr[i][0]
    X_tr_Data[i, -1] = y_att_tr[i][1]
    y_hat[i] = alpha_tr_hat + np.dot(beta_tr_hat, X_tr_Data[i])

# Write the utilities to a CSV file
with open('utilities_comb_lk_tr.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat)

y_hat = np.argmax(y_hat, axis=1)
print("predictions:", y_hat)
print("true values:", y)
print(np.unique(y_hat))
print(np.unique(y))
# Evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat == y) / len(y))

Accuracy for Version_2 (Training Set)

y, X_lk, X_oe, X_sd, bern = data_processing(dfv2)
y_hat = [None]*len(y)
X_tr_Data = np.zeros((X_sd.shape[0], X_sd.shape[1] + 2))
X_tr_Data[:, 0:-2] = X_sd
alpha_tr_hat = np.zeros(n_cat)
alpha_tr_hat[1:] = alpha_1_hat
beta_tr_hat = np.zeros((n_cat, X_tr_Data.shape[1]))
beta_tr_hat[1:, :] = beta_1_hat
y_att_tr = [[None]*2]*len(y)
for i in range(len(y)):
    y_att_tr[i] = alpha_ls2_hat + np.dot(X_lk[i], gamma_ls2_hat)
    X_tr_Data[i, -2] = y_att_tr[i][0]
    X_tr_Data[i, -1] = y_att_tr[i][1]
    y_hat[i] = alpha_tr_hat + np.dot(beta_tr_hat, X_tr_Data[i])

# Write the utilities to a CSV file
with open('utilities_comb_lkoe_tr.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat)

y_hat = np.argmax(y_hat, axis=1)
print("predictions:", y_hat)
print("true values:", y)
print(np.unique(y_hat))
print(np.unique(y))
# Evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat == y) / len(y))

Accuracy for Version_3 (Training Set)

y, X_lk, X_oe, X_sd, bern = data_processing(dfv3)
y_hat = [None]*len(y)
X_tr_Data = np.zeros((X_sd.shape[0], X_sd.shape[1] + 2))
X_tr_Data[:, 0:-2] = X_sd
alpha_tr_hat = np.zeros(n_cat)
alpha_tr_hat[1:] = alpha_1_hat
beta_tr_hat = np.zeros((n_cat, X_tr_Data.shape[1]))
beta_tr_hat[1:, :] = beta_1_hat
y_att_tr = [[None]*2]*len(y)
for i in range(len(y)):
    y_att_tr[i] = alpha_oe_hat + np.dot(X_oe[i], gamma_oe_hat)
    X_tr_Data[i, -2] = y_att_tr[i][0]
    X_tr_Data[i, -1] = y_att_tr[i][1]
    y_hat[i] = alpha_tr_hat + np.dot(beta_tr_hat, X_tr_Data[i])

# Write the utilities to a CSV file
with open('utilities_comb_oe_tr.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(y_hat)

y_hat = np.argmax(y_hat, axis=1)
print("predictions:", y_hat)
print("true values:", y)
print(np.unique(y_hat))
print(np.unique(y))
# Evaluate prediction accuracy
print("Accuracy:", 1.0*np.sum(y_hat == y) / len(y))


APPENDIX I

14 PYTHON CODE FOR THE MAPPING OF RESPONSES

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

np.random.seed(42)

Function for Data Pre-processing

def data_preprocessing(len_dataset, ques_size):
    """
    def:
        pre-processing of data
    inp:
        len_dataset- length of the dataset
        ques_size- array with number of topics/levels per question
    out:
        dataset with one-hot encoded variables
    """
    ### Create the dummy dataset: a uniformly random level for every question
    num_ques = len(ques_size)
    X_Act = np.zeros((len_dataset, num_ques))
    for i in range(num_ques):
        X_Act[:, i] = np.random.choice(a=ques_size[i], size=len_dataset,
                                       p=[1/ques_size[i]]*ques_size[i])
    var_name = ['X_' + str(i + 1) for i in range(num_ques)]
    X_df = pd.DataFrame(data=X_Act, columns=var_name)

    ### One-hot encoding of the Likert scale responses
    enc = OneHotEncoder(handle_unknown='ignore')
    enc_name = [0]*num_ques
    for k in range(num_ques):
        enc_name[k] = pd.DataFrame(enc.fit_transform(X_df[[var_name[k]]]).toarray())
        enc_name[k].columns = ['X_'+str(k+1)+'_' + str(i+1) for i in range(ques_size[k])]
        if k > 0:
            # Append question k to the columns of the previous questions
            enc_name[k] = enc_name[k-1].join(enc_name[k])
    return enc_name[num_ques-1]
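With its default settings, `OneHotEncoder` learns the categories from the data. If a level happened not to be drawn for some question, the encoded block would have fewer than `ques_size[k]` columns and the column renaming would fail. This is unlikely for large datasets but possible for small ones. Below is a sketch of an alternative initialisation that always yields `ques_size[k]` columns per question. It uses NumPy's `Generator`, so its draws differ from those made after `np.random.seed(42)`. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def random_onehot(len_dataset, ques_size, seed=42):
    """Random uniform level per question, one-hot encoded as X_<question>_<level>."""
    rng = np.random.default_rng(seed)
    blocks, names = [], []
    for k, q in enumerate(ques_size):
        codes = rng.integers(0, q, size=len_dataset)  # uniform over the q levels
        blocks.append(np.eye(q)[codes])  # one row of the identity per respondent
        names += [f"X_{k + 1}_{i + 1}" for i in range(q)]
    return pd.DataFrame(np.hstack(blocks), columns=names)


print(random_onehot(4, [5, 3]))
```

The column names follow the `X_<question>_<level>` convention used by `discrete_gibbs`, so the result could be joined to the attitude data in the same way.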

Function for Gibbs Sampling

def discrete_gibbs(Data, gamma_df, gamma_inv, alpha_df, ques_size, n_iter, n_warm):
    """
    def:
        function that performs Gibbs sampling for discrete variables
    inp:
        Data- dataset
        gamma_df- dataset of coefficients
        gamma_inv- dataset of the inverse coefficients
        alpha_df- array of constants
        ques_size- array with number of topics/levels per question
        n_iter- total number of iterations (including warm-up)
        n_warm- number of warm-up iterations (discarded by the caller)
    out:
        array of sampled values, shape (len(Data), sum(ques_size), n_iter)
    """
    num_ques = np.sum(ques_size)
    y_arr = ['y_att_1', 'y_att_2']
    # One slot per respondent, response level and iteration
    np_arr = np.zeros((len(Data), num_ques, n_iter))
    for j in range(n_iter):
        print("Iterations: ", j)
        for index, row in Data.iterrows():
            n_dims = 0
            for k in range(len(ques_size)):
                # Dummy columns of question k
                V_na = ['X_'+str(k+1)+'_' + str(i+1) for i in range(ques_size[k])]
                V_nn = V_na + y_arr
                # Part of the attitudes not explained by the other questions
                fir_pa = (np.array([row['y_att_1'], row['y_att_2']]) - alpha_df
                          - np.matmul(np.array(gamma_df.drop(columns=V_na)),
                                      np.array(row.drop(V_nn, axis=0))))
                # Map the residual onto the levels of question k and draw from a Dirichlet
                inv_mc = np.array(gamma_inv.loc[V_na, :].T)
                x_value = np.matmul(fir_pa, inv_mc)
                sum_val = np.sum(np.abs(x_value))
                x_val = np.abs(x_value)/sum_val
                x_val = np.random.dirichlet(x_val)
                for m in range(ques_size[k]):
                    np_arr[index, n_dims + m, j] = x_val[m]
                n_dims += ques_size[k]
        if j == n_iter/4:
            print("25% iteration complete")
        elif j == n_iter/2:
            print("50% iteration complete")
        elif j == 3*n_iter/4:
            print("75% iteration complete")
        elif j == n_iter-1:
            print("Iteration complete")
    return np_arr
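Within each sweep, the update for question k proceeds in three steps. It subtracts the contribution of the other questions from the respondent's attitudes. It then maps the remainder onto the levels of question k with the pseudo-inverse. Finally, it draws from a Dirichlet whose concentration is the normalised absolute solution. Below is a standalone sketch of that single step. The name `update_question` and its argument layout are hypothetical.

```python
import numpy as np


def update_question(y_att, alpha, gamma_rest, x_rest, gamma_k_pinv, rng):
    """One draw for question k, mirroring the inner step of discrete_gibbs.

    y_att (K,), alpha (K,): attitudes and constants of one respondent
    gamma_rest (K, R), x_rest (R,): coefficients and dummies of the other questions
    gamma_k_pinv (q, K): right pseudo-inverse of question k's coefficients
    """
    residual = y_att - alpha - gamma_rest @ x_rest  # attitude left for question k
    x_value = gamma_k_pinv @ residual  # least-norm solution, shape (q,)
    weights = np.abs(x_value) / np.sum(np.abs(x_value))
    return rng.dirichlet(weights)  # non-negative, sums to 1 over the q levels
```

As written, `discrete_gibbs` stores each draw in `np_arr` but does not write it back to `Data`. The other questions are therefore conditioned on their initial random one-hot values in every sweep.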

Execution

def dis_sample(Dataset, gamma_df, gamma_inv, alpha_df, ques_size, num_iter, num_warm):
    """
    def:
        perform discrete Gibbs sampling on a dataset
    inp:
        Dataset- dataset containing the y variables
        gamma_df- data frame with the coefficients
        gamma_inv- data frame with the inverse of the coefficients
        alpha_df- array of constants
        ques_size- array with the number of topics/levels per question
        num_iter- total number of iterations
        num_warm- number of warm-up iterations
    out:
        sampled dataset
    """
    len_dataset = len(Dataset)
    # Random one-hot initial values for the responses
    Exp_Var = pd.DataFrame(data_preprocessing(len_dataset, ques_size))
    Dataset = Dataset.join(Exp_Var)
    # Gibbs sampling
    Dataset = discrete_gibbs(Dataset, gamma_df, gamma_inv, alpha_df, ques_size, num_iter, num_warm)
    # Discard the warm-up draws, average over iterations, then over respondents
    Data_1 = Dataset[:, :, num_warm:]
    Data_2 = np.mean(Data_1, axis=2)
    print(Data_2.shape)
    X_Variables = np.mean(Data_2, axis=0)
    print(X_Variables)
    return Dataset

# For Likert scale questions, set values
num_questions = 9
ques_size = [5]*num_questions
Dataset = pd.read_csv("y_att_pred_v1.csv", header=None)
Dataset.columns = ['y_att_1', 'y_att_2']
print(len(Dataset))
gamma_v = pd.read_csv("Gamma_ls2.csv", header=None)
alpha_v = np.array([0.6820, -0.7710])  # Update values

col_names = []
for j in range(num_questions):
    col_names.extend(['X_'+str(j+1)+'_' + str(i+1) for i in range(ques_size[j])])
gamma_v.columns = col_names

# Right pseudo-inverse of each question's block of coefficients
for i in range(num_questions):
    V_na = ['X_'+str(i+1)+'_' + str(j+1) for j in range(ques_size[i])]
    gamma_v1 = gamma_v[V_na]
    if i == 0:
        gamma_inv = pd.DataFrame((gamma_v1.T).dot(np.linalg.inv(gamma_v1.dot(gamma_v1.T))))
        gamma_inv.index = V_na
    else:
        gamma_inv1 = pd.DataFrame((gamma_v1.T).dot(np.linalg.inv(gamma_v1.dot(gamma_v1.T))))
        gamma_inv1.index = V_na
        gamma_inv = pd.concat([gamma_inv, gamma_inv1])
gamma_inv.columns = ['y_att_1', 'y_att_2']
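Each block `gamma_v1` has K = 2 rows (one per attitude) and `ques_size[i]` columns. The expression `(gamma_v1.T).dot(np.linalg.inv(gamma_v1.dot(gamma_v1.T)))` is therefore its right pseudo-inverse, Γ_i^T (Γ_i Γ_i^T)^(-1). When Γ_i has full row rank, multiplying Γ_i by it gives the 2 x 2 identity. Below is a quick NumPy check on a random block (a sketch with a hypothetical helper name). It also shows that in this case the result coincides with `np.linalg.pinv`.

```python
import numpy as np


def right_pseudoinverse(gamma_k):
    """gamma_k^T (gamma_k gamma_k^T)^(-1) for a K x q block with full row rank K."""
    return gamma_k.T @ np.linalg.inv(gamma_k @ gamma_k.T)


rng = np.random.default_rng(0)
gamma_k = rng.normal(size=(2, 5))  # K = 2 attitudes, q = 5 Likert levels
P = right_pseudoinverse(gamma_k)
print(np.allclose(gamma_k @ P, np.eye(2)), np.allclose(P, np.linalg.pinv(gamma_k)))
```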

Data = dis_sample(Dataset, gamma_v, gamma_inv, alpha_v, ques_size, 2000, 400)
Data_1 = Data[:, :, 50:]
print(Data_1.shape)
Data_2 = np.mean(Data_1, axis=2)
print(Data_2.shape)
X_Variables = np.mean(Data_2, axis=0)
print(X_Variables)

import csv

with open('Data_ls2_2000.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerows(Data_2)

Do Open-ended Questions Influence the Measurement of Attitudes? An Investigation

Article

Jan 2024

Meanings of Loss Among Japanese Suicide Bereaved: Content Analysis of Open-Ended Responses

Article

Full-text available

Nov 2020
JPN PSYCHOL RES

Although the importance of meaning-making among suicide bereaved has been reported, the detailed contents of the process remain unclear. This study aimed to identify the content categories of sense-making and benefit-finding in Japanese suicide loss survivors. We conducted content analysis of responses to open-ended questions in 99 participants. The results indicated that sense-making activities comprised seven categories, including Deceased was relieved from pain and Suicide is inevitable in modern society. Benefit-finding also comprised eight categories, such as Treat others with compassion and Live one day at a time with gratitude. The implications of the results are discussed in terms of sociocultural contexts of suicide postvention.

Understanding Social Acceptability of Drivers for the Diffusion of Autonomous Vehicles in Japan

Article

Full-text available

Jan 2017

Autonomous vehicles (AVs) are expected to increase road safety and ensure that the mobility needs of elderly people are met. This study investigates the social acceptability for the diffusion of AVs in Japan. An Internet-based survey obtained results from 1,250 participants who were based in all the regions of Japan. Factor and cluster analyses were used to analyze the obtained data. The major findings suggest that respondents who totally disapproved of AVs diffusion were anxious about its potential negative impact on road safety. In contrast, respondents who totally approved of the technology’s diffusion into society felt they could use it in many different scenarios and that it would have positive social effects. Using our findings and by referring to previous methodology for promoting innovative products, we developed a number of policy recommendations that can be used to create social acceptability for the diffusion of AVs.

Discovering Latent Activity Patterns from Transit Smart Card Data: A Spatiotemporal Topic Model

Article

Full-text available

Mar 2020
TRANSPORT RES C-EMER

Although automatically collected human travel records can accurately capture the time and location of human movements, they do not directly explain the hidden semantic structures behind the data, e.g., activity types. This work proposes a probabilistic topic model, adapted from Latent Dirichlet Allocation (LDA), to discover representative and interpretable activity categorization from individual-level spatiotemporal data in an unsupervised manner. Specifically, the activity-travel episodes of an individual user are treated as words in a document , and each topic is a distribution over space and time that corresponds to certain type of activity. The model accounts for a mixture of discrete and continuous attributes-the location, start time of day, start day of week, and duration of each activity episode. The proposed methodology is demonstrated using pseudonymized transit smart card data from London, U.K. The results show that the model can successfully distinguish the three most basic types of activities-home, work, and other, and it fits the data significantly better than rule-based approaches. As the specified number of activity categories increases, more specific subpatterns for home and work emerge. This work makes it possible to enrich human mobility data with representative and interpretable activity patterns without relying on predefined activity categories or heuristic rules.

How Do Nurses Cope with Shift Work? A Qualitative Analysis of Open-Ended Responses from a Survey of Nurses

Article

Full-text available

Oct 2019
Int J Environ Res Publ Health

Nurses are frequently required to engage in shift work given the 24/7 nature of modern healthcare provision. Despite the health and wellbeing costs associated with shift work, little is known about the types of coping strategies employed by nurses. It may be important for nurses to adopt strategies to cope with shift work in order to prevent burnout, maintain wellbeing, and ensure high quality care to patients. This paper explores common strategies employed by nurses to cope with shift work. A workforce survey was completed by 449 shift working nurses that were recruited from a major metropolitan health service in Melbourne, Australia. Responses to open-ended questions about coping strategies were analysed using the framework approach to thematic analysis. Four interconnected main themes emerged from the data: (i) health practices, (ii) social and leisure, (iii) cognitive coping strategies, and (iv) work-related coping strategies. Although a range of coping strategies were identified, sleep difficulties often hindered the effective use of coping strategies, potentially exacerbating poor health outcomes. Findings suggest that in addition to improving nurses’ abilities to employ effective coping strategies on an individual level, workplaces also play an important role in facilitating nurses’ wellbeing.

Open-Ended Versus Closed-Ended Responses: A Comparison Study Using Topic Modeling and Factor Analysis

Article

Dec 2020

Sentiment analysis of open-ended student feedback

Conference Paper

Sep 2020

What couples say about living and coping with sensory loss: a qualitative analysis of open-ended survey responses

Article

Dec 2020
DISABIL REHABIL

The current study reports the results of open-ended questions from a follow-up survey of adults with sensory loss and their spouses who had previously taken part in an online study. In total, 111 participants completed the survey (72 adults with a sensory loss and 39 spouses). Open-ended questions asked about the overall experience of living with sensory loss, sensory loss-related challenges, and support and coping mechanisms. Thematic analysis was used to identify dominant themes in participants’ responses. Three core themes capturing their overall experience emerged: (1) sensory loss-related challenges, (2) support and coping, and (3) adjustment and readjustment. Sensory loss was characterized as a challenging experience, causing communication and emotional disturbances. Coping strategies reported by both partners included the use of assistive technology, positive re-appraisal, acceptance and/or denial of the loss, while support strategies were mostly derived from the comments of spouses (for AWSLs), family members and peer networks (for both partners). Finally, respondents described sensory loss as an adventurous learning experience. Our findings underscore the significance of considering sensory loss from a social relational/family perspective and highlight the importance of addressing the needs of both adults with sensory loss and their partners in treatment and rehabilitation. • Implications for rehabilitation • Study highlights the need to consider sensory loss from a relational/family perspective. • Healthcare professionals should try to increase the involvement of significant others and close family members (e.g., spouses, parents, children) into the rehabilitation process. • Greater emphasis should be placed on exploring and reinforcing positive experiences and attitudes associated with sensory loss during counselling/rehabilitation sessions. 
• Improved education about sensory loss for both the general public and health care professionals could minimize the adverse outcomes associated with sensory loss.

Improving public services using artificial intelligence: possibilities, pitfalls, governance

Article

Sep 2020

PAUL W. FAY HENMAN

Artificial intelligence arising from the use of machine learning is rapidly being developed and deployed by governments to enhance operations, public services, and compliance and security activities. This article reviews how artificial intelligence is being used in public sector for automated decision making, for chatbots to provide information and advice, and for public safety and security. It then outlines four public administration challenges to deploying artificial intelligence in public administration: accuracy, bias and discrimination; legality, due process and administrative justice; responsibility, accountability, transparency and explainability; and power, compliance and control. The article outlines technological and governance innovations that are being developed to address these challenges.

Understanding residents’ perceptions of nature and local economic activities using an open-ended question before protected area designation in Amami Islands, Japan

Article

Jun 2020
J NAT CONSERV

For successful protected area (PA) management, it is essential to understand residents’ perceptions during the early phases of the designation process. However, most studies on residents’ perceptions have been conducted after PA designation due to the lack of researcher–policymaker cooperation. In this study, we reveal residents’ perceptions before the PA designation of Japan’s Amami Islands as a national park and a Natural World Heritage Site. We conducted a questionnaire survey using an open-ended question to collect textual answers on residents’ perceptions of nature and the local economic activities. We then categorized these answers into six topics by applying topic models and interpreted the topics qualitatively, indicating their content. We also examined the relativity of the topics and the islands using correspondence analysis. The residents were more interested in the landscapes relevant to their livelihoods and expected them to be managed. This result implies discrepancy between residents’ perceptions and the PA draft management plan because the draft plan mainly focuses on the conservation of biodiversity in subtropical rainforests, whereas residents were unfamiliar with this. With regard to the local economic activities, residents expected enhancement of agriculture and traditional craft industries and nature-based tourism. Furthermore, residents’ perceptions were probably influenced by the context of the islands on which they lived. We suggest adoption of suitable PA management and communication strategies for each island in view of residents’ perceptions. Our approach has enabled us to understand residents’ perceptions that have been disregarded through the PA designation process.

Validity and reliability of a self‐assessment scale for Dental and Oral Health (DOH) students’ perception of transferable skills in Australia

Article

Sep 2019

Background: The Australian Dental Council's (ADC) competency framework requires graduating dental practitioners to be competent in a number of transferable skills, which include being scientifically versed, technically skilled, and capable of safe independent work and teamwork, while adhering to high ethical standards [1]. Part of the role of dental educators is to ensure that graduating students acquire the requisite transferable skills, in line with regulatory requirements [2]. To achieve this, it is imperative to assess students' own understanding or perception of the transferable skills required upon graduation. The objective of this study is to develop a valid and reliable scale for this assessment. Method: A cohort of students drawn from three dental programmes participated in this study: Undergraduate Dentistry (years 1-3), Postgraduate Dentistry (years 4-5), and Bachelor of Dental Technology/Prosthesis. A self-assessment questionnaire containing relevant open- and closed-ended questions was administered. The questionnaire assessed students' perception of the transferable skills needed for their future career, and their attitude towards learning and developing transferable skills. Result: In total, we successfully assessed 388 of the 391 students sampled (99.2% response rate). Their mean age was 24.3 years (SD ± 5.7); 53.3% were female and 46.7% were male. Exploratory Factor Analysis (EFA) extracted five factors for students' perception of their current skill level and four factors for future skill requirements. The factor structures were confirmed using Confirmatory Factor Analysis (CFA), showing good model fit, high reliability for the individual dimensions, and content validity.
Conclusions: The structure derived from the transferable skills survey administered to a cohort of dental students suggests that the survey can be used as a valid and reliable screening tool to test students' perception of transferable skill requirements.
