Fig 1
An example of the skeleton representation obtained using the OpenPose library. (a) shows the keypoints of the skeleton; (b) illustrates the skeleton and the corresponding angles.


Source publication
Conference Paper
Full-text available
The estimation and prediction of pedestrian motion is of fundamental importance in ITS applications. Most existing solutions have utilized a particular type of sensor for perception, such as cameras (stereo, monocular, infrared) or other modalities such as a laser range finder or radar. The advent of wearable devices with inertial sensors has led t...

Contexts in source publication

Context 1
... main points of interest in the skeleton representation are the arms and legs. Each arm has three points that match the three main joints: shoulder, elbow and wrist (points 2-3-4 and 5-6-7 in Figure 1a). The angles and angular velocities between the horizontal line, as shown in Figure 1b, and the shoulder-elbow and elbow-wrist segments give information related to the action of the pedestrian. A single pedestrian performing different activities can be seen in Figure 7. This figure shows the angles of different arm joints extracted from the skeleton while a pedestrian walks in front of a car from t = 0 s to t = 2 s. The person then stops until t = 3 s and starts to walk again afterwards. The standing time (from t = 2 s to t = 3 s) can be recognized through the lack of angle change in each of the arm segments. The blue line shows the angle of the shoulder-elbow segment with respect to the horizontal line; the average of this angle is around 90 degrees and it varies by only a few degrees. The red line shows the angle of the elbow-wrist segment, from which one can clearly differentiate between walking and standing, and also determine the traveling direction of the person. In this case, according to our angle convention shown in Figure 1b, the person can be determined to be walking from left to right within the image frame of reference, because the average is less than 90 degrees; when the person is moving from right to left, the average of this signal is over 90 degrees. The yellow line is the angle of an imaginary line that joins the shoulder and the wrist. The purple line is the angle of the elbow, formed by the shoulder-elbow and elbow-wrist segments. During the time the pedestrian is standing, the arms were still moving a small amount because they could not decelerate to zero velocity ...
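As an illustration of the angle computation described in this excerpt, the following minimal Python sketch derives the arm-segment angles with respect to the horizontal image axis from 2D keypoints. This is not the paper's code: the function names and pixel coordinates are hypothetical, and the fold into [0, 180) is one plausible way to realize the convention of Figure 1b.

```python
import numpy as np

# Hypothetical helper: angle of a limb segment with respect to the
# horizontal image axis, from two 2D keypoints (x, y) in pixels.
# Image y grows downward, so dy is negated to get a conventional angle.
def segment_angle_deg(p_from, p_to):
    dx = p_to[0] - p_from[0]
    dy = p_to[1] - p_from[1]
    return np.degrees(np.arctan2(-dy, dx)) % 180.0  # fold into [0, 180)

# Made-up pixel coordinates for shoulder, elbow and wrist
# (points 2-3-4 of one arm in Figure 1a).
shoulder, elbow, wrist = (320, 200), (325, 260), (345, 310)

upper_arm = segment_angle_deg(shoulder, elbow)  # shoulder-elbow vs horizontal
forearm = segment_angle_deg(elbow, wrist)       # elbow-wrist vs horizontal

# Elbow angle, formed by the shoulder-elbow and elbow-wrist segments.
v1 = np.array(shoulder) - np.array(elbow)
v2 = np.array(wrist) - np.array(elbow)
cos_e = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
elbow_angle = np.degrees(np.arccos(np.clip(cos_e, -1.0, 1.0)))
```

Averaging the elbow-wrist angle over a gait cycle and testing whether it falls below or above 90 degrees would then give the left-to-right versus right-to-left cue discussed above.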
Context 2
... Legs are also divided into three junctions: hip, knee and ankle for each leg (points 8-9-10 and 11-12-13 in Figure 1a). The variation of the knee angle indicates activities such as walking and stepping up or down the sidewalk. These are important activities to recognize, in particular whether or not the pedestrian is stationary. This state is observed when the hip-knee angles of the right and left legs do not change. This case is shown in Figure 9, where the angles between different segments of the legs are presented. The top two plots show the angles of the hip-knee and knee-ankle segments, respectively, with respect to the horizontal line for both legs. The standing time is clearly visible between t = 2 s and t = 3 s, during which the angles of the two legs are 90 degrees, meaning neither leg is moving. The bottom left plot presents the angle of the imaginary hip-ankle segment, which shows clear differences between the time spent walking and standing. The bottom right plot shows the angle of the knee, formed by the intersection of the hip-knee and knee-ankle segments. Figure 10 shows the angular velocity of the knee-ankle segment. This provides very clear information to discriminate between a pedestrian walking and one standing still. The signal pattern during walking is clearly distinguishable as the person moves a foot forward, while during standing the angular velocity is zero. The linear velocity of a pedestrian can be obtained by processing the linear acceleration of the limbs. Linear acceleration also provides further information about the state of the body, in particular whether it is starting to move from a resting state, as shown at time t = 3 s in Figure 11. Previous research such as [22] uses this kind of information to calculate the length of the leg using an inverted pendulum model. An analysis of the utilization of the proposed approach has been presented in [23], where the authors use an IMU attached to the ankle to recognize the cycle of a stride. The forward motion of the foot creates a distinctive pattern that, according to the authors, allows recognizing six different activities: slow, normal and fast walking, running, and climbing and descending stairs. The results obtained with our approach using only vision are comparable with those in [23] using inertial sensors. ...
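The walking/standing discrimination via the knee-ankle angular velocity can be sketched as below. This is an illustrative reconstruction rather than the paper's implementation; the frame rate, the 10 deg/s threshold and the function names are assumed values.

```python
import numpy as np

# Sketch: flag standing frames from the knee-ankle segment angle series.
# `angles_deg` holds the per-frame angle (degrees) of the segment with
# respect to the horizontal line; `fps` is the camera frame rate.
def angular_velocity_deg_s(angles_deg, fps=30.0):
    # Central finite differences give angular velocity in deg/s.
    return np.gradient(np.asarray(angles_deg, dtype=float)) * fps

def is_standing(angles_deg, fps=30.0, thresh_deg_s=10.0):
    # A frame counts as standing when the segment barely rotates.
    # The 10 deg/s threshold is illustrative, not taken from the paper.
    return np.abs(angular_velocity_deg_s(angles_deg, fps)) < thresh_deg_s

# Toy example: an oscillating angle (walking) followed by a constant
# 90-degree angle (standing), as in the t = 2 s to t = 3 s interval.
t = np.arange(0.0, 3.0, 1.0 / 30.0)
angles = np.where(t < 2.0, 90.0 + 20.0 * np.sin(2.0 * np.pi * t), 90.0)
print(is_standing(angles).mean())  # fraction of frames flagged as standing
```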
Context 3
... work makes use of the OpenPose library developed by [9]-[11]. This tool is used to derive skeleton representations of pedestrians from a sequence of images. The library uses pre-trained models to extract 18 points of the body, including the shoulders, elbows, wrists, hips, knees and ankles, as shown in Figure 1a. This algorithm has been demonstrated to perform robustly even with a crowd of pedestrians in a single image, which is an essential requirement for pedestrian tracking in an urban ...
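For concreteness, below is a minimal sketch of reading the per-frame JSON that OpenPose writes with its --write_json flag into named keypoints. It assumes the 18-point COCO model matching Figure 1a and the pose_keypoints_2d field used by recent OpenPose releases; the keypoint names themselves are illustrative.

```python
import json

# Index order of the 18-point COCO model used by OpenPose, matching
# the numbering of Figure 1a (2-3-4 and 5-6-7 are the arms, 8-9-10
# and 11-12-13 the legs).
COCO_NAMES = [
    "nose", "neck",
    "r_shoulder", "r_elbow", "r_wrist",
    "l_shoulder", "l_elbow", "l_wrist",
    "r_hip", "r_knee", "r_ankle",
    "l_hip", "l_knee", "l_ankle",
    "r_eye", "l_eye", "r_ear", "l_ear",
]

def load_skeletons(path):
    """Return one {name: (x, y, confidence)} dict per detected person."""
    with open(path) as f:
        frame = json.load(f)
    people = []
    for person in frame.get("people", []):
        # OpenPose stores each person's pose as a flat list
        # [x0, y0, c0, x1, y1, c1, ...] of length 3 * 18.
        flat = person["pose_keypoints_2d"]
        people.append({
            name: tuple(flat[3 * i : 3 * i + 3])
            for i, name in enumerate(COCO_NAMES)
        })
    return people
```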

Similar publications

Article
Full-text available
Among the serious experimental uses of smartphones in physics, the use of integrated sensors, video analysis, and mixed use as simultaneous material and instrument stand out for their frequency. Regarding integrated sensors, there are many proposals that use the accelerometer, the magnetometer, or other sensors (e.g., camera or mi...
Article
Full-text available
In this paper, a new radar-camera fusion system is presented. The fusion system takes into consideration the error bounds of the two different coordinate systems of the inhomogeneous sensors, and a new extended fusion Kalman filter is designed to adapt to the two sensors. Application details such as unsynchronization between sensors, multi-tar...
Article
Full-text available
The zero-velocity update (ZUPT) method has become a popular approach to estimate foot kinematics from foot-worn inertial measurement units (IMUs) during walking and running. However, the accuracy of the ZUPT method for stride parameters at sprinting speeds remains unknown, specifically when using sensors with characteristics well suited for sprinti...
Article
Full-text available
We have designed a smart wearable device to actively protect pedestrians from vehicle impacts. This device consists of several modules, including a radar sensor, a transmission module, an alarm module and an intelligent security program module. In the dominant program module, a safety intelligence algorithm based on fuzzy comprehensive evaluation and a BP ne...
Preprint
Full-text available
Semantic Segmentation (SS) is the task of assigning a semantic label to each pixel of an image, which is of immense significance for autonomous vehicles, robotics and assisted navigation of vulnerable road users. It is obvious that in different application scenarios different objects possess hierarchical importance and safety relevance, but conventiona...

Citations

... Therefore, as shown in Table 1, the pedestrian's pose also becomes an indispensable factor in pedestrian intention or trajectory prediction [4]. Moreover, in recent studies, pedestrian pose is increasingly used as an input factor for predicting pedestrian intentions or trajectories [23]. ...
Article
Full-text available
Pedestrians who suddenly cross the street from within the blind spot of a vehicle's field of view can pose a significant threat to traffic safety. Dangerous pedestrian crossing intentions in view-obscured scenarios have received less attention than general pedestrian crossing-intention prediction. In this paper, we present a method for recognizing and predicting the dangerous crossing intention of pedestrians in a view-obscured region based on the interference, pose, velocity observation–long short-term memory (IPVO-LSTM) algorithm from a road-based view. In the first step, the road-based camera captures the pedestrian's image. Then, we construct a pedestrian interference state feature module, a pedestrian three-dimensional pose feature module, a pedestrian velocity feature module, and a pedestrian blind observation state feature module, and extract the corresponding features of the studied pedestrians. Finally, a pedestrian hazardous crossing intention prediction module based on a feature-fused LSTM (ff-LSTM) and an attention mechanism fuses and processes the above features in a cell state process to recognize and predict the pedestrian's hazardous crossing intention in the blind visual area. Experiments compare the proposed approach with common current algorithms in terms of input parameter selection, intention recognition algorithm, and intention prediction time range, and the experimental results validate our state-of-the-art method.
... The work in [10] demonstrates that these vision-based systems are capable of obtaining a dynamic representation of a pedestrian with information of similar quality to the inertial sensors used in wearable devices. This can enable the implementation of robust vision-based pedestrian intention algorithms. ...
... According to our experience and testing, and taking into account that the size of the skeleton varies as the pedestrian and the vehicle approach each other, the most stable points provided by OpenPose are the neck and the left and right hips. The remaining fifteen points are linked to them in equations (10) and (11), where g is equal to three. This scheme yields a simpler and more efficient filter. ...
Conference Paper
Full-text available
An essential task in preventing pedestrian injuries by an autonomous vehicle is the ability to correctly detect pedestrians and predict their movement. A deep learning-based 2D human pose detector such as OpenPose provides a skeleton of the people present in an image captured by cameras mounted on the car. Nevertheless, these kinds of algorithms give a per-frame solution but do not capture the movement between frames. As a result, parts of the body are missed, or the skeleton leaps to another part of the image where the infrastructure resembles a person. In this context, an algorithm based on the Kalman filter is presented to estimate the real skeleton, including correlations in time and between parts of the body. The algorithm was tested on videos using data provided by a vehicle moving in real scenarios. Results are presented that show the capability of the algorithm to correct the mentioned loss of tracking.
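The cited paper's exact filter, which also correlates parts of the body through its equations (10) and (11), is not reproduced in this excerpt. As a hedged illustration of the general idea, the sketch below smooths a single 2D keypoint with a constant-velocity Kalman filter; dt, q, r and the initial covariance are assumed tuning values.

```python
import numpy as np

# Constant-velocity Kalman filter for one 2D keypoint (illustrative
# only; the paper's filter additionally couples body parts).
dt, q, r = 1.0 / 30.0, 1.0, 4.0
F = np.array([[1, 0, dt, 0],   # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],    # only the pixel position is measured
              [0, 1, 0, 0]], dtype=float)
Q = q * np.eye(4)              # process noise (assumed)
R = r * np.eye(2)              # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle; z is the detected (x, y) keypoint."""
    x, P = F @ x, F @ P @ F.T + Q                     # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                    # Kalman gain
    x = x + K @ (np.asarray(z, dtype=float) - H @ x)  # update
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Usage: initialize from the first detection, then filter each frame.
x, P = np.array([320.0, 200.0, 0.0, 0.0]), 10.0 * np.eye(4)
for z in [(321, 202), (323, 205), (326, 209)]:        # toy measurements
    x, P = kalman_step(x, P, z)
```

Gating the innovation z - Hx before the update would be one way to reject detections where the skeleton leaps to person-like infrastructure, as the paper describes.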
... The body keypoints can be seen in Fig. 2(a) [52] and the hand keypoints in Fig. 2(b), the set of 21 hand points generated by OP [51]. In this case, we only used the 18 body keypoints, 21 left-hand keypoints and 21 right-hand keypoints (60 keypoints in total). ...
Chapter
Providing teachers with detailed feedback about their gesticulation in class requires either one-on-one expert coaching or highly trained observers to hand-code classroom recordings. These methods are time-consuming, expensive and require considerable human expertise, making them very difficult to scale to large numbers of teachers. Applying machine learning and image processing, we develop a non-invasive detector of teachers' gestures. We use a multi-stage approach for the spotting task. Lessons recorded with a standard camera are processed offline with the OpenPose software. Next, using a gesture classifier trained on a previous training set, we found that on new lessons the precision rate is between 54 and 78%, depending on the training and testing datasets used. Thus, we found that with an accessible, non-invasive and inexpensive automatic gesture recognition methodology, an automatic lesson observation tool can be implemented that detects possible teacher gestures. Combined with other technologies, like speech recognition and text mining of the teacher discourse, a powerful and practical tool can be offered to provide private and timely feedback to teachers about the communication features of their teaching practices.
... To date, only a few works specifically consider pedestrian crossing intention as a cause of future action for early behavior prediction [3], [4]. This is different from other uses of the term 'pedestrian intention' in the literature, where it refers to the prediction or classification of future pedestrian behaviors or actions [5], [6], [7], [8], [9], [10]. In the robotics community, intentions of pedestrians are frequently modeled as reachable destinations or goals derived from the map of the local area [11], [12], [13], [14], [15]. ...
... To detect the patient's joints, a pose estimation algorithm was necessary, since the Tango SDK does not implement any pose estimation method. For that purpose, OpenPose [3,4], an open-source real-time multi-person pose detection software based on convolutional neural networks, was chosen for this work, mainly because of its various recent applications [9,22,23,27]. ...
... Using Figure 6 for reference, the hip center keypoint (keypoint 14), when not provided by the pose estimation algorithm, is given by the midpoint between the left hip and the right hip. Figure 8: (a) Keypoints that can be detected by the OpenPose algorithm [3]. Image adapted from Santiago et al. [9]. (b) Keypoints that can be detected by the method of [29]. ...
Conference Paper
Full-text available
An ergonomic evaluation is an observation of a person in order to identify work-related musculoskeletal disorders (WMSDs) caused by prolonged or repeated harmful poses that a person adopts during work tasks. Nowadays, an ergonomist or other health professional performs such evaluations based on a set of posture rules and checklists, which can be subjective and thus lead to erroneous risk classifications. Moreover, this professional usually performs the evaluation in the patient's work environment. In order to make those evaluations more objective and concise, we propose an evaluation method using a mobile depth sensor. Unlike other methods based on fixed depth sensors (e.g. Kinect), our method enables professionals to easily perform the evaluation in the patient's work environment. More precisely, we present an experiment that uses a smartphone from Google's Tango project and the Ovako Working Posture Analysing System (OWAS) method. To evaluate our approach, we also performed the ergonomic assessment using the Kinect sensor, a device with good reliability in automated ergonomic evaluation. Both evaluations involved a set of 34 poses performed by 3 volunteers and annotated by an ergonomist. The Kinect achieved an accuracy of 57.08% on torso classification, 58.33% for arms and 25.00% for leg positions, while the approach using the mobile depth sensor achieved 35.41% on torso classification, 93.05% for arms and 66.23% for leg positions on the same set of poses. Despite the small sample, the achieved results may indicate that our mobile depth sensor approach can be as viable as methods based on fixed depth sensors.
Article
Self-driving cars not only solve the problem of navigating safely from location A to location B; they also have to deal with an abundance of (sometimes unpredictable) factors, such as traffic rules, weather conditions, and interactions with humans. Over the last decades, different approaches have been proposed to design intelligent driving systems for self-driving cars that can deal with an uncontrolled environment. Some of them are derived from computationalist paradigms, formulating mathematical models that define the driving agent, while other approaches take inspiration from biological cognition. However, despite the extensive work in the field of self-driving cars, many open questions remain. Here, we discuss the different approaches for implementing driving systems for self-driving cars, as well as the computational paradigms from which they originate. In doing so, we highlight two key messages: First, further progress in the field might depend on adapting new paradigms as opposed to pushing technical innovations in those currently used. Specifically, we discuss how paradigms from cognitive systems research can be a source of inspiration for further development in modelling driving systems, highlighting emergent approaches as a possible starting point. Second, self-driving cars can themselves be considered cognitive systems in a meaningful sense, and are therefore a relevant, yet underutilized resource in the study of cognitive mechanisms. Overall, we argue for a stronger synergy between the fields of cognitive systems and self-driving vehicles.
Article
Pedestrians' red-light crossing can present a threat to traffic safety. Among the existing work related to pedestrians' red-light crossing, there are few studies using trajectory data in time sequence. This paper uses pose estimation (keypoint detection) to generate pedestrian variables from CCTV videos. Four machine learning models are used to predict pedestrians' crossing intention at intersections during the red light. The best model achieves an accuracy of 0.920 and an AUC value of 0.849, with data from three intersections. Different prediction horizons (up to 4 s) are used. With longer prediction horizons, the sample size gets smaller, which partially explains the worse model performance. However, the performance with a prediction horizon of up to 2 s is still good (AUC value of 0.841). It is found that keypoint variables such as the angles between ankle and knee (left side) and between elbow and shoulder (right side) are important. This model can be implemented in Infrastructure-to-Vehicle (I2V) applications and thus prevent accidents due to pedestrians' red-light crossing by issuing warnings to drivers.
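As a hedged illustration of the feature pipeline this abstract describes, the sketch below flattens per-frame joint angles over an observation window into one feature vector per pedestrian and trains an off-the-shelf classifier. All names and data are made up; the paper's actual models and features are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical setup: 4 joint angles (e.g. left ankle-knee, right
# elbow-shoulder) tracked over a 15-frame window for 200 pedestrians.
rng = np.random.default_rng(0)
n_samples, window, n_angles = 200, 15, 4
X = rng.uniform(0.0, 180.0, size=(n_samples, window * n_angles))
y = rng.integers(0, 2, size=n_samples)  # 1 = crosses on red, 0 = does not

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:3]))  # predicted crossing intention, first 3 samples
```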
Conference Paper
Wearable devices have inertial sensors that provide useful information to estimate and predict pedestrian motion and intention, which is of fundamental importance in ITS applications. These devices are usually placed on the limbs, such as the wrists, ankles and feet, and they provide rotation rate and acceleration information. This information is essential for the successful development of systems capable of inferring pedestrian intentions. Unfortunately, these devices do not have the capability to broadcast information to all vehicles in proximity, and such an approach would require all pedestrians to be retrofitted with this capability. This is the fundamental reason why all existing approaches are based on sensing installed directly in the vehicles. Intelligent vehicles have different types of sensors to perceive the environment in proximity, the most common being cameras. This work demonstrates that vision from cameras is capable of obtaining pedestrian dynamics with accuracy similar to that of wearable devices. It compares rotation rates and accelerations obtained with wearables on pedestrians' wrists with similar information obtained by vision. The vision-based dynamic information is obtained using robust methods that combine skeleton representation with semantic information. The experimental results presented demonstrate the strong correlation between the wearable-measured and vision-observed rates and acceleration information. The outcomes of this work will enable the solution of one of the fundamental issues in pedestrian safety, namely the inference of intent.