Architecture diagram of our unified CLUE model.

Context 1

... our novel modelling architecture depicted in Figure 1: from the text transcript, we predict output 1 with a random forest and output 2 with a BERT model, where the five values indexed 1 to 5 are the output of text-based emotion. Using the audio features, we predict output 3 from the probabilities of the eight values indexed 1 to 8, which represent speech-based emotion. ...

Context 2

... fine-tuning, we freeze some of the layers and fine-tune only the specific layers needed for our task; for instance, in the pre-trained text language model, we fine-tune only the contextual layers, mainly layer 12. Our framework is depicted in Figure 1: we extract the audio from the video and transcribe the speech to text using the IBM Watson Speech to Text platform. After speech-to-text conversion, we extract 13 features, chosen for their continued use in studies [3,9,15,27,37]. ...
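
The layer-freezing step described in this excerpt can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Hugging Face style BERT parameter names (`encoder.layer.<i>.`, zero-indexed, so "layer 12" maps to index 11), and the helper name `trainable_names` and the example parameter names are hypothetical.

```python
def trainable_names(param_names, layers_to_tune=(11,)):
    """Return the parameter names that stay trainable; all others are frozen.

    Assumption: names follow the 'encoder.layer.<i>.' pattern with a
    zero-based index, so the 12th encoder layer is index 11.
    """
    # The trailing dot keeps 'encoder.layer.1.' from matching 'encoder.layer.11.'.
    markers = tuple(f"encoder.layer.{i}." for i in layers_to_tune)
    return [name for name in param_names if any(m in name for m in markers)]


# Illustrative parameter names only; a real model exposes many more.
example_names = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.output.dense.weight",
    "encoder.layer.1.output.dense.weight",
    "encoder.layer.11.output.dense.weight",
    "pooler.dense.weight",
]

keep = trainable_names(example_names)
frozen = [name for name in example_names if name not in keep]
```

With PyTorch, the same selection would typically be applied by setting `requires_grad = False` on every parameter whose name is in `frozen`.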

Context 3

... engagement score shows the impact of each individual model on the final output. Figure 10 shows the variation of speech-based emotion over the length of the video, where "Happy", "Surprised", "Neutral", and "Fear" are dominant. To generate this figure, we extracted 10-second segments of speech using a moving window of 10 seconds and a hop of 10 seconds, so consecutive windows do not overlap. ...
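
The windowing described in this excerpt (a 10-second window with a 10-second hop) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and dropping a trailing partial window is an assumption, since the excerpt does not say how the final segment is handled.

```python
def window_bounds(duration_s, window_s=10, hop_s=10):
    """Return (start, end) times in seconds for every full window.

    Assumption: a trailing segment shorter than window_s is dropped.
    """
    if window_s <= 0 or hop_s <= 0:
        raise ValueError("window_s and hop_s must be positive")
    bounds = []
    k = 0
    # Compute each start as k * hop_s to avoid accumulating rounding error.
    while k * hop_s + window_s <= duration_s:
        bounds.append((k * hop_s, k * hop_s + window_s))
        k += 1
    return bounds


def segment_samples(samples, sample_rate, window_s=10, hop_s=10):
    """Slice a sequence of audio samples into full windows."""
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must span at least one sample")
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, hop)]


# A 35-second clip yields three full windows; the last 5 seconds are dropped.
bounds = window_bounds(35)
# Toy signal: 50 samples at 1 Hz gives five 10-sample segments.
segments = segment_samples(list(range(50)), sample_rate=1)
```

Each segment would then be passed to the speech-emotion model, and the per-emotion probabilities plotted against the window start times.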

Context 4

... the model's prediction probability for every emotion was used for plotting. Similarly, Figure 11 shows the variation of speech-based emotion over the length of the other video, where "Anger", "Sad", "Disgust", and "Neutral" are dominant. ...

Context 5

... is also observed that variation in positive emotion increases engagement more than variation in negative emotion. Figure 10 shows the variation in the emotion of speech over time where "Happy", "Surprised", "Neutral", and "Fear" are dominant, and Figure 11 shows the variation in the emotion of speech over time where "Anger", "Sad", "Disgust", and "Neutral" are dominant. The engagement score of the video in Figure 10 was significantly better than that of the video in Figure 11. ...

Context 6

... output 3, which is based on emotion decoding over speech, reduced the predicted engagement score significantly. It is also observed that variation in positive emotion increases engagement more than variation in negative emotion. Figure 10 shows the variation in the emotion of speech over time where "Happy", "Surprised", "Neutral", and "Fear" are dominant, and Figure 11 shows the variation in the emotion of speech over time where "Anger", "Sad", "Disgust", and "Neutral" are dominant. The engagement score of the video in Figure 10 was significantly better than that of the video in Figure 11. ...

Context 7

... is also observed that variation in positive emotion increases engagement more than variation in negative emotion. Figure 10 shows the variation in the emotion of speech over time where "Happy", "Surprised", "Neutral", and "Fear" are dominant, and Figure 11 shows the variation in the emotion of speech over time where "Anger", "Sad", "Disgust", and "Neutral" are dominant. The engagement score of the video in Figure 10 was significantly better than that of the video in Figure 11. Varying the emotion of speech over time increases the engagement score more than holding a single emotional tone for a longer period. ...
