
The YouTube Social Network


Abstract

Today, YouTube is the largest user-driven video content provider in the world; it has become a major platform for disseminating multimedia information. A major contribution to its success comes from the user-to-user social experience that differentiates it from traditional content broadcasters. This work examines the social network aspect of YouTube by measuring the full-scale YouTube subscription graph, comment graph, and video content corpus. We find YouTube to deviate significantly from network characteristics that mark traditional online social networks, such as homophily, reciprocative linking, and assortativity. However, compared with the reported characteristics of another content-driven online social network, Twitter, YouTube is remarkably similar. Examining the social and content facets of user popularity, we find a stronger correlation between a user's social popularity and his/her most popular content as opposed to typical content popularity. Finally, we demonstrate an application of our measurements for classifying YouTube Partners, who are selected users that share YouTube's advertisement revenue. The results are encouraging despite the highly imbalanced nature of the classification problem.
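To make one of the characteristics named in the abstract concrete, the following minimal Python sketch computes edge reciprocity, that is, the fraction of directed links whose reverse link also exists. The function name `reciprocity`, the edge-list representation, and the toy edges are assumptions made purely for illustration; they are not taken from the paper, whose own measurement procedure may differ.

```python
def reciprocity(edges):
    """Return the fraction of directed edges (u, v) whose reverse (v, u) also exists.

    `edges` is an iterable of (source, target) pairs, for example
    (subscriber, channel). Self-loops and duplicate edges are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Toy subscription graph (made-up data, not from the paper).
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "c")]
print(reciprocity(toy_edges))  # prints 0.5
```

In this toy graph only the a-b pair is mutual, so two of the four directed edges are reciprocated. A low value on a real subscription graph would be one sign of the weak reciprocative linking the abstract describes.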
... The authors of [7] analyze the structure of YouTube's infrastructure, including the user feedback system that lets viewers interact with videos and leave ratings and comments. They consider this feedback system crucial for understanding the platform's social dynamics and for providing recommendations to users. ...
... With a staggering user base of 2.7 billion individuals, it stands as the second most popular online platform, offering an extensive array of content categories, including educational videos, vlogs, news, unboxing presentations, gaming content, and more. Wattenhofer et al. investigate YouTube's social network dynamics, analyzing its subscription graph, comment graph, and video content corpus, revealing deviations from traditional online social networks and striking similarities to Twitter [47]. Tufekci [45] highlights the problem with YouTube's recommendation algorithm that leads users to extremist content regardless of their initial viewing preferences, thereby potentially increasing political polarization and promoting divisive ideologies. ...
Conference Paper
In the modern era, we find ourselves immersed in an exponentially growing flow of data. Data is generated in domains such as education, business, and e-commerce, and predominantly on social media platforms such as Twitter, YouTube, Facebook, and Instagram. Amidst this proliferation of content, user comments have emerged as a crucial element, serving as a platform for expressions of opinions, commendations, and critiques. However, within the abundance of user feedback lies a persistent issue: the presence of undesirable comments that elicit negative emotional responses and prove to be tedious and irrelevant. Effectively identifying and removing such comments poses a major challenge. This research addresses the need for a robust comment classification model. To tackle this issue, a comprehensive investigation is conducted, employing a variety of machine learning models for comment classification, including Decision Trees, Random Forests (RF), Naive Bayes, K-Nearest Neighbors, Gradient Boosting, AdaBoost, Logistic Regression, and Support Vector Machines (SVM). Furthermore, fundamental voting techniques such as Hard-Voting, Averaging, and Soft-Voting are combined with the machine learning models to improve classification performance. The objective is to discern the characteristics of text comments and classify them with higher accuracy than prior research. In this paper, we propose a robust ensemble model, RF+AdaBoost+SVM+Soft-Voting, specifically designed for comment classification. The results indicate that the proposed ensemble model achieved an accuracy of approximately 98% for comment classification on a YouTube dataset.
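The difference between the hard-voting and soft-voting schemes mentioned in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `soft_vote` and `hard_vote`, the equal default weights, and the probability values are hypothetical, and the sketch does not reproduce the cited paper's RF+AdaBoost+SVM pipeline or its results.

```python
from collections import Counter


def soft_vote(prob_lists, weights=None):
    """Average per-class probabilities across models and pick the top class.

    `prob_lists` holds one probability vector per model, all of equal length.
    Returns (winning class index, averaged probability vector).
    """
    if not prob_lists:
        raise ValueError("need at least one model's probabilities")
    if weights is None:
        weights = [1.0] * len(prob_lists)  # assumption: equal weights
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


def hard_vote(labels):
    """Return the label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]


# Made-up outputs of three classifiers for one comment.
# Class 0 = "undesirable", class 1 = "acceptable" (hypothetical labels).
model_probs = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
model_labels = [max(range(2), key=lambda c: p[c]) for p in model_probs]

soft_label, soft_avg = soft_vote(model_probs)
hard_label = hard_vote(model_labels)
print(soft_label, hard_label)
```

On this toy input the two schemes disagree: two of the three models lean slightly toward class 1, so hard voting picks class 1, while the one confident model pulls the averaged probability toward class 0 under soft voting.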
Introduction: The aim is to identify the indicators that promote the creation, maintenance, and growth of a user community around YouTubers' channels. Methodology: An exploratory content analysis is carried out on a sample of 100 videos, 10 from each of the 10 channels with the most subscribers in the Socialblade ranking, covering the period from September 14, 2018 to February 22, 2019. Results: The study identifies four community management strategies deployed by YouTubers: building audience loyalty, expanding the community, generating anticipation for future content, and the particular management of the channel's Community tab. Discussion and conclusions: The study reveals indicators and practices that are valid for understanding the phenomenon of YouTuber communities, a mass phenomenon with only 15 years of existence and unstoppable growth.
This study examines the search behavior of Gen Zs on YouTube and TikTok. It uses purposive sampling to choose 10 participants who are Gen Zs, individuals aged 11 to 26, residing in Barangay Bagumbayan, Santa Cruz, Laguna. It is limited to Gen Zs who are young achievers and young professionals and who use both YouTube and TikTok. It draws on Uses and Gratification Theory to explore the search experiences of Gen Zs. The researchers conduct face-to-face semi-structured interviews to gather data, followed by a thematic approach with themes derived from the interview transcripts. The study aims to understand the search purposes, predetermined intentions, and shifts in preference among Gen Z users. The findings reveal that the search purposes for using YouTube and TikTok are informational, navigational, and transactional. Gen Zs have different predetermined intentions when using these platforms: completeness of information, convenience, engagement, legitimacy, and viewer preference. The study also finds that the reasons Gen Zs shift their preference between content on YouTube and TikTok are retention, validity of information, exposure, value, and usefulness.
To gain a competitive advantage by acting on the wants and needs of people who follow sports, sports managers need to act strategically. It is therefore important to determine the motivations of people who follow sports clubs on social media, which reaches large audiences. Understanding these motivations can make it easier for managers in sports clubs to adopt the strategic choice approach (SCA) and gain a competitive advantage through social media. The aim of this study is to carry out the Turkish adaptation of the scale of motivations for engaging with sports clubs on Facebook and Instagram, developed by Machado, Martins, Ferreira, Silva, and Duarte (2020), and to examine how these motivations differ across various variables. In addition, the study presents a perspective that would enable sports clubs to adopt the SCA on Instagram and Facebook. The study involved 552 people who follow different sports clubs on Facebook and Instagram. Item analysis based on the mean difference between the lower and upper 27% groups and confirmatory factor analysis were applied to the data obtained from the Turkish version of the scale, and internal consistency coefficients were calculated for the subdimensions. SPSS Statistics 28 and LISREL 8.51 were used to analyze the data. Mann-Whitney U and Kruskal-Wallis H analyses were used to identify differences across various variables. Based on the findings of the Turkish adaptation study, the scale was determined to be a valid and reliable measurement instrument. Significant differences in engagement motivations were found by gender, age, education level, type of sport, and Facebook and Instagram usage preferences. Sports followers' reward-related motivations on Facebook and Instagram were found to be similarly high, while their information-seeking motivations were found to be low. Keywords: Information Seeking, Reward, Social Media, Sports Managers, Strategic Choice Approach
This article examines the functioning of the traditional genre of Russian folklore, specifically the religious legend, through the example of retelling the legend of the city of Kitezh on YouTube. The paper highlights the specifics of video content on YouTube, which can be defined as multimodal texts. The legend is presented in two main types: variants of texts recorded earlier and texts conveying personal verbalized mystical experience of communication with the city of Kitezh (folklore legend). The paper identifies three main types of retelling the previously recorded variants of the legend: brief retellings, expanded retellings with reference to “The Kitezh Chronicler,” and expanded retellings with the addition of other historical, quasi-historical, and mythological elements. The article establishes that these multimodal texts can be distributed into three groups depending on the number of resources used: weak multimodality (using only two resources), medium multimodality (using three or more resources), and strong multimodality (using more than 4-5 resources). The article shows that the told / retold legend is a fragment of a more complex multimodal text or cycle of texts.
Traditionally, the popularity of classical music composers is approximated through commercial figures like album releases, record sales, or live performances. However, commercial factors only provide one piece of the overall picture. The success of community-driven platforms has profoundly changed how people consume and interact with music, and, consequently, our understanding of what popularity is. People discuss their favourite artists, archive knowledge regarding them and share their work through multimedia platforms. In this paper, we investigate how data from these platforms can provide a more comprehensive view on popularity and engagement regarding the long-tail of classical music composers. We combine album release data provided by MusicBrainz, the commitment of people in maintaining the composers’ Wikipedia pages and user engagement in classical music videos on YouTube. Our analysis provides a complementary multi-faceted view on community engagement and urges future research to expand on user-generated content for a more diverse expression of popularity in the music domain. Keywords: social media, user-generated content, music, popularity, web crawling
Drawing on the Foucauldian technologies of the self, this study explores how individuals re-envision practices of wellbeing outside of traditional organizational contexts during extreme events. Based on a thematic analysis of 7234 comments posted on the Yoga with Adriene YouTube channel in 2020, this study unpacks a technologically mediated practice of self-care, which we conceptualize as somametamnemata. Our findings illustrate three entangled aspects of somametamnemata relating to yoga, a form of bodywork: Caring about self through practicing yoga online; caring about self and others through sharing about yoga in written comments; and caring about self and others through responding to shared verbalizations of yoga. This study distinguishes somametamnemata from known practices of self-care, advancing existing literature on technologies of self by overcoming the dichotomy between negative views of ill-being and positive views of wellbeing. By situating the potentiality of individual wellbeing within ill-being, we shift debates and discussions of “corporate wellness” beyond organizational boundaries.
Conference Paper
Online social networking sites like Orkut, YouTube, and Flickr are among the most popular sites on the Internet. Users of these sites form a social network, which provides a powerful means of sharing, organizing, and finding content and contacts. The popularity of these sites provides an opportunity to study the characteristics of online social network graphs at large scale. Understanding these graphs is important, both to improve current systems and to design new applications of online social networks. This paper presents a large-scale measurement study and analysis of the structure of multiple online social networks. We examine data gathered from four popular online social networks: Flickr, YouTube, LiveJournal, and Orkut. We crawled the publicly accessible user links on each site, obtaining a large portion of each social network's graph. Our data set contains over 11.3 million users and 328 million links. We believe that this is the first study to examine multiple online social networks at scale. Our results confirm the power-law, small-world, and scale-free properties of online social networks. We observe that the indegree of user nodes tends to match the outdegree; that the networks contain a densely connected core of high-degree nodes; and that this core links small groups of strongly clustered, low-degree nodes at the fringes of the network. Finally, we discuss the implications of these structural properties for the design of social network based systems.
Conference Paper
User Generated Content (UGC) is re-shaping the way people watch video and TV, with millions of video producers and consumers. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and developing new business opportunities. To better understand the impact of UGC systems, we have analyzed YouTube, the world's largest UGC VoD system. Based on a large amount of data collected, we provide an in-depth study of YouTube and other similar UGC systems. In particular, we study the popularity life-cycle of videos, the intrinsic statistical properties of requests and their relationship with video age, and the level of content aliasing or of illegal content in the system. We also provide insights on the potential for more efficient UGC VoD systems (e.g. utilizing P2P techniques or making better use of caching). Finally, we discuss the opportunities to leverage the latent demand for niche videos that are not reached today due to information filtering effects or other system scarcity distortions. Overall, we believe that the results presented in this paper are crucial in understanding UGC systems and can provide valuable information to ISPs, site administrators, and content owners with major commercial and technical implications.
Conference Paper
This paper focuses on the problem of identifying influential users of micro-blogging services. Twitter, one of the most notable micro-blogging services, employs a social-networking model called "following", in which each user can choose who she wants to "follow" to receive tweets from without requiring the latter to give permission first. In a dataset prepared for this study, it is observed that (1) 72.4% of the users in Twitter follow more than 80% of their followers, and (2) 80.5% of the users have 80% of users they are following follow them back. Our study reveals that the presence of "reciprocity" can be explained by the phenomenon of homophily. Based on this finding, TwitterRank, an extension of the PageRank algorithm, is proposed to measure the influence of users in Twitter. TwitterRank measures the influence taking both the topical similarity between users and the link structure into account. Experimental results show that TwitterRank outperforms the one Twitter currently uses and other related algorithms, including the original PageRank and Topic-sensitive PageRank.
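The two follow-back statistics quoted in this abstract are per-user fractions, which can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `follow_back_fractions`, the `(follower, followee)` edge format, and the toy edges are hypothetical, and the code implements neither the cited study's data pipeline nor TwitterRank itself.

```python
from collections import defaultdict


def follow_back_fractions(follows):
    """Compute two follow-back fractions for every user in a follow graph.

    `follows` is an iterable of (follower, followee) pairs.
    Returns {user: (frac_of_followers_followed, frac_of_followees_following_back)}.
    A fraction is None when the user has no followers or follows nobody.
    """
    following = defaultdict(set)  # user -> accounts the user follows
    followers = defaultdict(set)  # user -> accounts following the user
    for src, dst in follows:
        if src == dst:
            continue  # ignore self-follows
        following[src].add(dst)
        followers[dst].add(src)

    result = {}
    for user in set(following) | set(followers):
        out_set = following.get(user, set())
        in_set = followers.get(user, set())
        mutual = len(out_set & in_set)
        frac_followers_followed = mutual / len(in_set) if in_set else None
        frac_followees_back = mutual / len(out_set) if out_set else None
        result[user] = (frac_followers_followed, frac_followees_back)
    return result


# Toy follow graph (made-up data).
toy_follows = [("a", "b"), ("b", "a"), ("c", "a"), ("a", "d")]
fractions = follow_back_fractions(toy_follows)
print(fractions["a"])  # prints (0.5, 0.5)
```

Statistic (1) in the abstract would then correspond to the share of users whose first fraction exceeds 0.8, and statistic (2) to the share whose second fraction reaches 0.8, computed over a real follow graph rather than this toy one.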
Conference Paper
In this paper we investigate the attributes and relative influence of 1.6M Twitter users by tracking 74 million diffusion events that took place on the Twitter follower graph over a two-month interval in 2009. Unsurprisingly, we find that the largest cascades tend to be generated by users who have been influential in the past and who have a large number of followers. We also find that URLs that were rated more interesting and/or elicited more positive feelings by workers on Mechanical Turk were more likely to spread. In spite of these intuitive results, however, we find that predictions of which particular user or URL will generate large cascades are relatively unreliable. We conclude, therefore, that word-of-mouth diffusion can only be harnessed reliably by targeting large numbers of potential influencers, thereby capturing average effects. Finally, we consider a family of hypothetical marketing strategies, defined by the relative cost of identifying versus compensating potential "influencers." We find that although under some circumstances, the most influential users are also the most cost-effective, under a wide range of plausible assumptions the most cost-effective performance can be realized using "ordinary influencers": individuals who exert average or even less-than-average influence.
Conference Paper
Directed links in social media could represent anything from intimate friendships to common interests, or even a passion for breaking news or celebrity gossip. Such directed links determine the flow of information and hence indicate a user's influence on others, a concept that is crucial in sociology and viral marketing. In this paper, using a large amount of data collected from Twitter, we present an in-depth comparison of three measures of influence: indegree, retweets, and mentions. Based on these measures, we investigate the dynamics of user influence across topics and time. We make several interesting observations. First, popular users who have high indegree are not necessarily influential in terms of spawning retweets or mentions. Second, most influential users can hold significant influence over a variety of topics. Third, influence is not gained spontaneously or accidentally, but through concerted effort such as limiting tweets to a single topic. We believe that these findings provide new insights for viral marketing and suggest that topological measures such as indegree alone reveal very little about the influence of a user.
Web 2.0 has brought about several new applications that have enabled arbitrary subsets of users to communicate with each other on a social basis. Such communication increasingly happens not just on Facebook and MySpace but on several smaller network applications such as Twitter and Dodgeball. We present a detailed characterization of Twitter, an application that allows users to send short messages. We gathered three datasets (covering nearly 100,000 users) including constrained crawls of the Twitter network using two different methodologies, and a sampled collection from the publicly available timeline. We identify distinct classes of Twitter users and their behaviors, geographic growth patterns and current size of the network, and compare crawl results obtained under rate limiting constraints.
Similarity breeds connection. This principle - the homophily principle - structures network ties of every type, including marriage, friendship, work, advice, support, information transfer, exchange, comembership, and other types of relationship. The result is that people's personal networks are homogeneous with regard to many sociodemographic, behavioral, and intrapersonal characteristics. Homophily limits people's social worlds in a way that has powerful implications for the information they receive, the attitudes they form, and the interactions they experience. Homophily in race and ethnicity creates the strongest divides in our personal environments, with age, religion, education, occupation, and gender following in roughly that order. Geographic propinquity, families, organizations, and isomorphic positions in social systems all create contexts in which homophilous relations form. Ties between nonsimilar individuals also dissolve at a higher rate, which sets the stage for the formation of niches (localized positions) within social space. We argue for more research on: (a) the basic ecological processes that link organizations, associations, cultural communities, social movements, and many other social forms; (b) the impact of multiplex ties on the patterns of homophily; and (c) the dynamics of network change over time through which networks and other social entities co-evolve.