Figure - uploaded by Elisa Chiapponi
IPQualityScore classification of the IPs (*From the total number of positive matches, 10213, we subtracted the number of positive values of VPN)

Source publication
Chapter
This paper proposes a method, supported by empirical evidence, to investigate the commonly made claim that proxy services used by web scraping bots have millions of residential IPs at their disposal. Using a real-world setup, we had access to the logs of close to 20 heavily targeted websites and carried out an experiment over a two months...

Similar publications

Article
This article deals with quality of service (QoS) in internet protocol (IP) telephony by applying software-defined networking (SDN) tools. The authors develop a new design that deterministically classifies real-time protocol (RTP) streams based on data found in session initiation protocol (SIP) using SIP proxy as a mediator, and the concept making t...

Citations

... The IP addresses collected during our second case were studied to check if the bots were taking advantage of proxy services, as discussed in our previous work (Chiapponi, Dacier, Todisco, Catakoglu, & Thonnard, 2020). ...
Article
Airline websites are the victims of unauthorised online travel agencies and aggregators that use armies of bots to scrape prices and flight information. These so-called Advanced Persistent Bots (APBs) are highly sophisticated. Beyond the valuable information they take away, their huge volume of requests consumes a very substantial amount of resources on the airlines' websites. In this work, we propose a deceptive approach to counter scraping bots. We present a platform capable of mimicking airlines' sites and changing prices at will, and we provide results from the case studies we performed with it. We lured bots for almost 2 months and fed them inaccurate information that was indistinguishable from genuine data. Studying the collected requests, we found behavioural patterns that could serve as a complementary means of bot detection. Moreover, based on the empirical evidence gathered, we propose a method to investigate the commonly made claim that proxy services used by web scraping bots have millions of residential IPs at their disposal. Our mathematical models indicate that the number of IPs is likely 2 to 3 orders of magnitude smaller than claimed. This finding suggests that an IP reputation-based blocking strategy could be effective, contrary to what operators of these websites think today.
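The abstract does not say which mathematical models were used, so the following is only an illustrative sketch of how the size of a proxy IP pool could in principle be estimated from observed traffic. It applies a generic capture-recapture technique (Chapman's form of the Lincoln-Petersen estimator), which is an assumption made here and not the authors' method. The function name and the sample addresses are hypothetical.

```python
def chapman_estimate(first_sample, second_sample):
    """Estimate the size of a pool from two samples of observed IPs.

    Illustrative capture-recapture sketch (an assumption, not the
    models used in the paper). It presumes both samples are drawn
    uniformly at random from the same fixed pool.
    """
    first = set(first_sample)
    second = set(second_sample)
    # IPs seen in both observation windows ("recaptured").
    recaptured = len(first & second)
    # Chapman's bias-corrected Lincoln-Petersen estimator:
    # N = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    return (len(first) + 1) * (len(second) + 1) / (recaptured + 1) - 1


# Hypothetical addresses from the documentation range 192.0.2.0/24.
week_one = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
week_two = ["192.0.2.3", "192.0.2.4", "192.0.2.5", "192.0.2.6"]
estimate = chapman_estimate(week_one, week_two)
print(f"Estimated pool size: {estimate:.1f}")
```

The intuition is that a large overlap between two observation windows points to a small pool, while almost no overlap points to a large one. Real proxy traffic is unlikely to satisfy the uniform-sampling assumption, so an estimate of this kind would be a rough indication at best.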