New regulatory frameworks and the continued growth of online web services demand a constant evolution of the techniques used to study and audit those services. The online advertising ecosystem deserves particular attention: it is the primary economic support of a high percentage of web services, and its activities and abuses drove the adoption of current privacy regulations that control the collection and use of personal data. This dissertation falls within the field of Internet measurement and addresses the need for new measurement techniques and methodological approaches to audit and study online web services. These efforts aim to increase the limited knowledge about web subsystems that offer sensitive material, including their compliance with current privacy regulations. The dissertation also addresses the need to study and measure how big ad tech companies create and use the online profiles of their users to deliver tailored ads. Finally, it highlights the need for a more in-depth understanding of the fundamental tools used in Internet measurement work, including their limitations and their suitability for academic research.
Specifically, this dissertation presents three main contributions. The first is a novel methodology to audit the privacy, transparency, and regulatory compliance of sensitive web services. We validate the method on pornographic websites with respect to the GDPR in the European Union. We focus on this type of website for two main reasons: (i) the GDPR establishes specific provisions and strict requirements for sensitive websites, including pornographic ones; and (ii) big ad tech companies set strict constraints for porn-related publishers, which opened new market opportunities for other actors that specialize in advertising and tracking technologies for adult sites, creating an ecosystem that is semi-decoupled from the rest of the web. We perform a holistic analysis of 6,843 pornographic websites and find a prevalent absence of regulatory compliance and widespread use of tracking techniques, including advanced ones such as fingerprinting. These results stress the importance of studying subsets of the World Wide Web that regulators, policymakers, and the research community have not scrutinized in depth.
Second, we empirically and comprehensively analyze 13 domain classification services to study their labeling strategies and performance. These services have multiple applications, from business uses such as online advertising to academic research that conducts category-dependent measurements or identifies the purpose of a website or online service. We study each service's methodology, scalability limitations, label constellations, and suitability for academic research, where the findings of a study depend in some cases on the results that the domain classification service provides. We find that the limitations and shortcomings of each domain classification service heavily affect its suitability and applicability, both for practical solutions and for academic studies.
In the third and last contribution, we implement a novel methodology with real users to study the performance and quality of the profiling and ad targeting algorithms of the two most important stakeholders in the online advertising business, Google and Meta (previously Facebook). We find that half of the categories associated with the profiles are incorrectly assigned. We also observe sensitive categories in the profiles of Facebook users, which poses a privacy risk and potential regulatory noncompliance.
In summary, this dissertation brings new methodologies and results to increase our limited knowledge about the web.
Keywords
Online Web Services Audit; Privacy Regulations Compliance; Ad Targeting Algorithms
Institute(s)
Universidad Carlos III de Madrid
Year
2022
Author(s)
Pelayo Vallina Rodríguez