Adapting to AI: Approaches for Digital Publishers in Managing Web Scraping

INFO Staff - January 2, 2024

A report of key insights and practical advice for digital publishers

Photo of a person posting a blog online

In a world where content is created and shared at an unprecedented scale, data scraping for AI models poses a significant threat to intellectual property and to the economic foundation of digital publishing. The unauthorized extraction of data not only jeopardizes the exclusivity of publishers’ content but can also hurt the search engine rankings of their websites, potentially reducing advertising revenue.

In the report “Adapting to AI: Approaches for Digital Publishers in Managing Web Scraping,” graduate students at the University of Maryland College of Information Studies (INFO) provide an in-depth analysis of the current state of web scraping activities, their impact on digital publishing, and the strategic responses available to publishers. Aimed at digital publishers, legal experts, technology firms, and policy makers, the report offers key insights and practical advice for navigating the complexities of web scraping in the AI era and for supporting the ongoing growth and resilience of the digital publishing sector.

The report examines key legal cases, such as LinkedIn vs. hiQ Labs and Associated Press vs. Meltwater, to understand the legal responses to content scraping. It also discusses the ethical dimensions of using scraped data for AI training, focusing on the balance between public data access and copyright laws. Addressing the limitations of conventional anti-scraping methods, the report recommends a collaborative strategy to create an environment that values innovation, creativity, and respect for intellectual property rights.

Authors:

Pooja Pandey, M.S. in Human Computer Interaction
Hyejin Jo, M.S. in Human Computer Interaction
Angela Tseng, Master of Information Management

Full Report