Karachi Chronicle
Tech

AI web spiders: Companies warn of emergence of AI web spiders

By Adnan Mahar, December 15, 2024

Businesses are increasingly resorting to blocking artificial intelligence (AI) web crawlers and spiders that scrape the web piece by piece and degrade website performance, industry executives and experts say.

An AI crawler is a computer program that collects data from websites to train large language models. Due to the increased use of AI search and the need to collect training data, a number of new web scrapers have appeared on the internet, including Bytespider, PerplexityBot, ClaudeBot, and GPTBot.

Until 2022, the internet was crawled mainly by traditional search engine crawlers such as GoogleBot, AppleBot, and BingBot, which followed decades-old principles of ethical content scraping and scheduling.


On the other hand, aggressive AI bots not only violate content guidelines, but also slow down website performance, add overhead, and pose security threats. Many websites and content portals have implemented anti-scraping and bot restriction technologies to combat this. According to Cloudflare, a leading content delivery network provider, nearly 40% of the top 10 internet domains visited by 80% of AI bots have moved to block AI crawlers.
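One of the simplest bot restriction techniques is to refuse requests whose User-Agent header names a known AI crawler. The sketch below is illustrative only: the function names and the sample User-Agent strings are hypothetical, the token list is simply the set of crawlers named in this article, and a User-Agent header can be spoofed, so this check alone would not stop a crawler that hides its identity.

```python
# Lowercased tokens for the AI crawlers named in this article.
AI_CRAWLER_TOKENS = ("bytespider", "perplexitybot", "claudebot", "gptbot")


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known AI crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)


def response_status(user_agent: str) -> int:
    """Hypothetical policy: answer 403 Forbidden to AI crawlers, 200 otherwise."""
    return 403 if is_ai_crawler(user_agent) else 200


# Hypothetical User-Agent strings, for illustration only.
crawler_ua = "Mozilla/5.0 (compatible; GPTBot/1.0)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/130.0"

print(response_status(crawler_ua))
print(response_status(browser_ua))
```

In this sketch the first request would be refused and the second served; a production setup would typically combine such a check with other signals rather than rely on the header alone.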


Nasscom, India’s top technology industry body, said these crawlers are particularly damaging when they use content created by news publishers without attribution. “If using copyrighted data to train an AI model falls under fair use, then that use is invalid,” Raj Shekhar, head of responsible AI at Nasscom, told ET. “The legal dispute between ANI Media and OpenAI is a wake-up call for AI developers to be aware of IP (intellectual property) laws when collecting training data. Intellectual property experts should be consulted to ensure compliant data practices and avoid potential liability.”

“Scraping introduces significant overhead and impacts website performance. It does this by interacting with the site intensively and trying to collect every piece of content, which slows down performance.”

According to Cloudflare’s analysis of the top 10,000 internet domains, three AI bots accessed the highest share of websites: Bytespider (40.40%), operated by ByteDance, the Chinese owner of TikTok; GPTBot (35.46%), operated by OpenAI; and ClaudeBot (11.17%), operated by Anthropic. Although these AI bots follow the rules, Cloudflare says the overwhelming majority of its customers choose to block them. Meanwhile, CCBot, developed by Common Crawl, scrapes the web to create open-source datasets that anyone can use.

Features of AI crawlers

Unlike traditional crawlers, AI crawlers target high-quality text, images, and videos that can enrich training datasets. AI-powered crawlers are more intelligent than traditional search engine crawlers that “just crawl, collect data, and stop there,” Akamai’s Koh said. “Their intelligence is used not only to select data, but also to classify and prioritize data. This is because even after crawling, indexing, and scraping all the data, you still have to decide what the data will be used for. It means they can handle that,” he said.

Traditionally, web scraper bots follow the robots.txt protocol for guidance on what can be indexed. Traditional search engine bots like GoogleBot and BingBot comply with this and stay away from intellectual property. However, AI bots have been found to violate robots.txt principles in multiple instances. “Google and Bing follow predictable and transparent indexing schedules, so your website won’t be overwhelmed. For example, Google makes it clear how often a particular domain should be indexed. Companies can anticipate and manage potential performance impacts,” said Koh. “As newer, more aggressive crawlers emerge, such as those driven by AI, the situation becomes less predictable. These crawlers do not necessarily operate on a fixed schedule, and their scraping activity may be more intensive.”
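To make the robots.txt mechanism concrete, the sketch below uses Python's standard `urllib.robotparser` module to show how a compliant crawler would interpret a site's rules. The robots.txt content and the example.com URL are hypothetical and not taken from any site mentioned here. The protocol is advisory: it only affects crawlers that choose to check it, which is why the violations described above are possible.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block three AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical page URL used only for this illustration.
PAGE_URL = "https://example.com/news/story.html"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, bot_name: str, url: str = PAGE_URL) -> bool:
    """Return True if the rules permit the named crawler to fetch the URL."""
    return parser.can_fetch(bot_name, url)


parser = build_parser(ROBOTS_TXT)
for bot in ("Googlebot", "GPTBot", "ClaudeBot", "Bytespider"):
    print(bot, may_crawl(parser, bot))
```

Under these rules a compliant Googlebot would be permitted to fetch the page, while the three named AI crawlers would be told to stay away.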

Koh warned of a third category of crawlers that are malicious in nature and use data to commit fraud. According to Akamai’s State of the Internet study, more than 40% of all internet traffic comes from bots, and about 65% of that bot traffic is malicious.
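Taken together, the two Akamai figures imply a share of total traffic. The short calculation below is a rough sketch that treats "more than 40%" as exactly 40% and "about 65%" as exactly 65%, so the result is an approximation and not a figure reported by Akamai; the variable names are illustrative.

```python
# Figures quoted above, rounded for illustration.
bot_share_of_traffic = 0.40        # bots as a share of all internet traffic
malicious_share_of_bots = 0.65     # malicious bots as a share of bot traffic

# Malicious bots as a share of all internet traffic.
malicious_share_of_traffic = bot_share_of_traffic * malicious_share_of_bots

print(f"{malicious_share_of_traffic:.0%}")  # prints "26%"
```

On these rounded inputs, roughly a quarter of all internet traffic would come from malicious bots.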

Cannot block everything

However, experts say blocking AI crawlers outright is not the ultimate solution, because websites still need to be discovered. If AI search is to become the new way of searching, websites need to appear in commercial search engine results to be found and to attract customers, they said. “Enterprises will be concerned about whether they are blocking crawling and bot activity that generates legitimate revenue, or whether they are allowing too much malicious activity to occur on their websites. It’s a very delicate balance and they need to understand it,” Mr Koh said.



