# Robots.txt for www.rash-id.com
# Last updated: January 2025

# ===========================================
# TRADITIONAL SEARCH ENGINES - ALLOW
# ===========================================
User-agent: Googlebot
User-agent: Bingbot
User-agent: Slurp
User-agent: DuckDuckBot
User-agent: Baiduspider
User-agent: YandexBot
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: LinkedInBot
User-agent: WhatsApp
User-agent: Applebot
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 0

# ===========================================
# TRUSTED AI COMPANIES - ALLOW FOR TRAINING
# ===========================================

# OpenAI
User-agent: GPTBot
User-agent: ChatGPT-User
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 1

# Anthropic
User-agent: anthropic-ai
User-agent: Claude-Web
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 1

# Google AI
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 1

# Meta/Facebook AI
User-agent: FacebookBot
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 1

# Apple AI
User-agent: Applebot-Extended
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 1

# Cohere
User-agent: cohere-ai
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 1

# ===========================================
# AI SEARCH/ANSWER ENGINES - ALLOW WITH LIMITS
# ===========================================
User-agent: PerplexityBot
User-agent: YouBot
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 2

# ===========================================
# UNTRUSTED AI/DATA CRAWLERS - BLOCK
# ===========================================
User-agent: CCBot
User-agent: Bytespider
User-agent: Diffbot
User-agent: Amazonbot
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: DataForSeoBot
User-agent: magpie-crawler
User-agent: NewsNow
User-agent: news-please
User-agent: peer39_crawler
User-agent: peer39_crawler/1.0
User-agent: omgili
User-agent: omgilibot
User-agent: ICC-Crawler
User-agent: img2dataset
User-agent: Scrapy
Disallow: /

# ===========================================
# SEO & ANALYTICS TOOLS - ALLOW
# ===========================================
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: MJ12bot
User-agent: DotBot
User-agent: Screaming Frog SEO Spider
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 2

# ===========================================
# AGGRESSIVE/MALICIOUS BOTS - BLOCK
# ===========================================
User-agent: SentiBot
User-agent: BLEXBot
User-agent: BUbiNG
User-agent: Buck
User-agent: ltx71
User-agent: Mb2345Browser
User-agent: MegaIndex
User-agent: Mediatoolkitbot
User-agent: NPBot
User-agent: Nutch
User-agent: PiplBot
User-agent: python-requests
User-agent: Python-urllib
User-agent: python-urllib3
User-agent: R6_BOT
User-agent: RandomBot
User-agent: RepoMonkey
User-agent: Riddler
User-agent: rogerbot
User-agent: SeznamBot
User-agent: sitebot
User-agent: TurnitinBot
User-agent: Vagabondo
User-agent: VoilaBot
User-agent: WBSearchBot
User-agent: WWW-Mechanize
User-agent: Xenu
Disallow: /

# ===========================================
# ARCHIVING BOTS - YOUR CHOICE
# ===========================================
User-agent: ia_archiver
User-agent: Wayback Machine
Allow: /
Disallow: /admin/
Disallow: /api/
Crawl-delay: 5

# ===========================================
# DEFAULT FOR ALL OTHERS
# ===========================================
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Disallow: /*.json$
Disallow: /*?*
Disallow: /search
Crawl-delay: 1

# ===========================================
# SITEMAP
# ===========================================
Sitemap: https://www.rash-id.com/sitemap.xml