YaCy-Bugtracker

ID: 0000561
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2015-03-15 22:27
Last Update: 2015-03-18 17:50
Reporter: vikulin
Assigned To:
Priority: normal
Severity: major
Reproducibility: have not tried
Status: new
Resolution: open
ETA: none
Platform:
OS:
OS Version:
Product Version: YaCy 1.8
Target Version:
Fixed in Version:
Summary: 0000561: robots.txt is ignored by crawler in intranet mode and IPv6-CJDNS sites
Description: robots.txt is ignored by the crawler in intranet mode and on IPv6 sites. This issue was observed once, and I have not tried to reproduce it.
Steps To Reproduce: The issue can probably be reproduced with the following steps:
1. Switch to intranet mode.
2. Start crawling a site whose robots.txt contains the following:
User-agent: *

Allow: /snapshots

Disallow: /openwrt/waterfall
Disallow: /openwrt/builders
Disallow: /openwrt/changes
Disallow: /openwrt/buildslaves
Disallow: /openwrt/schedulers
Disallow: /openwrt/one_line_per_build
Disallow: /openwrt/builders
Disallow: /openwrt/grid
Disallow: /openwrt/tgrid
Disallow: /openwrt/json

Disallow: /cjdns/waterfall
Disallow: /cjdns/builders
Disallow: /cjdns/changes
Disallow: /cjdns/buildslaves
Disallow: /cjdns/schedulers
Disallow: /cjdns/one_line_per_build
Disallow: /cjdns/builders
Disallow: /cjdns/grid
Disallow: /cjdns/tgrid
Disallow: /cjdns/json

Actual result:
1. robots.txt was not even loaded. All site content was scanned.
Additional Information: Server log where the issue was found:

fcd9:8810:bb91:fd19:ddae:5c59:6df5:949e - - [15/Mar/2015:20:35:24 +0100] "GET /cjdns/builders/Mac%20OS%20X%20(Non%20Apple%20Hardware)/builds/56/steps/compile/logs/stdio/text HTTP/1.1" 200 66377 "https://[fcfc:5d70:99b6:4e0c:cba6:61ec:434b:df1a]/cjdns/builders/Mac%20OS%20X%20(Non%20Apple%20Hardware)/builds/56/steps/compile/logs/stdio" "yacybot (/global; amd64 Linux 3.13.0-45-generic; java 1.7.0_76; Europe/en)
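
To confirm that the request in the server log above falls under the robots.txt rules from the steps to reproduce, the rules can be checked by hand. The following is a minimal Python sketch, not YaCy's implementation: `parse_rules` and `is_allowed` are hypothetical helper names, the robots.txt is abridged to a few of the /cjdns rules, and matching is assumed to be plain longest-prefix matching for the single `User-agent: *` group.

```python
from urllib.parse import unquote, urlsplit

# Abridged copy of the robots.txt from the report (a few /cjdns rules only).
ROBOTS_TXT = """\
User-agent: *

Allow: /snapshots

Disallow: /cjdns/waterfall
Disallow: /cjdns/builders
Disallow: /cjdns/changes
"""

# Request path taken from the server log entry in the report.
LOGGED_PATH = (
    "/cjdns/builders/Mac%20OS%20X%20(Non%20Apple%20Hardware)"
    "/builds/56/steps/compile/logs/stdio/text"
)


def parse_rules(text):
    """Collect (allow, prefix) rules for the 'User-agent: *' group.

    Simplification (assumption): blank lines are skipped rather than treated
    as group separators, and only a single wildcard group is handled.
    """
    rules = []
    in_wildcard_group = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_wildcard_group = value == "*"
        elif field in ("allow", "disallow") and in_wildcard_group and value:
            rules.append((field == "allow", value))
    return rules


def is_allowed(rules, url_or_path):
    """Return True if the path may be fetched under longest-prefix matching."""
    path = unquote(urlsplit(url_or_path).path) or "/"
    matches = [(len(prefix), allow) for allow, prefix in rules
               if path.startswith(prefix)]
    if not matches:
        return True  # no rule applies, so the path is allowed
    # The longest matching prefix wins; on a tie, Allow (True) beats Disallow.
    return max(matches)[1]


rules = parse_rules(ROBOTS_TXT)
print(is_allowed(rules, LOGGED_PATH))
print(is_allowed(rules, "/snapshots/index.html"))
```

Under these assumptions the logged path matches `Disallow: /cjdns/builders`, so a crawler honoring this robots.txt would not be expected to request it. Note that the blank line directly after `User-agent: *` is skipped here; a stricter parser that treats blank lines as group separators could discard the group, which may be worth checking when reproducing the issue.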
Tags: crawler, robots.txt
Attached Files

- Relationships

-  Notes
(0001023)
vikulin (reporter)
2015-03-18 17:50

Found the following messages in the YaCy log:

E 2015/03/18 18:44:56 net.yacy.crawler.robots.RobotsTxt new entry in robots.txt table failed, resetting database
I 2015/03/18 18:44:56 Heap clearing heap /media/vadym/9ba4f617-ef6b-410a-b92b-77d2b9e82573/yacy/DATA/WORK/robots.bheap
E 2015/03/18 18:44:56 net.yacy.crawler.robots.RobotsTxt new entry in robots.txt table failed, resetting database
I 2015/03/18 18:44:56 Heap clearing heap /media/vadym/9ba4f617-ef6b-410a-b92b-77d2b9e82573/yacy/DATA/WORK/robots.bheap
E 2015/03/18 18:44:56 net.yacy.crawler.robots.RobotsTxt new entry in robots.txt table failed, resetting database
I 2015/03/18 18:44:56 Heap clearing heap /media/vadym/9ba4f617-ef6b-410a-b92b-77d2b9e82573/yacy/DATA/WORK/robots.bheap

- Issue History
Date Modified Username Field Change
2015-03-15 22:27 vikulin New Issue
2015-03-15 22:28 vikulin Tag Attached: crawler
2015-03-15 22:28 vikulin Tag Attached: robots.txt
2015-03-18 17:50 vikulin Note Added: 0001023

