ID: 0000738
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2017-04-22 02:38
Last Update: 2019-07-28 08:38
Assigned To:
Platform:
OS:
OS Version:
Product Version: YaCy 1.9
Target Version:
Fixed in Version:
Summary: 0000738: YaCy stuck in noindex,follow wasteland
Description: I've seen this happen many times: the crawler starts crawling a set of forum, search-result, or archive pages (the kind of pages with little or duplicate content that search engines usually should not index). SEOs commonly meta-tag such pages as "noindex,follow" to let PageRank flow through the site while keeping those pages from spamming the Google index.

YaCy takes the instruction (noindex,follow) literally, but apparently has no way to deal with it properly. As a result, YaCy will (in certain situations) crawl such pages for hours without indexing anything.
Additional Information: Same issue:
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5061&p=29327#p29319

In my case it's a scheduled job. It's limited to 100 pages per seed URL, but that limit doesn't seem to apply to scheduled jobs.

The solution would be to disable "noindex,follow" crawling entirely. It makes little sense for YaCy anyway. A reasonable assumption is that important content is always linked from indexable pages and not hidden.
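
To illustrate the proposed behaviour, here is a minimal sketch in Python. It is not YaCy code: the names RobotsMetaParser, crawl_decision and skip_noindex_links are hypothetical and chosen only for this example. It reads the robots meta tag of a page and, when the page is marked noindex, optionally treats it as nofollow as well, so the crawler would not spend time walking through pages it never indexes.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex,follow".
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def crawl_decision(html, skip_noindex_links=True):
    """Return (index_page, follow_links) for one HTML page.

    skip_noindex_links is a hypothetical switch for the behaviour proposed
    in this issue: a noindex page is not used as a source of new links.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = parser.directives

    index_page = not ({"noindex", "none"} & directives)
    follow_links = not ({"nofollow", "none"} & directives)

    if skip_noindex_links and not index_page:
        follow_links = False
    return index_page, follow_links


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
    print(crawl_decision(page))                            # proposed behaviour
    print(crawl_decision(page, skip_noindex_links=False))  # literal interpretation
```

With the switch enabled, a "noindex,follow" page yields neither an index entry nor new crawl targets. With it disabled, the page is still followed, which corresponds to the behaviour described in this report.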
Tags: No tags attached.
Attached Files

- Relationships

-  Notes
There are no notes attached to this issue.

- Issue History
Date Modified Username Field Change
2017-04-22 02:38 shni New Issue
