View Issue Details
ID: 0000599
Project: YaCy
Category: Wishlist - Wunschliste
View Status: public
Date Submitted: 2015-09-02 18:03
Last Update: 2016-03-24 13:38
Assigned To:
Platform:
OS:
OS Version:
Product Version: YaCy 1.8
Target Version:
Fixed in Version:
Summary: 0000599: Customizable UA string
Description: Allow customizing the user-agent (UA) string from a config file.

This would broaden what YaCy can be used for and would increase the number of websites in the shared index, because many websites block any UA string containing the substring "bot", which is part of the default "yacybot".
Tags: No tags attached.
Attached Files:

Relationships

Notes
Davide (reporter)
2015-10-23 14:12

For the record:

1) I found that YaCy does allow customizing the UA string (in DATA/SETTINGS/yacy.conf), but the custom string must not contain "yacy" in any letter case (that is, its lowercased form must not contain "yacy"). That's why my customization silently failed to apply.

2) Amazon currently allows UA strings that contain "wget" and "bot". "bot" is part of "yacybot", the default UA string.
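The rule described in point 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not YaCy's actual code: the class and method names (`UserAgentConfig`, `effectiveUserAgent`) and the shortened default string are hypothetical; only the case-insensitive "yacy" check reflects the behavior reported above.

```java
import java.util.Locale;

class UserAgentConfig {
    // Hypothetical, shortened default; the string YaCy actually sends is longer.
    static final String DEFAULT_UA = "yacybot";

    // Sketch of the reported rule: a custom string that contains "yacy"
    // in any letter case is silently ignored and the default is used.
    static String effectiveUserAgent(String configured) {
        if (configured == null || configured.isEmpty()) {
            return DEFAULT_UA; // nothing configured: fall back to the default
        }
        if (configured.toLowerCase(Locale.ROOT).contains("yacy")) {
            return DEFAULT_UA; // rejected without any warning, as observed
        }
        return configured; // accepted as-is
    }

    public static void main(String[] args) {
        System.out.println(effectiveUserAgent("MyYaCyPeer/1.0")); // yacybot
        System.out.println(effectiveUserAgent("MyCrawler/1.0"));  // MyCrawler/1.0
    }
}
```

Under this rule a string such as "MyYaCyPeer/1.0" falls back to the default without any error, which matches the "silently failed" symptom, while "MyCrawler/1.0" is kept.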
b0b3r (reporter)
2016-03-17 13:26

For me it doesn't work at all. For example:
grep "userAgent" yacy/DATA/SETTINGS/yacy.conf
crawler.userAgent.string=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

But the tcpdump output shows:
User-Agent: yacybot (/global; amd64 Linux 4.1.19-gentoo; java 1.7.0_95; Europe/pl) http://yacy.net/bot.html

In my case, the problem is that some sites block every request whose UA string contains "java". Besides, I think there is no need to reveal so many details in this string by default; something similar to the string Google uses would be sufficient.
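Both notes describe sites that reject requests based on substrings in the User-Agent header: "bot" in the default "yacybot", and "java" in the string captured by tcpdump. The sketch below models such a naive site-side filter; the blocklist and the class name `UaBlocklistDemo` are hypothetical and only illustrate why the observed string is rejected. Real sites apply their own rules, which are usually not documented.

```java
import java.util.List;
import java.util.Locale;

class UaBlocklistDemo {
    // Hypothetical blocklist built from the substrings mentioned in this issue.
    static final List<String> BLOCKED_SUBSTRINGS = List.of("bot", "java");

    // Returns true if a naive, case-insensitive substring filter would reject this UA.
    static boolean isBlocked(String userAgent) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String s : BLOCKED_SUBSTRINGS) {
            if (ua.contains(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String observed = "yacybot (/global; amd64 Linux 4.1.19-gentoo; java 1.7.0_95; Europe/pl) http://yacy.net/bot.html";
        System.out.println(isBlocked(observed)); // true: matches "bot" and "java"
        System.out.println(isBlocked("Mozilla/5.0 (compatible; ExampleCrawler/1.0)")); // false
    }
}
```

A compact string without such tokens would pass this particular filter, although any given site may check for different substrings.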

Issue History
Date Modified     Username  Field                Change
2015-09-02 18:03  Davide    New Issue
2015-10-23 14:12  Davide    Note Added: 0001120
2016-03-17 13:26  b0b3r     Note Added: 0001228
