YaCy-Bugtracker - YaCy
View Issue Details
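ID: 0000599
Project: YaCy
Category: Wishlist - Wunschliste
View Status: public
Date Submitted: 2015-09-02 18:03
Last Update: 2016-03-24 13:38
Reporter: Davide
Priority: normal
Severity: tweak
Reproducibility: N/A
Status: new
Resolution: open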
none 
YaCy 1.8 
 
0000599: Customizable UA string
Allow the UA string to be customized from a config file.

This would make YaCy more flexible and would help grow the number of websites in the shared index, because many websites block the substring "bot", which is contained in the default "Yacybot".
No tags attached.
Issue History
2015-09-02 18:03  Davide  New Issue
2015-10-23 14:12  Davide  Note Added: 0001120
2016-03-17 13:26  b0b3r  Note Added: 0001228

Notes
(0001120)
Davide   
2015-10-23 14:12   
For the record:

1) I found that YaCy does allow the UA string to be customized (in DATA/SETTINGS/yacy.conf), but the string must not contain the substring "yacy" in lowercase. That is why my customization silently failed to apply.

2) Amazon, right now, allows UA strings that contain "wget" and "bot". "bot" is part of "yacybot", the default UA string.
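
To make the two points above concrete, here is a minimal sketch of a pre-check for a candidate UA string. It is not YaCy code: the helper name `flagged_substrings` and the list of flagged substrings are hypothetical, and the case-insensitive comparison is an assumption rather than confirmed YaCy behavior.

```python
# Hypothetical helper, not part of YaCy. It only reports which of the
# substrings mentioned in this issue occur in a candidate UA string.
FLAGGED_SUBSTRINGS = ("yacy", "bot")


def flagged_substrings(user_agent, flagged=FLAGGED_SUBSTRINGS):
    """Return the flagged substrings found in user_agent.

    Assumption: the comparison is case-insensitive, so the UA string
    is lowercased before searching.
    """
    lowered = user_agent.lower()
    return [s for s in flagged if s in lowered]


if __name__ == "__main__":
    for candidate in ("yacybot", "Mozilla/5.0 (compatible; ExampleCrawler/1.0)"):
        print(candidate, "->", flagged_substrings(candidate))
```

An empty result only means that none of the substrings named here were found; it says nothing about what any particular website actually blocks.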
(0001228)
b0b3r   
2016-03-17 13:26   
For me it doesn't work at all. For example:
grep "userAgent" yacy/DATA/SETTINGS/yacy.conf
crawler.userAgent.string=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
crawler.userAgent.clienttimeout=10000
crawler.userAgent.name=Mozilla
crawler.userAgent.minimumdelta=500

But in the output from tcpdump:
...
User-Agent: yacybot (/global; amd64 Linux 4.1.19-gentoo; java 1.7.0_95; Europe/pl) http://yacy.net/bot.html
...

In my case the problem is that some sites block everything that contains the string "java". Besides, I think there is no need to reveal so many details in this string by default. Something similar to what Google sends would be sufficient.
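
The comparison described in this note can be repeated mechanically. The sketch below is only an illustration under stated assumptions: `parse_properties` and `header_value` are hypothetical helpers, the config text and the header line are copied from the note above, and the parser handles only plain `key=value` lines rather than the full properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def header_value(header_line):
    """Return the value part of a 'Name: value' HTTP header line."""
    return header_line.split(":", 1)[1].strip()


# Settings as shown by grep in the note above.
conf_text = """\
crawler.userAgent.string=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
crawler.userAgent.clienttimeout=10000
crawler.userAgent.name=Mozilla
crawler.userAgent.minimumdelta=500
"""

# Header line as captured with tcpdump in the note above.
observed = ("User-Agent: yacybot (/global; amd64 Linux 4.1.19-gentoo; "
            "java 1.7.0_95; Europe/pl) http://yacy.net/bot.html")

configured = parse_properties(conf_text)["crawler.userAgent.string"]
sent = header_value(observed)

if __name__ == "__main__":
    print("configured:", configured)
    print("sent:      ", sent)
    print("configured UA applied:", configured == sent)
```

With the values from this note the two strings differ, which matches the report that the configured string is not the one sent.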