ID | Project | Category | View Status | Date Submitted | Last Update |
0000599 | YaCy | Wishlist - Wunschliste | public | 2015-09-02 18:03 | 2016-03-24 13:38 |
Reporter | Davide |
Assigned To | |
Priority | normal | Severity | tweak | Reproducibility | N/A |
Status | new | Resolution | open |
ETA | none |
Platform | | OS | | OS Version | |
Product Version | YaCy 1.8 |
Target Version | | Fixed in Version | |
Summary | 0000599: Customizable UA string |
Description | Allow customizing the UA (User-Agent) string from a config file. This would broaden what YaCy can be used for and increase the number of websites in the shared index, because many websites block any User-Agent containing the substring "bot", which the default "yacybot" does. |
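To make the blocking problem concrete, here is a minimal sketch of the kind of naive User-Agent filter the description refers to. It is a hypothetical server-side check, not any particular site's configuration: the class name `UaFilterSketch` and the blocked list are assumptions; "bot" is the substring named above and "java" is another substring reported as blocked later in this issue.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a naive server-side filter that rejects any request
// whose User-Agent contains one of a list of blocked substrings.
class UaFilterSketch {

    // Returns true if the lowercased User-Agent contains any blocked substring.
    static boolean isBlocked(String userAgent, List<String> blockedSubstrings) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String s : blockedSubstrings) {
            if (ua.contains(s.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Substrings reported as blocked in this issue (illustrative only).
        List<String> blocked = List.of("bot", "java");
        // "yacybot" contains "bot", so this filter rejects it.
        System.out.println(isBlocked("yacybot", blocked)); // true
        System.out.println(isBlocked("Mozilla/5.0 (X11; Linux x86_64)", blocked)); // false
    }
}
```

Because the match is a plain case-insensitive substring test, any agent containing "bot" (including "yacybot") is rejected regardless of the rest of the string, which is why a configurable UA string would help.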
Tags | No tags attached. |
Attached Files | |
(0001120) Davide (reporter) 2015-10-23 14:12 |
For the record:
1) I found that YaCy does allow customizing the UA string (in DATA/SETTINGS/yacy.conf), but the string must not contain "yacy" as a substring, compared in lowercase (so any capitalization of "yacy" is rejected). That's why my customization silently failed to apply.
2) Amazon, right now, allows UA strings to contain "wget" and "bot"; "bot" is part of "yacybot", the default UA string. |
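The rule described in point 1 can be sketched as follows. This only mirrors the observed behavior; it is not YaCy's actual source, and `UserAgentRuleSketch`, `effectiveAgent` and the placeholder default "yacybot" are hypothetical names chosen for illustration.

```java
import java.util.Locale;

// Hypothetical sketch of the rule described in note 0001120: a custom
// User-Agent is only used if, once lowercased, it does not contain "yacy".
// Otherwise the default agent is used and no error is reported.
class UserAgentRuleSketch {

    // Placeholder default; the real YaCy default carries more detail.
    static final String DEFAULT_AGENT = "yacybot";

    static String effectiveAgent(String configured) {
        if (configured == null || configured.isBlank()) {
            return DEFAULT_AGENT;
        }
        if (configured.toLowerCase(Locale.ROOT).contains("yacy")) {
            // Silent fallback: this is why the customization appeared not to apply.
            return DEFAULT_AGENT;
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(effectiveAgent("MyCrawler/1.0"));     // MyCrawler/1.0
        System.out.println(effectiveAgent("MyYaCyCrawler/1.0")); // yacybot
    }
}
```

The silent fallback in the second branch is what makes the failure hard to spot; logging a warning there, or rejecting the value when the configuration is loaded, would make the rule visible to users.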
(0001228) b0b3r (reporter) 2016-03-17 13:26 |
For me it doesn't work at all. For example:

    grep "userAgent" yacy/DATA/SETTINGS/yacy.conf
    crawler.userAgent.string=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    crawler.userAgent.clienttimeout=10000
    crawler.userAgent.name=Mozilla
    crawler.userAgent.minimumdelta=500

But the tcpdump output shows:

    ...
    User-Agent: yacybot (/global; amd64 Linux 4.1.19-gentoo; java 1.7.0_95; Europe/pl) http://yacy.net/bot.html
    ...

In my case, the problem is that some sites block every request whose User-Agent contains the string "java". Besides, I think there is no need to reveal so many details in this string by default; something similar to the Googlebot string above would be sufficient. |
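One way to repeat the check above without reading tcpdump output by eye is a small comparison between the configured value and the header actually captured. The sketch assumes yacy.conf uses plain key=value lines that `java.util.Properties` can parse (an assumption, not something stated in this issue); `UserAgentCheckSketch` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical diagnostic sketch: read crawler.userAgent.string from yacy.conf
// text (assuming key=value lines parseable by java.util.Properties) and compare
// it with a User-Agent header line captured on the wire, e.g. with tcpdump.
class UserAgentCheckSketch {

    static String configuredAgent(String confText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(confText));
        return props.getProperty("crawler.userAgent.string");
    }

    // Extracts the value from a raw "User-Agent: ..." header line
    // (everything after the first colon, trimmed).
    static String headerValue(String headerLine) {
        int colon = headerLine.indexOf(':');
        return colon < 0 ? headerLine.trim() : headerLine.substring(colon + 1).trim();
    }

    public static void main(String[] args) throws IOException {
        String conf = "crawler.userAgent.string=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\n"
                + "crawler.userAgent.name=Mozilla\n";
        String captured = "User-Agent: yacybot (/global; amd64 Linux 4.1.19-gentoo; java 1.7.0_95; Europe/pl) http://yacy.net/bot.html";

        String expected = configuredAgent(conf);
        String actual = headerValue(captured);
        System.out.println("configured: " + expected);
        System.out.println("sent:       " + actual);
        System.out.println(expected.equals(actual) ? "custom agent applied" : "custom agent NOT applied");
    }
}
```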
Date Modified | Username | Field | Change |
2015-09-02 18:03 | Davide | New Issue | |
2015-10-23 14:12 | Davide | Note Added: 0001120 | |
2016-03-17 13:26 | b0b3r | Note Added: 0001228 | |