View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
0000730 | YaCy | Wishlist - Wunschliste | public | 2017-03-28 19:34 | 2017-03-28 19:34 |
|
Reporter | smokingwheels | |
Assigned To | | |
Priority | normal | Severity | minor | Reproducibility | always |
Status | new | Resolution | open | |
ETA | none | |
Platform | X86 | OS | Linux Debian +Ubuntu | OS Version | |
Product Version | YaCy 1.9 | |
Target Version | | Fixed in Version | | |
|
Summary | 0000730: Some websites have URLs ending in a repeated ;amp;amp suffix, hundreds of them. |
Description | When crawling, I have noticed that a few sites have URLs suffixed with a repeated ";amp", for example www.domain.com/somepage.html;amp;amp;amp and so on, with a single URL running for at least 3 to 4 lines in the crawler monitor.
When I find them now, I terminate the crawl or blacklist the site, because it just slows down my already slow PC.
Maybe there could be an option to bypass such sites if one wishes?
|
Tags | No tags attached. |
|
Attached Files | |
|
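To illustrate the kind of bypass option requested above, here is a minimal sketch in Java of a check that flags URLs ending in a repeated ";amp" suffix. It is not taken from the YaCy code base: the class name AmpSuffixFilter, both method names, and the threshold of two repetitions are assumptions chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of YaCy: detects the ";amp;amp;amp..." suffix
// described in this issue.
final class AmpSuffixFilter {

    // Two or more consecutive ";amp" tokens at the very end of the URL.
    // The threshold of 2 is an assumption; a single ";amp" is left alone.
    private static final Pattern REPEATED_AMP =
            Pattern.compile("(?:;amp){2,}$", Pattern.CASE_INSENSITIVE);

    // Returns true if the URL ends in a repeated ";amp" suffix.
    static boolean hasRepeatedAmpSuffix(String url) {
        return REPEATED_AMP.matcher(url).find();
    }

    // Returns the URL with the repeated ";amp" suffix removed.
    static String stripAmpSuffix(String url) {
        return REPEATED_AMP.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        String bad = "www.domain.com/somepage.html;amp;amp;amp";
        System.out.println(hasRepeatedAmpSuffix(bad));
        System.out.println(stripAmpSuffix(bad));
        System.out.println(hasRepeatedAmpSuffix("www.domain.com/somepage.html"));
    }
}
```

A crawler could use a check like hasRepeatedAmpSuffix to skip such URLs, or stripAmpSuffix to normalize them before queuing. Which of the two is preferable would depend on whether the stripped URL still points to a valid page.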