YaCy-Bugtracker

ID: 0000526
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2015-01-20 19:19
Last Update: 2015-03-21 15:35
Reporter: drixter
Assigned To: Low012
Priority: normal
Severity: minor
Reproducibility: always
Status: assigned
Resolution: open
ETA: none
Platform:
OS:
OS Version:
Product Version: YaCy 1.8
Target Version:
Fixed in Version:
Summary: 0000526: Loader and file size limits don't work
Description: Based on http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5477

==cut==
I've just noticed that after starting a crawl, large files are being downloaded (these can be observed under "Crawler Monitor" -> "Processing Monitor"/"Loader") which seem to exceed the file size limits. I've checked the limits under "System Administration" -> "Advanced Settings" -> "Crawler Settings" and these are:
HTTP Crawler Settings: 10485760
FTP Crawler Settings: 10485760
SMB Crawler Settings: 100000000
Local File Crawler Settings: 100000000

For example, I've observed the following file being downloaded: http://www.swi-prolog.org/download/stab ... d.mpkg.zip
The HTTP response for this file returns Content-Length: "12026535", which is larger than the YaCy HTTP file size limit (10485760).
I have also seen files of 400 MB and more being downloaded.

==cut==
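The check the reporter expects can be sketched as follows. This is a minimal illustration in Python, not YaCy's actual loader code: the `LIMITS` table and the `exceeds_limit` function are hypothetical names, and only the numeric limits and the Content-Length value are taken from the report above.

```python
# Hypothetical sketch; names are illustrative and not taken from YaCy's source.
# The limits are the values quoted from "Crawler Settings" in the report.
LIMITS = {
    "http": 10485760,
    "ftp": 10485760,
    "smb": 100000000,
    "file": 100000000,
}


def exceeds_limit(protocol, content_length):
    """Return True if the declared size is above the limit for the protocol."""
    if content_length is None:
        # Size not declared; a loader would have to count bytes while streaming.
        return False
    return int(content_length) > LIMITS[protocol]


# Content-Length reported for the swi-prolog download in the description.
print(exceeds_limit("http", "12026535"))
```

Under these assumptions the file from the description is flagged, since 12026535 is greater than 10485760. A response that declares no Content-Length cannot be rejected up front and would need a byte counter during the download.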
Steps To Reproduce: Crawl a site that hosts files larger than the configured limits.

I also observe this issue, which generates pointless traffic.

0.0.0.0 - - [15/Jan/2015:12:54:06 +0100] "GET /debian/pool/main/libr/libreoffice/libreoffice_3.5.4%2Bdfsg2.orig-src.tar.gz HTTP/1.1" 200 306886045 "http://debian.mirror.martin89.dn42/debian/pool/main/libr/libreoffice/" "yacybot (-global; amd64 Linux 3.11.0-1.el6.elrepo.x86_64; java 1.7.0_25; Europe/en) http://yacy.net/bot.html" "-" "-" "5098.454" "-" "."

0.0.0.0 - - [15/Jan/2015:14:40:36 +0100] "GET /debian/pool/main/n/nexuiz-data/nexuiz-data_2.5.2.orig.tar.gz HTTP/1.1" 200 378203595 "http://debian.mirror.martin89.dn42/debian/pool/main/n/nexuiz-data/" "yacybot (-global; amd64 Linux 3.11.0-1.el6.elrepo.x86_64; java 1.7.0_25; Europe/en) http://yacy.net/bot.html" "-" "-" "5637.803" "-" "."

The last one has a size of 378203595 bytes, which is about 378 MB (roughly 361 MiB).
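To find such requests in a server log, something like the following Python sketch could be used. It assumes a combined-style access log in which the byte count follows the status code, as in the entries above; `LOG_PATTERN` and `oversized_requests` are hypothetical names, and the sample line is the request portion of the second log entry.

```python
import re

HTTP_LIMIT = 10485760  # HTTP crawler limit quoted in the description

# Matches the request, status, and byte-count fields of a combined-style log line.
LOG_PATTERN = re.compile(
    r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) (?P<size>\d+)'
)


def oversized_requests(lines, limit=HTTP_LIMIT):
    """Yield (path, size_bytes, size_mib) for 200 responses above the limit."""
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match["status"] == "200":
            size = int(match["size"])
            if size > limit:
                # 2**20 bytes per MiB
                yield match["path"], size, round(size / 2**20, 1)


sample = [
    '0.0.0.0 - - [15/Jan/2015:14:40:36 +0100] '
    '"GET /debian/pool/main/n/nexuiz-data/nexuiz-data_2.5.2.orig.tar.gz HTTP/1.1" '
    '200 378203595',
]

for path, size, mib in oversized_requests(sample):
    print(path, size, mib)
```

The pattern only looks at GET requests with status 200, so partial or failed transfers would need a different filter.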
Tags: No tags attached.
Attached Files

- Relationships

-  Notes
There are no notes attached to this issue.

- Issue History
Date Modified Username Field Change
2015-01-20 19:19 drixter New Issue
2015-03-21 15:35 Low012 Assigned To => Low012
2015-03-21 15:35 Low012 Status new => assigned

