YaCy-Bugtracker - YaCy
View Issue Details
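ID: 0000692
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2016-09-25 11:18
Last Update: 2016-09-27 19:46
Reporter: luc
Assigned To: BuBu
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed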
none 
OS: Microsoft Windows
OS Version: 10
 
 
0000692: Intranet mode: duplicate MS Windows file URLs
In intranet mode, different spellings of the same Microsoft Windows file-scheme URL are not detected as duplicates. Examples:
 - file://V:/Test/image.jpg
 - file:///V:/Test/image.jpg
 - file:///V:Test/image.jpg
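
To illustrate the kind of normalization that would make these variants compare equal, here is a minimal Python sketch. It is not YaCy's actual code (YaCy is written in Java, and the fix lives in the commits referenced in the notes). The function name `normalize_windows_file_url` and the canonical form `file:///<DRIVE>:/<path>` are assumptions chosen for this example, and `V:Test` is treated as equivalent to `V:/Test`, as this report does.

```python
import re

# Hypothetical helper, not YaCy's implementation.
# Matches "file:" followed by any number of slashes, a drive letter, a colon,
# optional separators, and the rest of the path.
_DRIVE_URL = re.compile(r"^file:/*([A-Za-z]):[/\\]*(.*)$", re.IGNORECASE)


def normalize_windows_file_url(url: str) -> str:
    """Map Windows drive-letter file URL variants onto one canonical string."""
    match = _DRIVE_URL.match(url.strip())
    if match is None:
        # Not a drive-letter file URL: return it unchanged.
        return url
    drive, rest = match.groups()
    # Unify backslashes and forward slashes, collapsing repeated separators.
    rest = re.sub(r"[/\\]+", "/", rest)
    return f"file:///{drive.upper()}:/{rest}"


variants = [
    "file://V:/Test/image.jpg",
    "file:///V:/Test/image.jpg",
    "file:///V:Test/image.jpg",
]
# All three spellings collapse to a single canonical URL.
canonical = {normalize_windows_file_url(u) for u in variants}
```

With such a step applied before a URL is looked up in the index, the three spellings would share one index key instead of three.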
Steps To Reproduce
- Run YaCy in Intranet mode on a Microsoft Windows OS
- Start a new crawl (/CrawlStartSite.html) with a starting point such as "file://V:\Test"
- Documents in this folder are indexed
- Start new crawls from the same location written differently, such as "file:///V:/Test" or "file:///V:Test"
- Documents are re-indexed instead of being detected as already present in the index
- Search for something in the indexed documents: the results contain duplicates, one for each URL flavour
No tags attached.
Issue History
2016-09-25 11:18  luc   New Issue
2016-09-25 22:10  BuBu  Note Added: 0001306
2016-09-27 08:05  luc   Note Added: 0001307
2016-09-27 19:46  BuBu  Status: new => resolved
2016-09-27 19:46  BuBu  Resolution: open => fixed
2016-09-27 19:46  BuBu  Assigned To: => BuBu

Notes
(0001306)
BuBu   
2016-09-25 22:10   
This commit does not solve the main case, but it slightly improves the handling of mixed separator notation
(c:\tmp\test.txt vs. c:\tmp/test.txt):
https://github.com/yacy/yacy_search_server/commit/6f8c3ccea4cc70368c2f4dda989e27365eb4e860
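
As a language-neutral illustration of the mixed-notation case only (not the content of the commit above, which is Java), the following Python sketch uses the standard library's `pathlib.PureWindowsPath`, which accepts both separators. The helper names `same_windows_path` and `to_forward_slashes` are hypothetical.

```python
from pathlib import PureWindowsPath


def same_windows_path(a: str, b: str) -> bool:
    """Compare two Windows paths regardless of which separator they use.

    PureWindowsPath parses both "\\" and "/" as separators and compares
    case-insensitively, and it works on any host OS because it never
    touches the file system.
    """
    return PureWindowsPath(a) == PureWindowsPath(b)


def to_forward_slashes(path: str) -> str:
    """Render a Windows path with forward slashes only."""
    return PureWindowsPath(path).as_posix()


mixed = r"c:\tmp/test.txt"
plain = r"c:\tmp\test.txt"
equal = same_windows_path(mixed, plain)
unified = to_forward_slashes(mixed)
```

Comparing parsed paths rather than raw strings is one way to avoid treating the two notations as different documents.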
(0001307)
luc   
2016-09-27 08:05   
Thank you BuBu.
And with this complementary commit (https://github.com/yacy/yacy_search_server/commit/1bb0b135ac5dab0adab423d89612f7b1e13f2e61) the described use cases are fixed.
Tested on MS Windows 10.
Non-regression testing done on Debian Jessie.