YaCy-Bugtracker

Issue Details
ID: 0000610
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2015-10-23 13:40
Last Update: 2016-03-09 13:25
Reporter: Davide
Assigned To: (none)
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
ETA: none
Platform / OS / OS Version: (not specified)
Product Version: YaCy 1.8
Target Version / Fixed in Version: (not specified)
Summary: 0000610: "maximumRecords" returns a random subset of results
Description: The following query

    query 0000001
    /yacysearch.json?query=logitech+mx+revolution&resource=local&maximumRecords=100

returns relevant results.

Running the same query with maximumRecords reduced from 100 to 10 (query 0000002) does not return the top 10 results with the lowest ranking score, as I would expect. Instead, it apparently returns an arbitrary subset of 10 unsorted results with seemingly random ranking values, and it does not include the most relevant (lowest ranking score) results returned by query 0000001.

    query 0000002
    /yacysearch.json?query=logitech+mx+revolution&resource=local&maximumRecords=10
Additional Information: All the results with a low ranking score returned by query 0000001 are in my local index, so they can be retrieved repeatedly and are always accessible to my queries.

Let me know if you need more info or if the bug report is invalid.
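A minimal sketch of the check implied by this report, in Java (YaCy's own language): given the items of query 0000001 (maximumRecords=100) and query 0000002 (maximumRecords=10), the smaller response should be sorted and should hold the lowest-ranking items of the larger one. `ResultItem` and `MaximumRecordsCheck` are hypothetical names, the items are assumed to be already parsed from the JSON, and "lower ranking = more relevant" is the reporter's expectation, not a documented YaCy convention.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one result item; only the two fields needed here.
final class ResultItem {
    final String link;
    final double ranking;

    ResultItem(String link, double ranking) {
        this.link = link;
        this.ranking = ranking;
    }
}

final class MaximumRecordsCheck {
    // True if the items appear in non-decreasing ranking order
    // (assumption: lower ranking = more relevant, as the reporter expects).
    static boolean isSortedByRanking(List<ResultItem> items) {
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i - 1).ranking > items.get(i).ranking) {
                return false;
            }
        }
        return true;
    }

    // True if the small response contains exactly the links of the
    // small.size() lowest-ranking items of the large response.
    static boolean isTopSubset(List<ResultItem> large, List<ResultItem> small) {
        if (small.size() > large.size()) {
            return false;
        }
        List<ResultItem> sorted = new ArrayList<>(large);
        sorted.sort(Comparator.comparingDouble((ResultItem r) -> r.ranking));
        Set<String> expected = new HashSet<>();
        for (int i = 0; i < small.size(); i++) {
            expected.add(sorted.get(i).link);
        }
        Set<String> actual = new HashSet<>();
        for (ResultItem r : small) {
            actual.add(r.link);
        }
        return expected.equals(actual);
    }
}
```

On the responses described in this report, both checks would be expected to fail for query 0000002. Comparing link sets rather than exact positions keeps the subset check tolerant of ties in the ranking value.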
Tags: No tags attached.
Attached Files: (none)

Notes
(0001119)
Davide (reporter)
2015-10-23 13:42

Notation:

0000001 stands for "query number 1"
0000002 stands for "query number 2"
(0001139)
luc (reporter)
2015-11-28 19:51

As a first analysis, I ran your request in debug mode, with conditional breakpoints in the URIMetadataNode class that stop when the score property (later assigned to the ranking property in the JSON output) is set to a non-zero value. I am not very experienced with the YaCy sources, but in this case the results seem to be a mix of two Solr queries, so they should be sorted again before the final result is written.
Confirming this needs deeper analysis, but I hope it helps.
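luc's hypothesis is that the output mixes the results of two Solr queries without a final sort. Below is a minimal sketch of the remedy he suggests (sort again before writing the final result), assuming both queries score on the same scale with higher = more relevant, and that the same URL may appear in both. `SolrHit` and `MergeThenSort` are hypothetical names, not YaCy classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result entry: a URL with a relevance score.
final class SolrHit {
    final String url;
    final double score;

    SolrHit(String url, double score) {
        this.url = url;
        this.score = score;
    }
}

final class MergeThenSort {
    // Combines the hits of two queries, keeps the best score per URL,
    // and sorts the combined list once, highest score first
    // (assumption: a higher score means a more relevant hit).
    static List<SolrHit> mergeAndSort(List<SolrHit> first, List<SolrHit> second) {
        Map<String, Double> best = new LinkedHashMap<>();
        for (List<SolrHit> source : List.of(first, second)) {
            for (SolrHit hit : source) {
                best.merge(hit.url, hit.score, Math::max);
            }
        }
        List<SolrHit> merged = new ArrayList<>(best.size());
        for (Map.Entry<String, Double> e : best.entrySet()) {
            merged.add(new SolrHit(e.getKey(), e.getValue()));
        }
        merged.sort(Comparator.comparingDouble((SolrHit h) -> h.score).reversed());
        return merged;
    }
}
```

Concatenating two lists that are each sorted does not give a sorted list, which is why the single sort over the combined list matters here.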
(0001141)
Davide (reporter)
2015-11-29 13:04
edited on: 2016-03-09 13:25

Thank you, luc, for your recent work on YaCy.
It would be interesting to get a high-level understanding of how results are sorted and presented through the API. I fear that this part of the code is broken and that the API code is severely neglected.

(0001142)
BuBu (developer)
2015-11-29 17:31

As you found out, there is an issue with ranking: internally, two rankings are used, the DHT ranking and the Solr ranking, which are not really compatible, and they are mixed in the results.
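BuBu's note says the DHT and Solr rankings are on incompatible scales. One generic way to put two score scales on a common footing before merging is per-source min-max normalization; the sketch below shows that technique only as an illustration. It is not what YaCy does (the notes do not describe a fix), both sources are assumed to use "higher = better", and `SourceHit` and `ScoreFusion` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical hit from one ranking source, scored on that source's own
// scale (assumption: higher = better within each source).
final class SourceHit {
    final String url;
    final double score;

    SourceHit(String url, double score) {
        this.url = url;
        this.score = score;
    }
}

final class ScoreFusion {
    // Min-max normalization: rescales one source's scores linearly into [0, 1].
    // If every score in the source is equal, each hit gets 1.0.
    static List<SourceHit> normalize(List<SourceHit> source) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (SourceHit h : source) {
            min = Math.min(min, h.score);
            max = Math.max(max, h.score);
        }
        double range = max - min;
        List<SourceHit> out = new ArrayList<>(source.size());
        for (SourceHit h : source) {
            double scaled = range == 0 ? 1.0 : (h.score - min) / range;
            out.add(new SourceHit(h.url, scaled));
        }
        return out;
    }

    // Normalizes each source separately, then sorts the union, highest first.
    // List.sort is stable, so ties keep their input order (first source first).
    static List<SourceHit> fuse(List<SourceHit> first, List<SourceHit> second) {
        List<SourceHit> all = new ArrayList<>(normalize(first));
        all.addAll(normalize(second));
        all.sort(Comparator.comparingDouble((SourceHit h) -> h.score).reversed());
        return all;
    }
}
```

Without normalization, a source whose raw scores are in the thousands would push every hit of a source scored in single digits to the bottom; after normalization, the best hit of each source scores 1.0. Min-max normalization is sensitive to outliers, so it is only a starting point.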
(0001143)
Davide (reporter)
2015-11-29 17:46

Does this defect also affect the results returned via "yacysearch.html", or does it only affect the API?

Furthermore, if the defect also affects "yacysearch.html", then the whole ranking algorithm of YaCy would be flawed. If that is the case, an honest question:
what is the point of continuing to use and develop YaCy if, after a decade of development, it still suffers from fundamental design issues (i.e. the ranking system is broken at its core)?
(0001144)
luc (reporter)
2015-11-30 00:21

Even if the ranking system is broken in yacysearch.json, or even in yacysearch.html, I see at least a few reasons to keep using and developing it:
 - the philosophy and ideas behind the whole project are still very good
 - the existing code base is far from perfect, but it is the result of a large amount of volunteer work and already gives access to a large shared index.
 - this kind of issue is not impossible to overcome. By reporting it, you took one more step toward solving it and improving the whole system, step by step.
(0001146)
luc (reporter)
2015-12-05 13:18
edited on: 2015-12-05 13:19

Correct me if I am wrong, but I think ranking compatibility between DHT and Solr is only one part of the problem: even if the rankings were compatible, it seems the items would still not be sorted correctly.
Indeed, in yacysearch, result items are gathered concurrently from different sources and written to the output as soon as they are available (see yacysearchitem.java: "final URIMetadataNode result = theSearch.oneResult(item, timeout);").
In the end, re-ranking and sorting the global result list from different sources cannot be effective, because the items are concurrently pushed and pulled one by one.

To solve this issue, shouldn't we consider using the SearchEvent.completeResults() method to gather a consistent result set in the yacysearch servlet?
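luc's point is that pulling results one by one as they arrive cannot yield a globally sorted list. The hypothetical `ResultCollector` below contrasts the two strategies with plain java.util.concurrent: `firstArrivals` keeps the first n hits in arrival order (analogous to writing each result as soon as it is available), while `topAfterCompletion` waits for all sources, sorts once, and then keeps the best n (the idea behind gathering complete results). It does not use YaCy's `SearchEvent` API, and it assumes all scores already share one scale.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical result with a score on one common scale (assumption: higher = better).
final class Hit {
    final String url;
    final double score;

    Hit(String url, double score) {
        this.url = url;
        this.score = score;
    }
}

final class ResultCollector {
    // Starts one producer task per source; each pushes its hits into a shared queue.
    private static BlockingQueue<Hit> startProducers(List<List<Hit>> sources, ExecutorService pool) {
        BlockingQueue<Hit> queue = new LinkedBlockingQueue<>();
        for (List<Hit> source : sources) {
            pool.submit(() -> source.forEach(queue::add));
        }
        return queue;
    }

    // Streaming: takes the first n hits in arrival order. Which hits these are
    // depends on thread timing, not on score.
    static List<Hit> firstArrivals(List<List<Hit>> sources, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        BlockingQueue<Hit> queue = startProducers(sources, pool);
        List<Hit> out = new ArrayList<>();
        try {
            while (out.size() < n) {
                Hit hit = queue.poll(1, TimeUnit.SECONDS);
                if (hit == null) {
                    break; // nothing arrived within the timeout
                }
                out.add(hit);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return out;
    }

    // Complete: waits until every source has finished, then sorts all hits
    // once and keeps the best n.
    static List<Hit> topAfterCompletion(List<List<Hit>> sources, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        BlockingQueue<Hit> queue = startProducers(sources, pool);
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("sources did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sources", e);
        }
        List<Hit> all = new ArrayList<>();
        queue.drainTo(all);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

With n playing the role of maximumRecords, only the second variant guarantees (up to ties) that lowering n returns a prefix of the larger result. The trade-off is latency: the complete variant cannot write anything until the slowest source finishes or the timeout expires, which is presumably why streaming is attractive for an interactive search page.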

(0001180)
Davide (reporter)
2015-12-18 00:22

Thank you, luc. As you are the only one actively revising the source code at the moment, I'll ask you some decisive questions: I am about to decide whether to unplug my 700 GB YaCy index and keep it for a better future, when the ranking code eventually serves its intended purpose, or to sell the hardware and recover the cost.

Do you believe the ranking system offered through the search API will become functional soon?
Do you believe the image search API is accurate at all, and that images are appropriately contextualized when indexed from their web page? If not, do you think the image search API will eventually become usable? IMHO it currently isn't.
(0001181)
luc (reporter)
2015-12-18 02:01

Thank you for your confidence, Davide. You are asking difficult questions, and I am probably not the most appropriate person to answer them.
In fact, Reger24 seems to be the person who has fixed the most bugs in recent months, and he has a far stronger background in YaCy.
Did you notice he committed a change related to ranking: https://github.com/yacy/yacy_search_server/commit/cdb8f3b10d15edc75f68c8170a967289f7c40981

I have not had enough time to check and test whether it at least partially solves what you described here. Maybe you could ask him personally whether he plans to work on the issues that bother you.
Accurate testing takes time, especially in P2P mode. But personally, I am hopeful it can be solved soon.

The current results of document and image search don't fully satisfy me either. But if we want them to improve, I believe we have to do the job ourselves.
I am only just gaining some humble skills with the YaCy core code, which relies heavily on Solr and concurrency and is not so easy to handle. The YaCy search system is quite challenging, and I think improving result relevance is a big task. It's up to you to decide whether you can wait, or would rather contribute as far as you can.

In any case, I will keep trying to improve what I can once I have some time, after the Christmas and New Year holidays.

Issue History
Date Modified    | Username | Field
2015-10-23 13:40 | Davide   | New Issue
2015-10-23 13:42 | Davide   | Note Added: 0001119
2015-11-28 00:47 | Davide   | Note Added: 0001138
2015-11-28 19:51 | luc      | Note Added: 0001139
2015-11-29 13:04 | Davide   | Note Added: 0001141
2015-11-29 17:31 | BuBu     | Note Added: 0001142
2015-11-29 17:46 | Davide   | Note Added: 0001143
2015-11-30 00:21 | luc      | Note Added: 0001144
2015-12-05 13:18 | luc      | Note Added: 0001146
2015-12-05 13:19 | luc      | Note Edited: 0001146
2015-12-18 00:22 | Davide   | Note Added: 0001180
2015-12-18 02:01 | luc      | Note Added: 0001181
2016-03-09 13:23 | Davide   | Note Deleted: 0001138
2016-03-09 13:25 | Davide   | Note Edited: 0001141

