|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0000629||YaCy||[All Projects] General||public||2015-12-16 22:59||2016-11-03 21:32|
|Platform||OS||Debian GNU/Linux||OS Version||8.2 Jessie|
|Product Version||YaCy 1.8|
|Target Version||Fixed in Version|
|Summary||0000629: Default favicon used even though one is available|
|Description||In search results, the default favicon (http://localhost:8090/env/grafics/dfltfvcn.ico) is often used, even when a favicon is actually available for the result document.
This may not seem important, but it slows down use of the results page: even once all result rows are displayed, the browser's loading indicator keeps spinning because the browser is still waiting for a response for some wrong icon URLs.
|Steps To Reproduce||Example: search for Debian
- the result http://www.debian.org.hk/ is displayed with the default favicon
- the document does declare a favicon: <link rel="shortcut icon" href="/files/favicon.ico" type="image/x-icon" />
|Additional Information||The rule applied in yacysearchitem.java is too simplistic : |
faviconURL = new DigestURL(resultURL.getProtocol() + "://" + resultURL.getHost() + ((port != -1) ? (":" + port) : "") + "/favicon.ico");
Instead, the document's links with rel="icon" should be searched first.
See http://www.w3.org/TR/html5/links.html#rel-icon
I guess the icon URL should be stored at crawl time, to avoid reloading and parsing the document again, at least when the snippet strategy is not 'NOCACHE'.
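To illustrate the lookup order suggested above, here is a minimal sketch, not YaCy code: it looks for a link tag whose rel contains the "icon" token, resolves its href against the document URL, and only falls back to /favicon.ico when nothing is found. The class FaviconResolver and its method names are hypothetical, and the regular expressions are only a stand-in for a real HTML parser.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only (not part of YaCy). */
final class FaviconResolver {

    // Matches a whole <link ...> tag; its attributes are inspected separately.
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern REL_ATTR =
            Pattern.compile("\\brel\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern HREF_ATTR =
            Pattern.compile("\\bhref\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the first declared icon URL, or the conventional /favicon.ico fallback. */
    static URI resolve(URI documentUrl, String html) {
        Matcher tags = LINK_TAG.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            Matcher rel = REL_ATTR.matcher(tag);
            Matcher href = HREF_ATTR.matcher(tag);
            if (rel.find() && href.find() && hasIconToken(rel.group(1))) {
                // Relative hrefs are resolved against the document URL.
                return documentUrl.resolve(href.group(1));
            }
        }
        // No icon link declared: fall back to the site root, as the current rule does.
        return documentUrl.resolve("/favicon.ico");
    }

    // rel is a space-separated token list, e.g. "shortcut icon".
    private static boolean hasIconToken(String relValue) {
        for (String token : relValue.trim().split("\\s+")) {
            if (token.equalsIgnoreCase("icon")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        URI doc = URI.create("http://www.debian.org.hk/");
        String html = "<link rel=\"shortcut icon\" href=\"/files/favicon.ico\" type=\"image/x-icon\" />";
        System.out.println(resolve(doc, html));
    }
}
```

With the Debian example from the steps above, this resolves to http://www.debian.org.hk/files/favicon.ico instead of the site-root default.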
|Tags||No tags attached.|
|Attached Files|| newfavicon.PNG (12,460 bytes) 2015-12-20 19:29
oldfavicon.PNG (11,953 bytes) 2015-12-20 19:29
IENetworkTrace.png (80,252 bytes) 2016-01-04 21:56
testCases.csv (1,157 bytes) 2016-02-10 10:10
By the way, this has been occurring since the change in searchitem.html
from the commit in your pull request https://github.com/yacy/yacy_search_server/commit/0e8b3d9a906b3f8cbd55d5483cc7a0285051b7c9
to the <object data= form:
<object data="ViewImage.png?width=16&height=16&code=#[faviconCode]#&isStatic=true" type="image/png" id="f#[urlhash]#" class="favicon" style="width:16px; height:16px;">
<img width="16" height="16" src="env/grafics/dfltfvcn.ico" style="width:16px; height:16px;" alt="" />
Experimentally switching back to the old <img src=ViewImage .... shows more favicons.
Screenshots are attached.
See the quick fix https://github.com/yacy/yacy_search_server/commit/67f64af4b4fde9c16f4726c1554660f9d1df4a03
Maybe you have a better solution than my reverting to the old markup.
(the IE result is represented by the newfavicon.PNG attachment)
Thank you BuBu for testing with IE.
Sorry for IE users, I only tested my refactoring with Firefox, Chrome and Konqueror. Won't forget it in the future!
For information, after some analysis :
- there has been no problem using the object tag to display images since IE8.
- here the problem is related to faviconCode expiration: when using the object tag instead of img, IE performs two HTTP requests, a HEAD and a GET. ViewImage handles both the same way and consumes the faviconCode licence on the HEAD request (it calls URLLicense.releaseLicense). Thus the licence code is no longer available when the GET is performed (see the attached network trace screenshot).
Firefox does the same, but it seems to handle this better.
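One way to picture the HEAD/GET problem described above is the sketch below. It is not the URLLicense API: OneTimeLicenseStore, register and lookup are hypothetical names, and the only point is that a HEAD request peeks at the one-time code while a GET consumes it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical one-time code store, for illustration only (not YaCy's URLLicense). */
final class OneTimeLicenseStore {

    private final Map<String, String> urlsByCode = new ConcurrentHashMap<>();

    /** Associates a one-time code with the URL it gives access to. */
    void register(String code, String url) {
        urlsByCode.put(code, url);
    }

    /**
     * Returns the licensed URL, or null if the code is unknown or already used.
     * A HEAD request only peeks at the code; any other method consumes it.
     */
    String lookup(String httpMethod, String code) {
        if ("HEAD".equalsIgnoreCase(httpMethod)) {
            return urlsByCode.get(code);
        }
        return urlsByCode.remove(code);
    }

    public static void main(String[] args) {
        OneTimeLicenseStore store = new OneTimeLicenseStore();
        store.register("abc", "http://www.debian.org.hk/files/favicon.ico");
        System.out.println(store.lookup("HEAD", "abc")); // still available afterwards
        System.out.println(store.lookup("GET", "abc"));  // consumed here
        System.out.println(store.lookup("GET", "abc"));  // already consumed
    }
}
```

With this behaviour a browser that sends HEAD before GET still gets the image on the GET, while a second GET with the same code finds nothing.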
By the way, I will try this week to implement a better favicon discovery system, compatible with IE of course!
Some more advanced favicon management code already exists :
- net.yacy.document.Document has a favicon property
- ContentScraper is able to detect icon links: https://github.com/yacy/yacy_search_server/blob/14803d58cd8e2b23be370668323bd5263d734a91/source/net/yacy/document/parser/html/ContentScraper.java#L478
Unfortunately, for now the only place where it is used is getpageinfo_p (the API link in /ViewFile.html). For example, http://localhost:8090/api/getpageinfo_p.xml?actions=title,robots&url=https://play.google.com/store correctly displays the favicon entry (https://ssl.gstatic.com/android/market_images/web/favicon.ico).
Icons are indexed, but as images (images_urlstub_sxt Solr property), so when loading metadata from the Solr index there is no way to distinguish favicons from other images (we could use a favicon file name pattern, but favicons are not required to follow one).
So, to improve favicon discovery, I suggest adding some properties to CollectionSchema :
Icons referenced with link rel="icon" would be added to these arrays and no longer to images_urlstub_sxt.
Any better idea or suggestion?
I am currently experimenting with some code modifications, but as usual it is not as simple as it might appear.
By the way, I noticed that indexing icons differently from regular images would also benefit the accuracy of image search results. Indeed, SearchEvent.java excludes icons from image results relying only on image size and URL extension (.ico). But nowadays icon images may be quite large and are not necessarily ico files, especially with non-standard icon link relations such as "apple-touch-icon".
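As a rough sketch of what relying on the link relation rather than on size or extension could look like (assumptions: the class IconRelClassifier is hypothetical, and the set of relations only contains the two named in this issue so far):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical classifier, for illustration only (not part of YaCy). */
final class IconRelClassifier {

    // Only the relations mentioned in this issue; a real list would likely be longer.
    private static final Set<String> ICON_RELS =
            new HashSet<>(Arrays.asList("icon", "apple-touch-icon"));

    /** True when any space-separated token of the rel attribute is a known icon relation. */
    static boolean isIconLink(String relValue) {
        if (relValue == null) {
            return false;
        }
        for (String token : relValue.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (ICON_RELS.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isIconLink("shortcut icon"));
        System.out.println(isIconLink("apple-touch-icon"));
        System.out.println(isIconLink("stylesheet"));
    }
}
```

A large PNG referenced with rel="apple-touch-icon" would then be treated as an icon whatever its size or file extension.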
I listed some non-standard icon link relations widely used on current big websites :
Does anyone know of other ones that would be useful to parse as icons instead of images?
edited on: 2016-02-02 10:07
For information, there are also two meta tags used as icons :
- name="msapplication-TileImage" (for Windows 8 pinned pages)
- property="og:image" (for Facebook Open Graph nodes)
I committed a first implementation: https://github.com/luccioman/yacy_search_server/commit/3cc5619d93394c85484e3f6d43a526f14c9aca0e
It works fine in Web Portal mode. I still have to run non-regression tests, test in P2P mode with recrawl, and update the ViewImage and pageinfo pages.
I finally added these new fields to CollectionSchema :
## all icon links without the protocol and '://'
## all icon links protocols : split from icons_urlstub to provide some compression, as http protocol is implied as default and not stored
## all icon links relationships space separated (e.g. 'icon apple-touch-icon')
## all icon sizes space separated (e.g. '16x16 32x32')
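The split between URL stub and protocol described in the comments above can be sketched as follows. This is only an illustration of the idea (http implied and therefore not stored), not the actual storage code: UrlStubCodec, split and join are hypothetical names.

```java
/** Hypothetical codec, for illustration only (not the actual YaCy storage code). */
final class UrlStubCodec {

    /** Splits a URL into { protocol, stub }; a URL without "://" is assumed to be http. */
    static String[] split(String url) {
        int sep = url.indexOf("://");
        if (sep < 0) {
            return new String[] {"http", url};
        }
        return new String[] {url.substring(0, sep), url.substring(sep + 3)};
    }

    /** Rebuilds the URL; a missing or empty protocol means the implied default, http. */
    static String join(String protocol, String stub) {
        String p = (protocol == null || protocol.isEmpty()) ? "http" : protocol;
        return p + "://" + stub;
    }

    public static void main(String[] args) {
        String[] parts = split("https://ssl.gstatic.com/android/market_images/web/favicon.ico");
        System.out.println(parts[0] + " | " + parts[1]);
        System.out.println(join(null, "www.debian.org.hk/files/favicon.ico"));
    }
}
```

Storing only the stub for http URLs saves a few bytes per entry, at the cost of having to keep the protocol list aligned with the stub list when rebuilding the full URLs.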
Almost done! I attached the testCases.csv file, which contains some test case URLs I have used.
All the use cases I am aware of work for me.
Tested with a Debian Jessie node, and a local browser or a remote Win7 browser.
Of course, changes are only visible for newly crawled or recrawled pages, in P2P mode or in Robinson mode.
The commits I made for this Mantis issue range from https://github.com/luccioman/yacy_search_server/commit/3cc5619d93394c85484e3f6d43a526f14c9aca0e to https://github.com/luccioman/yacy_search_server/commit/3f338777f7007143d3d44e484bd371d76b36f2e0
Hi, I updated and tested Pull Request 39 again (https://github.com/yacy/yacy_search_server/pull/39) with the changes from the main repository, so if you are interested, it is ready to merge.
|2015-12-16 22:59||luc||New Issue|
|2015-12-20 19:28||BuBu||Note Added: 0001183|
|2015-12-20 19:29||BuBu||File Added: newfavicon.PNG|
|2015-12-20 19:29||BuBu||File Added: oldfavicon.PNG|
|2015-12-21 01:11||BuBu||Note Added: 0001184|
|2016-01-04 21:54||luc||Note Added: 0001191|
|2016-01-04 21:56||luc||File Added: IENetworkTrace.png|
|2016-01-05 14:42||luc||Note Added: 0001192|
|2016-01-27 12:18||luc||Note Added: 0001206|
|2016-02-02 10:05||luc||Note Added: 0001207|
|2016-02-02 10:07||luc||Note Edited: 0001207||View Revisions|
|2016-02-10 10:10||luc||File Added: testCases.csv|
|2016-02-10 10:11||luc||Note Added: 0001209|
|2016-02-11 09:37||luc||Note Added: 0001210|
|2016-07-03 19:10||luc||Note Added: 0001256|
|2016-11-03 21:32||BuBu||Status||new => resolved|
|2016-11-03 21:32||BuBu||Resolution||open => fixed|
|2016-11-03 21:32||BuBu||Assigned To||=> administrator|