ID: 0000629
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2015-12-16 22:59
Last Update: 2016-11-03 21:32
Assigned To: administrator
Platform:
OS: Debian GNU/Linux
OS Version: 8.2 Jessie
Product Version: YaCy 1.8
Target Version:
Fixed in Version:
Summary: 0000629: Default favicon used even though one is available
Description: In search results, the default favicon (http://localhost:8090/env/grafics/dfltfvcn.ico) is often used, even when a favicon is actually available for the result document.
This may not seem important, but it slows down use of the results page: even when all result rows are displayed, the spinning wheel keeps turning because the browser is still waiting for a response for some wrong icon URLs.
Steps To Reproduce: Example: search for Debian
 - the result http://www.debian.org.hk/ is displayed with the default favicon
 - the document has a favicon: <link rel="shortcut icon" href="/files/favicon.ico" type="image/x-icon" />
Additional Information: The rule applied in yacysearchitem.java is too simplistic:
faviconURL = new DigestURL(resultURL.getProtocol() + "://" + resultURL.getHost() + ((port != -1) ? (":" + port) : "") + "/favicon.ico");

First, the document should be searched for a link with rel="icon".
See http://www.w3.org/TR/html5/links.html#rel-icon
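
The lookup order suggested above (declared icon link first, /favicon.ico as a fallback) can be sketched as follows. This is only an illustration: FaviconResolver and its methods are hypothetical names, not YaCy code, and the regular expressions stand in for a real HTML parser such as the one the crawler already uses. It also ignores any <base href> in the document.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of YaCy): picks a favicon URL for a page. */
final class FaviconResolver {

    // Matches each <link ...> tag. A simplification: a real crawler would use its HTML parser.
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern REL = attribute("rel");
    private static final Pattern HREF = attribute("href");

    // Builds a pattern capturing the quoted value of the given attribute.
    private static Pattern attribute(String name) {
        return Pattern.compile("\\b" + name + "\\s*=\\s*[\"']([^\"']*)[\"']",
                Pattern.CASE_INSENSITIVE);
    }

    /** Returns the first declared icon URL, or <origin>/favicon.ico when none is declared. */
    static String resolve(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        Matcher tags = LINK_TAG.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            Matcher rel = REL.matcher(tag);
            Matcher href = HREF.matcher(tag);
            if (rel.find() && href.find() && hasIconToken(rel.group(1))) {
                // Relative hrefs are resolved against the page URL.
                return base.resolve(href.group(1)).toString();
            }
        }
        // Fallback: the conventional location at the root of the host.
        return base.resolve("/favicon.ico").toString();
    }

    // rel is a space-separated token list, e.g. "shortcut icon".
    private static boolean hasIconToken(String rel) {
        for (String token : rel.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals("icon")) {
                return true;
            }
        }
        return false;
    }
}
```

With the Debian example from this report, FaviconResolver.resolve("http://www.debian.org.hk/", html) would return the declared /files/favicon.ico location instead of the root /favicon.ico.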

I guess the icon URL should be stored when crawling, to avoid reloading and parsing the document, at least when the snippet strategy is not 'NOCACHE'.
Tags: No tags attached.
Attached Files:
 - newfavicon.PNG (12,460 bytes) 2015-12-20 19:29
 - oldfavicon.PNG (11,953 bytes) 2015-12-20 19:29
 - IENetworkTrace.png (80,252 bytes) 2016-01-04 21:56
 - testCases.csv (1,157 bytes) 2016-02-10 10:10

Notes
BuBu (developer)
2015-12-20 19:28

By the way, this has been occurring since the change in searchitem.html
from your commit/pull request https://github.com/yacy/yacy_search_server/commit/0e8b3d9a906b3f8cbd55d5483cc7a0285051b7c9

to the <object data= variant:

  <object data="ViewImage.png?width=16&height=16&code=#[faviconCode]#&isStatic=true" type="image/png" id="f#[urlhash]#" class="favicon" style="width:16px; height:16px;">
      <img width="16" height="16" src="env/grafics/dfltfvcn.ico" style="width:16px; height:16px;" alt="" />

Experimentally switching back to the old <img src=ViewImage .... shows more favicons.
Screenshots are attached.
BuBu (developer)
2015-12-21 01:11

Hi luc,

See the quick fix https://github.com/yacy/yacy_search_server/commit/67f64af4b4fde9c16f4726c1554660f9d1df4a03

Maybe you have a better solution than my going back to the old markup.
(The IE result is shown in the newfavicon.PNG attachment.)
luc (reporter)
2016-01-04 21:54

Thank you BuBu for testing with IE.
Sorry to IE users: I only tested my refactoring with Firefox, Chrome and Konqueror. I won't forget it in the future!

For information, after some analysis:
 - there has been no problem using the object tag to display images since IE8.
 - here the problem is related to faviconCode expiration: when using the object tag instead of img, IE performs two HTTP requests, a HEAD and a GET. ViewImage handles both the same way and consumes the faviconCode licence on the HEAD request (it calls URLLicense.releaseLicense). Thus, the licence code is no longer available when the GET is performed (see the attached network trace screenshot).
Firefox does the same, but it seems to handle this better.
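
To illustrate the HEAD/GET problem described above, here is a minimal sketch of a one-time licence store in which only a GET consumes the code. OneTimeLicenses, grant and lookup are hypothetical names chosen for this example. This is not YaCy's URLLicense API and not necessarily how ViewImage was eventually fixed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical one-time licence store (an illustration, not YaCy's URLLicense). */
final class OneTimeLicenses {

    private final Map<String, String> urlsByCode = new ConcurrentHashMap<>();

    /** Registers a licence code for the given image URL. */
    void grant(String code, String url) {
        urlsByCode.put(code, url);
    }

    /**
     * Returns the licensed URL, or null if the code is unknown or already consumed.
     * Only a GET consumes the code; a HEAD (or any other method) just peeks,
     * so a HEAD followed by a GET still succeeds.
     */
    String lookup(String httpMethod, String code) {
        if ("GET".equalsIgnoreCase(httpMethod)) {
            return urlsByCode.remove(code);
        }
        return urlsByCode.get(code);
    }
}
```

With this behaviour, the HEAD request IE sends first would no longer invalidate the code needed by the GET that follows it.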

By the way, I will try this week to implement a better favicon discovery system, compatible with IE of course!
luc (reporter)
2016-01-05 14:42

Some more advanced favicon management code already exists:
 - net.yacy.document.Document has a favicon property
 - ContentScraper is able to detect icon links: https://github.com/yacy/yacy_search_server/blob/14803d58cd8e2b23be370668323bd5263d734a91/source/net/yacy/document/parser/html/ContentScraper.java#L478

Unfortunately, for now the only place where it is used is getpageinfo_p (API link in /ViewFile.html). For example, http://localhost:8090/api/getpageinfo_p.xml?actions=title,robots&url=https://play.google.com/store correctly displays the favicon entry (https://ssl.gstatic.com/android/market_images/web/favicon.ico).

Icons are indexed, but as images (images_urlstub_sxt Solr property), so when metadata is loaded from the Solr index there is no way to distinguish favicons from other images (we could use a favicon file name pattern, but favicons are not required to follow one).

So, to improve favicon discovery, I suggest adding some properties to CollectionSchema:
 - icons_urlstub_sxt
 - icons_height_val
 - icons_width_val

Icons referenced with link rel="icon" would be added to these arrays and no longer to images_urlstub_sxt.

Any better idea or suggestion?
luc (reporter)
2016-01-27 12:18

I am currently experimenting with some code modifications, but as usual it is not as simple as it might appear.
By the way, I noticed that indexing icons differently from regular images would also improve the accuracy of image search results. Indeed, SearchEvent.java excludes icons from image results relying only on image size and URL extension (.ico). But nowadays icon images may be quite large and are not necessarily ico files, especially with non-standard icon link relations such as "apple-touch-icon".

I listed some non-standard icon link relations widely used on current big websites:
 - apple-touch-icon
 - apple-touch-icon-precomposed
 - fluid-icon
 - mask-icon
Does anyone know of others that would be useful to parse as icons instead of images?
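
As a rough sketch of how a scraper could route these relations to an icon field instead of the image field, the check below treats a link as an icon when any token of its rel attribute is the standard "icon" or one of the four relations listed above. IconRelations and isIconLink are hypothetical names for this example (not ContentScraper code), and Set.of assumes Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical classifier (not YaCy code): decides whether a link rel value denotes an icon. */
final class IconRelations {

    // The standard relation plus the non-standard ones listed in this note.
    private static final Set<String> ICON_RELS = Set.of(
            "icon",
            "apple-touch-icon",
            "apple-touch-icon-precomposed",
            "fluid-icon",
            "mask-icon");

    /** rel is a space-separated, case-insensitive token list such as "shortcut icon". */
    static boolean isIconLink(String rel) {
        if (rel == null) {
            return false;
        }
        for (String token : rel.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (ICON_RELS.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping the relations in one set makes it easy to add further ones if others turn out to be in common use.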
luc (reporter)
2016-02-02 10:05
edited on: 2016-02-02 10:07

For information, there are also two meta tags used as icons:
 - name="msapplication-TileImage" (for Windows 8 pinned pages)
 - property="og:image" (for Facebook Open Graph nodes)

I committed a first implementation: https://github.com/luccioman/yacy_search_server/commit/3cc5619d93394c85484e3f6d43a526f14c9aca0e
It is working fine in Web Portal mode. I still have to pass the non-regression tests, test in P2P mode with recrawl, and update the ViewImage and pageinfo pages.

In the end, I added these new fields to CollectionSchema:
## all icon links without the protocol and '://'

## all icon links protocols : split from icons_urlstub to provide some compression, as http protocol is implied as default and not stored

## all icon links relationships space separated (e.g. 'icon apple-touch-icon')

## all icon sizes space separated (e.g. '16x16 32x32')
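
The split between URL stub and protocol described in the first two field comments can be sketched like this. IconUrlParts is a hypothetical class written for this example. Using an empty string for the implied http default is an assumption, and the actual encoding used by the schema may differ.

```java
import java.util.Locale;

/** Hypothetical value class (not YaCy code): a URL split into protocol and stub. */
final class IconUrlParts {

    final String protocol; // e.g. "https"
    final String stub;     // the URL without the protocol and "://"

    private IconUrlParts(String protocol, String stub) {
        this.protocol = protocol;
        this.stub = stub;
    }

    /** Splits a URL at "://"; a URL without a protocol is assumed to be http. */
    static IconUrlParts split(String url) {
        int sep = url.indexOf("://");
        if (sep < 0) {
            return new IconUrlParts("http", url);
        }
        return new IconUrlParts(url.substring(0, sep).toLowerCase(Locale.ROOT),
                url.substring(sep + 3));
    }

    /** Value to store: empty for the implied http default (an assumption of this sketch). */
    String storedProtocol() {
        return "http".equals(protocol) ? "" : protocol;
    }

    /** Rebuilds the full URL from the stored protocol and stub. */
    static String join(String storedProtocol, String stub) {
        String p = storedProtocol.isEmpty() ? "http" : storedProtocol;
        return p + "://" + stub;
    }
}
```

Since most URLs share a handful of protocols, storing the stub separately and leaving the default protocol out keeps the stored values shorter.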

luc (reporter)
2016-02-10 10:11

Almost done! I attached the testCases.csv file, which contains some test case URLs I have used.
luc (reporter)
2016-02-11 09:37

All use cases I am aware of work for me.
Tested with a Debian Jessie node, and a local browser or a remote Win7 browser.

Of course, changes are visible for newly crawled or recrawled pages, in P2P mode or in Robinson mode.

The commits I made for this issue range from https://github.com/luccioman/yacy_search_server/commit/3cc5619d93394c85484e3f6d43a526f14c9aca0e to https://github.com/luccioman/yacy_search_server/commit/3f338777f7007143d3d44e484bd371d76b36f2e0
luc (reporter)
2016-07-03 19:10

Hi, I updated Pull Request 39 (https://github.com/yacy/yacy_search_server/pull/39) with changes from the main repository and tested it again, so if you are interested, it is ready to merge.

Issue History
Date Modified Username Field Change
2015-12-16 22:59 luc New Issue
2015-12-20 19:28 BuBu Note Added: 0001183
2015-12-20 19:29 BuBu File Added: newfavicon.PNG
2015-12-20 19:29 BuBu File Added: oldfavicon.PNG
2015-12-21 01:11 BuBu Note Added: 0001184
2016-01-04 21:54 luc Note Added: 0001191
2016-01-04 21:56 luc File Added: IENetworkTrace.png
2016-01-05 14:42 luc Note Added: 0001192
2016-01-27 12:18 luc Note Added: 0001206
2016-02-02 10:05 luc Note Added: 0001207
2016-02-02 10:07 luc Note Edited: 0001207
2016-02-10 10:10 luc File Added: testCases.csv
2016-02-10 10:11 luc Note Added: 0001209
2016-02-11 09:37 luc Note Added: 0001210
2016-07-03 19:10 luc Note Added: 0001256
2016-11-03 21:32 BuBu Status new => resolved
2016-11-03 21:32 BuBu Resolution open => fixed
2016-11-03 21:32 BuBu Assigned To => administrator
