ID: 0000724 | Project: YaCy | Category: [All Projects] General | View Status: public | Date Submitted: 2017-01-30 16:01 | Last Update: 2017-02-02 08:59
Platform: x86-64 | OS: FreeBSD | OS Version: 10.3
Summary: 0000724: hostbrowser.xml does not contain the site queried
Description: HostBrowser.xml output files do not contain the original site. See https://github.com/yacy/yacy_search_server/blob/ff6589fc0f4332bc83f89f875b62e7670762e4ee/htroot/HostBrowser.xml for an example.

By comparison, webstructure.xml does include the queried site, in XML attributes.
Steps To Reproduce: Query "http://localhost:8090/HostBrowser.xml?hosts=example.com".

The resulting HostBrowser.xml output does not contain the domain example.com anywhere in the file.
luc (reporter)
2017-01-31 08:55

Hi, you are not using the right parameter. To restrict results to a specific hostname, the "path" parameter must be used instead of "hosts".

Thus it would be: http://localhost:8090/HostBrowser.xml?path=example.com
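
If the request is built in a script, letting a URL library encode the query string makes it harder to mix up parameters. A minimal Python sketch, assuming the default local peer address used above; the helper name host_browser_url is hypothetical and not part of YaCy:

```python
from urllib.parse import urlencode

def host_browser_url(host, base="http://localhost:8090"):
    """Build a HostBrowser.xml URL restricted to one host via "path"."""
    # Hypothetical helper; only the "path" parameter comes from this thread.
    return f"{base}/HostBrowser.xml?{urlencode({'path': host})}"

print(host_browser_url("example.com"))
```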
DNcrawler (reporter)
2017-02-02 08:07

I am calling it correctly; I just copied and pasted the wrong line.

Here's the line from the script I have:

http://localhost:8090/HostBrowser.xml?path=example.com

The resulting XML file does not contain the domain specified in "path". Here is the output XML file after running the command above:

<?xml version="1.0"?>

  <root />

    <host name="example.org" count="4116" />
    <host name="example.net" count="62" />

luc (reporter)
2017-02-02 08:59

OK, but is it really a problem?
In my view, the host is named in the webstructure.xml result because, when it is queried without the "about" parameter, the "out" and "in" <references> tags can contain reference lists for multiple hosts. Each <domain> tag therefore has to specify which host it is about.

In the HostBrowser.xml structure, the <inbound> and <outbound> lists are only produced for the path you specified, so I guess we can assume that the path is known when parsing the result.
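
A rough Python sketch of that idea: the caller already knows the path it asked for, so it can attach it to the parsed result itself. The sample document is a simplified, well-formed stand-in rather than actual YaCy output, and the <hostbrowser> root element and the function name parse_host_links are assumptions made only for illustration:

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a HostBrowser.xml response (not real YaCy output).
SAMPLE = """<?xml version="1.0"?>
<hostbrowser>
  <outbound>
    <host name="example.org" count="4116" />
    <host name="example.net" count="62" />
  </outbound>
</hostbrowser>"""

def parse_host_links(xml_text, queried_path):
    """Return the linked hosts together with the path that was queried."""
    root = ET.fromstring(xml_text)
    # Collect every <host> element, wherever it sits in the tree.
    hosts = {h.get("name"): int(h.get("count")) for h in root.iter("host")}
    # The queried path is supplied by the caller, not read from the XML.
    return {"path": queried_path, "hosts": hosts}

result = parse_host_links(SAMPLE, "example.com")
print(result["path"], result["hosts"])
```

With this approach the domain never has to appear in the file; the script that issued the request stores it next to the parsed lists.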

