View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update
0000717 | YaCy | [All Projects] General | public | 2017-01-05 21:26 | 2017-01-06 03:04
Reporter | BuBu
Assigned To | BuBu
Priority | normal | Severity | major | Reproducibility | always
Status | resolved | Resolution | fixed
ETA | none
Platform | | OS | | OS Version |
Product Version |
Target Version | | Fixed in Version |
Summary | 0000717: Index document with wrong field content for metadata from html tags
Description | After parsing/crawling several documents, some index documents contain wrong content in h1_txt and in other fields extracted from HTML tags (e.g. underline_txt or the image_* index fields).

Example: the h1_txt value below is not part of the page at all:

    <doc>
      <str name="id">oqcOSGHd8iIa</str>
      <str name="sku">http://worldbuilding.stackexchange.com/questions/66895/ultimate-australian-canal</str>
      <arr name="title">
        <str>climate - Ultimate Australian Canal - Worldbuilding Stack Exchange</str>
      </arr>
      <arr name="h1_txt">
        <str>Spendenaufruf : Wikipedia sammelt 8,7 Millionen Euro</str>
      </arr>
      <int name="h1_i">1</int>

(The h1 text is German for "Donation appeal: Wikipedia raises 8.7 million euros" and plainly comes from a different page than the Stack Exchange question.)
Additional Information | Debug info: the tag/field content comes from the scraper. The htmlParser remembers the scraper it used, but the same parser instance is reused for several documents, so the remembered scraper is always replaced by the one for the current document, while the indexing process may still be working on an earlier document. In this concurrent situation, yacy2solr receives the earlier document together with the current scraperObject.
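A minimal Java sketch of the failure mode described above, and of one way to avoid it. All class and method names here (Scraper, Doc, SharedParser, PerDocParser, indexBuggy, indexFixed) are hypothetical stand-ins, not YaCy's actual classes, and the linked commit may fix the problem differently. The concurrent interleaving is simulated sequentially: document A is parsed, then document B, and only afterwards is A indexed.

```java
import java.util.ArrayList;
import java.util.List;

public class ScraperRaceDemo {

    // Hypothetical per-document tag extractor (the "scraper").
    static final class Scraper {
        final List<String> h1 = new ArrayList<>();
    }

    // Hypothetical parsed document; in the fixed variant it carries its own scraper.
    static final class Doc {
        final String url;
        final Scraper scraper;
        Doc(String url, Scraper scraper) { this.url = url; this.scraper = scraper; }
    }

    // BUGGY: one reused parser remembers the scraper of the most recent parse() call.
    static final class SharedParser {
        Scraper scraperObject;
        Doc parse(String url, String h1Text) {
            Scraper s = new Scraper();
            s.h1.add(h1Text);
            scraperObject = s; // overwritten by every later parse() call
            return new Doc(url, null);
        }
    }

    // FIXED: the scraper is returned together with the document it was built for.
    static final class PerDocParser {
        Doc parse(String url, String h1Text) {
            Scraper s = new Scraper();
            s.h1.add(h1Text);
            return new Doc(url, s);
        }
    }

    // Index step, buggy variant: pairs the document with the parser's remembered scraper.
    static String indexBuggy(Doc doc, SharedParser parser) {
        return doc.url + " -> h1=" + parser.scraperObject.h1.get(0);
    }

    // Index step, fixed variant: pairs the document with its own scraper.
    static String indexFixed(Doc doc) {
        return doc.url + " -> h1=" + doc.scraper.h1.get(0);
    }

    public static void main(String[] args) {
        // Crawler parses A, then B; the indexer only then processes A.
        SharedParser shared = new SharedParser();
        Doc a1 = shared.parse("http://a.example/", "Heading of A");
        shared.parse("http://b.example/", "Heading of B");
        System.out.println(indexBuggy(a1, shared)); // http://a.example/ -> h1=Heading of B

        PerDocParser fixed = new PerDocParser();
        Doc a2 = fixed.parse("http://a.example/", "Heading of A");
        fixed.parse("http://b.example/", "Heading of B");
        System.out.println(indexFixed(a2)); // http://a.example/ -> h1=Heading of A
    }
}
```

The design point is that tag-derived state belongs to one document, so it should travel with that document (or the parser should not be shared) rather than live in a mutable field of a reused parser; then the indexer's result no longer depends on how parsing and indexing interleave.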
Tags | No tags attached.
Attached Files |
(0001370) BuBu (developer) 2017-01-06 03:04 |
See commit https://github.com/yacy/yacy_search_server/commit/4c9be29a55b51d9937137806ed4f248875c32a2b |
Date Modified | Username | Field | Change |
2017-01-05 21:26 | BuBu | New Issue | |
2017-01-05 21:26 | BuBu | Status | new => assigned |
2017-01-05 21:26 | BuBu | Assigned To | => BuBu |
2017-01-06 03:04 | BuBu | Note Added: 0001370 | |
2017-01-06 03:04 | BuBu | Status | assigned => resolved |
2017-01-06 03:04 | BuBu | Resolution | open => fixed |
Copyright © 2000 - 2021 MantisBT Team |