View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update
0000765 | YaCy | Wishlist - Wunschliste | public | 2017-08-11 22:37 | 2019-07-28 08:38
Reporter | edycop
Assigned To |
Priority | high | Severity | major | Reproducibility | always
Status | new | Resolution | open
ETA | none
Platform | AMD | OS | Debian GNU/Linux | OS Version | 9
Product Version | YaCy 1.9
Target Version | | Fixed in Version |
Summary | 0000765: Doesn't parse the numbers in an xlsx file correctly
Description | I tried to index an xlsx file, but the parser has a problem identifying numbers. This is what I tried, using the same spreadsheet saved in three formats (Nombre = name, Cedula = ID number):
1) ods format, file://home/edycop/Documents/Prueba.ods: the "Parsed Sentences" section shows:
Nombre Cedula Edwin Caldon 10290230
2) xls format, file://home/edycop/Documents/Prueba.xls: the "Parsed Sentences" section shows:
&"Times New Roman,Regular"&12&A Nombre Cedula Edwin Caldon 10290230 &"Times New Roman,Regular"&12Page &P
3) xlsx format, file://home/edycop/Documents/Prueba.xlsx: the "Parsed Sentences" section shows:
01210290230&C&"Times New Roman,Regular"&12&A&C&"Times New Roman,Regular"&12Page &P
When I search for the ID number, the first two files appear in the results, as expected, but the xlsx file does not. In the last parsed result, the ID number is preceded by extra digits that don't belong to it. Why does this happen? Thanks.
Tags | No tags attached.
Attached Files |
(0001468) luc (reporter) 2017-08-15 11:00
Parsing the test files included in the YaCy sources (https://github.com/yacy/yacy_search_server/blob/Release_1.92/test/parsertest/umlaute_mac.xlsx and https://github.com/yacy/yacy_search_server/blob/Release_1.92/test/parsertest/umlaute_windows.xlsx) also shows that the parser involved (ooxmlParser) does not seem to be able to parse the content of xlsx files. It looks like xlsx format support needs to be updated.
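One possible explanation, not confirmed against the ooxmlParser source: in the xlsx (SpreadsheetML) format, a text cell marked t="s" does not store its text, only an index into the shared-string table in xl/sharedStrings.xml. If a parser just concatenates the raw <v> values of a worksheet, the text cells Nombre, Cedula and Edwin Caldon would come out as 0, 1 and 2, followed by the number 10290230, which gives exactly the observed 01210290230. The &C&"Times New Roman,Regular"&12&A fragments look like Excel page header/footer codes (&C = center section, &A = sheet name, &P = page number). The Java sketch below resolves shared-string indices using only the JDK's DOM parser. The class and method names are hypothetical (not YaCy's API), and the XML in main is a hand-written guess at what Prueba.xlsx contains, not the actual file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of YaCy: extracts cell text from the XML parts of an xlsx file.
public class XlsxCellText {

    // Parses an XML string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Reads xl/sharedStrings.xml: each <si> is one table entry; rich-text entries
    // split their text over several <t> runs, so all <t> text inside the entry is joined.
    static List<String> sharedStrings(String xml) throws Exception {
        List<String> table = new ArrayList<>();
        NodeList items = parse(xml).getElementsByTagNameNS("*", "si");
        for (int i = 0; i < items.getLength(); i++) {
            NodeList runs = ((Element) items.item(i)).getElementsByTagNameNS("*", "t");
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            table.add(text.toString());
        }
        return table;
    }

    // Reads the cells of a worksheet part (e.g. xl/worksheets/sheet1.xml) in document order.
    static List<String> cellTexts(String sheetXml, List<String> shared) throws Exception {
        List<String> out = new ArrayList<>();
        NodeList cells = parse(sheetXml).getElementsByTagNameNS("*", "c");
        for (int i = 0; i < cells.getLength(); i++) {
            Element cell = (Element) cells.item(i);
            String type = cell.getAttribute("t"); // "" when the attribute is absent
            if ("inlineStr".equals(type)) {
                out.add(cell.getTextContent()); // text stored directly in <is><t>
                continue;
            }
            NodeList values = cell.getElementsByTagNameNS("*", "v");
            if (values.getLength() == 0) {
                continue; // empty cell
            }
            String raw = values.item(0).getTextContent();
            if ("s".equals(type)) {
                // Shared-string cell: <v> is an index, not the text itself.
                out.add(shared.get(Integer.parseInt(raw.trim())));
            } else {
                out.add(raw); // numbers, booleans, formula results
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        // Hand-written approximation of the reporter's sheet (assumed, not the real file).
        String sst = "<sst xmlns=\"" + ns + "\">"
                + "<si><t>Nombre</t></si><si><t>Cedula</t></si><si><t>Edwin Caldon</t></si></sst>";
        String sheet = "<worksheet xmlns=\"" + ns + "\"><sheetData>"
                + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c><c r=\"B1\" t=\"s\"><v>1</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\" t=\"s\"><v>2</v></c><c r=\"B2\"><v>10290230</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println(String.join(" ", cellTexts(sheet, sharedStrings(sst))));
    }
}
```

Run on the sample XML, this prints Nombre Cedula Edwin Caldon 10290230, which matches the ods result. Skipping the shared-string lookup and joining the raw <v> values instead would reproduce the 01210290230 seen in the xlsx output.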
Date Modified | Username | Field | Change |
2017-08-11 22:37 | edycop | New Issue | |
2017-08-15 11:00 | luc | Note Added: 0001468 |