View Issue Details
ID: 0000765
Project: YaCy
Category: Wishlist - Wunschliste
View Status: public
Date Submitted: 2017-08-11 22:37
Last Update: 2019-07-28 08:38
Assigned To:
Platform: AMD
OS: Debian GNU/Linux
OS Version: 9
Product Version: YaCy 1.9
Target Version:
Fixed in Version:
Summary: 0000765: Numbers in an xlsx file are not parsed correctly
Description: I tried to index an xlsx file, but YaCy has a problem identifying the numbers in it. This is what I tried:

1) With the ods format (file://home/edycop/Documents/Prueba.ods), the "Parsed Sentences" section shows:
Nombre Cedula Edwin Caldon 10290230

2) With the xls format (file://home/edycop/Documents/Prueba.xls), the "Parsed Sentences" section shows:
&"Times New Roman,Regular"&12&A Nombre Cedula Edwin Caldon 10290230 &"Times New Roman,Regular"&12Page &P

3) With the xlsx format (file://home/edycop/Documents/Prueba.xlsx), the "Parsed Sentences" section shows:
01210290230&C&"Times New Roman,Regular"&12&A&C&"Times New Roman,Regular"&12Page &P

When I search for the ID number, the first two files appear in the list of results but the last one does not. In the last parsed result, the ID number is preceded by extra digits that do not belong to it. Why does this happen? Thanks.
Tags: No tags attached.

-  Notes
luc (reporter)
2017-08-15 11:00

Parsing the test files included in the YaCy sources (https://github.com/yacy/yacy_search_server/blob/Release_1.92/test/parsertest/umlaute_mac.xlsx and https://github.com/yacy/yacy_search_server/blob/Release_1.92/test/parsertest/umlaute_windows.xlsx) also shows that the parser involved (ooxmlParser) does not seem able to parse the content of xlsx files.

It looks like xlsx format support has to be updated.
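
The leading digits in the xlsx result are consistent with how the format stores cells, although this has not been checked against the ooxmlParser code and is only an assumption about the cause. In an xlsx worksheet, a string cell (t="s") normally holds an index into xl/sharedStrings.xml in its <v> element, while a numeric cell holds the number itself. A parser that concatenates the raw <v> contents would therefore produce 0, 1, 2 for the three text cells, followed by 10290230. The Python sketch below is not YaCy code; it only illustrates resolving those indices. The function names and the default sheet path xl/worksheets/sheet1.xml are hypothetical choices for the example (a real reader would look up the sheet path in the workbook relationships).

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by sharedStrings.xml and worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
TAG_T = "{%s}t" % NS["m"]
TAG_C = "{%s}c" % NS["m"]


def read_shared_strings(archive):
    """Return the shared string table as a list (empty if the part is absent)."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # Each <si> entry may split its text across several <t> runs.
    return ["".join(t.text or "" for t in si.iter(TAG_T))
            for si in root.findall("m:si", NS)]


def extract_cell_texts(xlsx_source, sheet_path="xl/worksheets/sheet1.xml"):
    """Yield the text of each cell, resolving shared-string indices.

    sheet_path is an assumption made for this sketch. Inline strings
    (t="inlineStr") have no <v> element and are skipped here.
    """
    with zipfile.ZipFile(xlsx_source) as archive:
        shared = read_shared_strings(archive)
        sheet = ET.fromstring(archive.read(sheet_path))
        for cell in sheet.iter(TAG_C):
            value = cell.find("m:v", NS)
            if value is None or value.text is None:
                continue
            if cell.get("t") == "s":
                # <v> is an index into the shared string table, not the text.
                yield shared[int(value.text)]
            else:
                # Numbers and other literal values are stored directly in <v>.
                yield value.text
```

Calling " ".join(extract_cell_texts(path)) on a workbook laid out like the one in this report would be expected to give the cell texts in document order, with the ID number as a separate token. The header and footer codes (&C, &A, &P and the font settings) live in a different part of the sheet XML and are not touched by this sketch.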

- Issue History
Date Modified Username Field Change
2017-08-11 22:37 edycop New Issue
2017-08-15 11:00 luc Note Added: 0001468
