YaCy-Bugtracker

ID: 0000668
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2016-07-02 00:56
Last Update: 2016-07-07 20:17
Reporter: BuBu
Assigned To: BuBu
Priority: normal
Severity: major
Reproducibility: random
Status: resolved
Resolution: fixed
ETA: none
Platform:
OS:
OS Version:
Product Version:
Target Version:
Fixed in Version:
Summary: 0000668: Crawl Queue NPE in Intranet Mode, crawling file://
Description: In Intranet mode, with a single crawl job crawling a large file system, a NullPointerException (NPE) occurs at some point and the crawler appears to be stuck in an endless loop without recovering.

A restart helps until the NPE occurs again.

The crawl queue (corestack) typically has two depth files of size 0 in the host queue (at least as observed so far).

Debugging shows that the NPE happens in Table.removeOne() when Row.Entry.getPrimaryKeyBytes() returns null (assertions are disabled, so the null key is not caught earlier).
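
Java assert statements are skipped unless the JVM is started with -ea, so with assertions off a null primary key is only noticed later, as the NPE inside Base64Order.cardinal shown under Additional Information. A minimal sketch contrasting an assert-only removal with an explicit runtime check follows. GuardedRemove, partitionFor() and the HashMap-backed partitions are hypothetical stand-ins, loosely modeled on the RAMIndexCluster.indexFor frame in the trace; they are not the kelondro API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a partitioned row index; not YaCy's kelondro API.
public final class GuardedRemove {
    private final List<Map<String, String>> partitions = new ArrayList<>();

    public GuardedRemove(int count) {
        for (int i = 0; i < count; i++) partitions.add(new HashMap<>());
    }

    // Picks a partition from the first key byte. This dereferences the key,
    // so a null key throws NullPointerException here (like indexFor -> cardinal).
    private int partitionFor(byte[] key) {
        return (key[0] & 0xFF) % partitions.size();
    }

    public void put(byte[] key, String row) {
        partitions.get(partitionFor(key)).put(new String(key, StandardCharsets.UTF_8), row);
    }

    // Assert-only variant: without -ea the assert is skipped and a null key
    // fails later with an NPE deep inside partitionFor.
    public String removeUnguarded(byte[] key) {
        assert key != null : "primary key is null";
        return partitions.get(partitionFor(key)).remove(new String(key, StandardCharsets.UTF_8));
    }

    // Guarded variant: always checks and reports a descriptive checked error
    // that the caller can handle (for example by discarding the damaged row).
    public String removeOne(byte[] key) throws IOException {
        if (key == null || key.length == 0) {
            throw new IOException("row has no primary key; cannot remove it");
        }
        return partitions.get(partitionFor(key)).remove(new String(key, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        GuardedRemove table = new GuardedRemove(4);
        byte[] key = "abc".getBytes(StandardCharsets.UTF_8);
        table.put(key, "row-1");
        System.out.println(table.removeOne(key)); // row-1
        try {
            table.removeOne(null);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The explicit check costs one comparison per removal and turns an unexplained NPE into an error the caller can act on, independent of the -ea setting.
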
Steps To Reproduce:
- Intranet mode (all settings default)
- Start a crawl on a file system, e.g. file://C:\tmp or file://server/tmp,
    using Site Crawling (it happens with Expert Crawl too)

- Let it run and watch the log (no general rule found for when it starts);
   in my case it takes 5,000 to 10,000 documents until it occurs

- There is no recovery; one has to shut down and restart

- The crawler then continues to run until the failure happens again
Additional Information:
W 2016/07/01 23:35:28 ConcurrentLog java.lang.NullPointerException
java.lang.NullPointerException
    at net.yacy.cora.order.Base64Order.cardinal(Base64Order.java:343)
    at net.yacy.cora.order.Base64Order.cardinal(Base64Order.java:36)
    at net.yacy.kelondro.index.RAMIndexCluster.indexFor(RAMIndexCluster.java:83)
    at net.yacy.kelondro.index.RAMIndexCluster.has(RAMIndexCluster.java:193)
    at net.yacy.kelondro.index.RowHandleMap.remove(RowHandleMap.java:336)
    at net.yacy.kelondro.table.Table.removeOne(Table.java:840)
    at net.yacy.kelondro.index.BufferedObjectIndex.removeOne(BufferedObjectIndex.java:268)
    at net.yacy.crawler.HostQueue.pop(HostQueue.java:428)
    at net.yacy.crawler.HostBalancer.pop(HostBalancer.java:420)
    at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:332)
    at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:282)
    at net.yacy.crawler.data.CrawlQueues.coreCrawlJob(CrawlQueues.java:313)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:105)
    at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:215)
E 2016/07/01 23:35:28 CRAWLER LOCALCRAWL[0, 1936, 0, 0]: CANNOT FETCH ENTRY: null
java.io.IOException
    at net.yacy.crawler.HostBalancer.pop(HostBalancer.java:452)
    at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:332)
    at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:282)
    at net.yacy.crawler.data.CrawlQueues.coreCrawlJob(CrawlQueues.java:313)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:105)
    at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:215)
E 2016/07/01 23:35:28 CRAWLER LOCALCRAWL[0, 1936, 0, 0]: CANNOT FETCH ENTRY: null
Tags: No tags attached.
Relationships
duplicate of 0000640 (resolved, BuBu): crawler error loop crawling local hard drive in Intranet mode

Notes
(0001257)
BuBu (developer)
2016-07-03 23:29
edited on: 2016-07-03 23:33

Further debugging shows two issues when crawling a file system in Intranet mode:
1. The crawler host queue is opened twice (concurrently, after a restart).
   Reason: internally, a host is identified by its host hash, and the URL protocol is part of the host hash. So different host hashes are calculated by the active crawl job (with file:///xxxxx) and by the initialization from the cache, where the host hash is calculated from the host name alone plus the standard http protocol part.
   Result: a race condition, and the crawler stack files get locked by the second crawl job.

-> Possible solution: include the host hash in the crawler cache stack directory name,
   or internally identify hosts by host name instead of host hash.

2. A null return from crawl-queue.pop() isn't caught by the corresponding error check; instead it loops through the lower-level try/catch/continue rounds. The reason for the null return is unclear.
  - Possible solution: enclose the lowest-level pop in try/catch(NullPointerException)?
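
The host-hash mismatch in item 1 of the note can be illustrated with a small sketch. It assumes a hypothetical hostHash() that digests "protocol://host" with MD5 and keeps 6 bytes; YaCy's real host hash is computed differently. The sketch shows why one host yields two keys when the restart path assumes http, and how each proposed fix (hash stored in the stack directory name, or keying by host name only) lets both code paths agree. HostKey, stackDirName(), hashFromDirName() and hostNameKey() are illustrative names only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.Locale;

// Hypothetical sketch; hostHash() is NOT YaCy's real host-hash algorithm.
public final class HostKey {

    // Short, filesystem-safe digest: first 6 bytes of MD5, Base64url without padding.
    static String shortHash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(Arrays.copyOf(digest, 6));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    // Protocol-sensitive key: "file" and "http" give different hashes for one host.
    public static String hostHash(String protocol, String host) {
        return shortHash(protocol.toLowerCase(Locale.ROOT) + "://" + host.toLowerCase(Locale.ROOT));
    }

    // Fix option 1: store the hash in the stack directory name ("host.hash").
    public static String stackDirName(String protocol, String host) {
        return host.toLowerCase(Locale.ROOT) + "." + hostHash(protocol, host);
    }

    // On restart, read the hash back from the directory name instead of
    // recomputing it with an assumed http protocol. Base64url never contains
    // '.', so the text after the last dot is the hash even for dotted host names.
    public static String hashFromDirName(String dirName) {
        return dirName.substring(dirName.lastIndexOf('.') + 1);
    }

    // Fix option 2: key host queues by the host name alone, ignoring the protocol.
    public static String hostNameKey(String host) {
        return host.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String fromCrawlJob = hostHash("file", "server");   // file://server/tmp
        String fromCacheInit = hostHash("http", "server");  // recomputed as http on restart
        // Different keys for one host: two queues may open the same stack files.
        System.out.println(fromCrawlJob.equals(fromCacheInit));
        // Option 1: the restart path recovers the crawl job's own hash.
        System.out.println(hashFromDirName(stackDirName("file", "server")).equals(fromCrawlJob));
    }
}
```

Option 1 keeps per-protocol queues separate but requires parsing directory names on startup; option 2 is simpler but merges, for example, http:// and file:// crawls of the same host into one queue.
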

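Item 2 of the note suggests catching NullPointerException around the lowest-level pop. A minimal sketch of that idea follows, with two additions that are assumptions rather than part of the note: a null result is returned explicitly as Optional.empty() so the caller's error check can see it, and retries are capped so a row that fails on every attempt ends in an IOException instead of an endless loop. RawQueue and pop() are hypothetical names, not HostQueue or HostBalancer methods.

```java
import java.io.IOException;
import java.util.Optional;

// Hypothetical sketch of a bounded, null-aware pop; not YaCy's crawler API.
public final class SafePop {

    // Stand-in for the lower-level queue that may return null or throw an NPE.
    @FunctionalInterface
    public interface RawQueue<T> {
        T pop() throws IOException;
    }

    // A null result is reported as Optional.empty() instead of slipping past the
    // caller's check. An NPE from a damaged row is retried at most maxRounds
    // times; after that the last NPE is wrapped in an IOException.
    public static <T> Optional<T> pop(RawQueue<T> queue, int maxRounds) throws IOException {
        NullPointerException last = null;
        for (int round = 0; round < maxRounds; round++) {
            try {
                return Optional.ofNullable(queue.pop());
            } catch (NullPointerException e) {
                last = e;
            }
        }
        throw new IOException("queue pop failed " + maxRounds + " times", last);
    }

    public static void main(String[] args) throws IOException {
        // A row whose primary key is null makes every pop throw an NPE.
        RawQueue<String> damaged = () -> { throw new NullPointerException("primary key is null"); };
        try {
            pop(damaged, 3);
        } catch (IOException e) {
            System.out.println("gave up: " + e.getMessage()); // gave up: queue pop failed 3 times
        }
        RawQueue<String> empty = () -> null;
        System.out.println(pop(empty, 3).isPresent()); // false
    }
}
```

Catching the NPE only contains the symptom: the damaged row is still in the queue, so a real fix would also need to remove or repair it.
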

Issue History
Date Modified | Username | Field | Change
2016-07-02 00:56 | BuBu | New Issue |
2016-07-02 00:58 | BuBu | Relationship added | duplicate of 0000640
2016-07-03 23:29 | BuBu | Note Added: 0001257 |
2016-07-03 23:33 | BuBu | Note Edited: 0001257 |
2016-07-07 20:17 | BuBu | Status | new => resolved
2016-07-07 20:17 | BuBu | Resolution | open => fixed
2016-07-07 20:17 | BuBu | Assigned To | => BuBu

