YaCy-Bugtracker

View Issue Details
ID: 0000686
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2016-09-21 09:20
Last Update: 2016-09-22 00:22
Reporter: luc
Assigned To: BuBu
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
ETA: none
Platform:
OS:
OS Version:
Product Version: YaCy 1.8
Target Version:
Fixed in Version:
Summary: 0000686: Broken peer after 2 mode switches
Description: YaCy can be left in a broken state after two mode switches made through /ConfigBasic.html, and needs a restart to recover.
Steps To Reproduce:
- Start with a peer running in P2P mode, or a freshly installed one (no DATA folder)
- Go to the "Use case & account" menu (/ConfigBasic.html)
- Switch to "Intranet Indexing" OR "Search portal" mode and save the configuration
- Switch back to "Community-based web search"
- /ConfigBasic.html ends up in an HTTP 500 error
- Go back to the main web search page and search for something: it also ends up in an HTTP 500 error
- The configuration is now inconsistent: in /ConfigBasic.html the selected use case is "Community-based web search", but in /ConfigNetwork_p.html "Robinson mode" is still selected
- After restarting YaCy and setting "Peer-to-Peer Mode" in /ConfigNetwork_p.html, everything works fine again
Additional Information: Initially reported by vikozo on the YaCy forum: http://forum.yacy-websuche.de/viewtopic.php?f=18&t=5869

Workaround: restart YaCy after each mode switch
Tags: No tags attached.
Attached Files: yacysearch_error_trace.txt (11,764 bytes) 2016-09-21 09:23

-  Notes
(0001296)
luc (reporter)
2016-09-21 11:53

The root cause is that the Solr index write lock is not correctly released at the first mode switch, and therefore cannot be obtained when trying to open that core again:
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: [install_path]/DATA/INDEX/freeworld/SEGMENTS/solr_5_5/collection1/data/index/write.lock

I don't know why this lock is not released. Solr should release it when the embedded Solr server is closed at this line:
https://github.com/yacy/yacy_search_server/blob/Release_1.90/source/net/yacy/search/Switchboard.java#L1333

Any idea to solve this is welcome...
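
The "Lock held by this virtual machine" failure can be illustrated with plain java.nio file locks, without Solr or Lucene. The sketch below is only an analogy under stated assumptions: the class name WriteLockDemo, its helper methods and the temporary lock file are hypothetical, and it does not reproduce how Lucene implements its native lock. It shows that a second attempt to lock the same file from the same JVM fails while the first lock is held, and succeeds once that lock has been released.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: an analogy for the write.lock problem, not YaCy or Lucene code.
final class WriteLockDemo {

    /** Tries to lock the file; returns null when this JVM already holds an overlapping lock. */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            return channel.tryLock();
        } catch (OverlappingFileLockException e) {
            return null; // the lock is already held by this virtual machine
        }
    }

    /** Returns { first attempt succeeded, attempt while held succeeded, attempt after release succeeded }. */
    static boolean[] run() {
        try {
            Path lockFile = Files.createTempFile("write", ".lock");
            try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.WRITE);
                 FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
                FileLock held = tryAcquire(first);      // plays the role of the first core opening
                if (held == null) {
                    throw new IllegalStateException("could not take the initial lock");
                }
                FileLock blocked = tryAcquire(second);  // reopening without releasing first
                held.release();                         // what a complete close is expected to do
                FileLock afterRelease = tryAcquire(second);
                boolean[] result = { true, blocked != null, afterRelease != null };
                if (afterRelease != null) {
                    afterRelease.release();
                }
                return result;
            } finally {
                Files.deleteIfExists(lockFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("first=" + r[0] + " whileHeld=" + r[1] + " afterRelease=" + r[2]);
    }
}
```

In this analogy the mode switch corresponds to the second attempt: as long as the earlier holder has not released the lock, reopening fails even though both attempts come from the same process.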
(0001297)
luc (reporter)
2016-09-21 12:53

I experimented with two Solr locking configuration options:
- unlockOnStartup set to true: doesn't fix the problem
- lockType set to single instead of native: this does fix the problem, and you can then apparently do as many mode switches as you wish

The second solution may work, but it should be considered with great care, because the native lock ensures that only one process has write access to the Solr index.

The question is: do we really need this kind of lock, or can we consider it the user's responsibility to ensure that only one YaCy process is launched using the same DATA folder?
(0001299)
BuBu (developer)
2016-09-21 20:25

From my investigation: look at EmbeddedSolrConnector.close() and its comment (I don't know the background of the comment).

A short test showed that enabling the last commented-out line resolves the lock problem (500 status).

    @Override
    public synchronized void close() {
        if (this.core != null && !this.core.isClosed()) try {this.commit(false);} catch (final Throwable e) {ConcurrentLog.logException(e);}
        try {super.close();} catch (final Throwable e) {ConcurrentLog.logException(e);}
        // we do NOT close the core here because that is closed if the enclosing instance is closed
        // do NOT uncomment the following line, which caused a "org.apache.solr.core.SolrCore Too many close [count:-1] on org.apache.solr.core.SolrCore@51af7c57" error
  ---> try {this.core.close();} catch (final Throwable e) {ConcurrentLog.logException(e);}
    }

Maybe we can find a way around the issue described in that comment with a check such as
if (this.core.isClosed()) ??
(0001300)
BuBu (developer)
2016-09-21 21:11

For easier verification, I added a test case that simulates this issue:
https://github.com/yacy/yacy_search_server/commit/11786457b762dbb6eae5bbb17a20672d20323dc0
(0001301)
BuBu (developer)
2016-09-21 21:15

This does it for me. Does it work for you too?
    @Override
    public synchronized void close() {
        if (this.core != null && !this.core.isClosed()) try {this.commit(false);} catch (final Throwable e) {ConcurrentLog.logException(e);}
        try {super.close();} catch (final Throwable e) {ConcurrentLog.logException(e);}
        // we do NOT close the core here because that is closed if the enclosing instance is closed
        // do NOT uncomment the following line, which caused a "org.apache.solr.core.SolrCore Too many close [count:-1] on org.apache.solr.core.SolrCore@51af7c57" error
        // try {this.core.close();} catch (final Throwable e) {ConcurrentLog.logException(e);}
        
        // added a this.core.isClosed() check, because otherwise on mode switches a reopen of this core fails see http://mantis.tokeek.de/view.php?id=686
        try {if (!this.core.isClosed()) this.core.close();} catch (final Throwable e) {ConcurrentLog.logException(e);}
    }
(0001302)
luc (reporter)
2016-09-21 22:48
edited on: 2016-09-21 22:49

Hello BuBu, thank you for your investigation.
I finally came to the same conclusion as you, and yes, this solution also works for me and looks far preferable to modifying the locking config.

I also found that super.close() did not release the lock because the SolrCore reference count is higher than 1 at that moment (see SolrCore.close(): https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.2/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1174).

Further investigation may help to understand more precisely why there is one extra reference (refCount is increased in SolrCore.open()), but I think you can already safely commit the fix.

By the way, I also discovered another point to fix: ErrorCache holds a Fulltext reference which becomes obsolete after switching mode. I will commit a modification after further testing.
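
The reference-counting behaviour described in this note can be sketched with a toy class. This is a simplified illustration and not the real SolrCore: the class RefCountedCore, its fields and the closeIfOpen helper are hypothetical names, and throwing an exception on an excessive close is an assumption made for the demo (the comment quoted earlier in this thread only shows that Solr reports a "Too many close" error). The sketch shows why a single close() leaves the lock in place when an extra reference exists, and why a guarded close avoids the "Too many close" situation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy stand-in for a reference-counted core; hypothetical, not the real SolrCore.
final class RefCountedCore {
    private final AtomicInteger refCount = new AtomicInteger(1); // the creator holds the first reference
    private boolean lockReleased = false;

    /** Takes an additional reference, as an open() call would. */
    void open() {
        refCount.incrementAndGet();
    }

    /** Drops one reference; resources are released only when the count reaches zero. */
    void close() {
        int count = refCount.decrementAndGet();
        if (count > 0) {
            return; // still referenced elsewhere: the write lock stays held
        }
        if (count < 0) {
            // Assumption for this demo: fail loudly instead of only logging.
            throw new IllegalStateException("Too many close [count:" + count + "]");
        }
        lockReleased = true; // last reference gone: release the write lock
    }

    boolean isClosed() {
        return refCount.get() <= 0;
    }

    boolean isLockReleased() {
        return lockReleased;
    }

    /** Guarded close in the spirit of the isClosed() check proposed above. */
    static void closeIfOpen(RefCountedCore core) {
        if (core != null && !core.isClosed()) {
            core.close();
        }
    }

    public static void main(String[] args) {
        RefCountedCore core = new RefCountedCore();
        core.open();       // one extra reference, as observed during the mode switch
        core.close();      // count is now 1: the lock is still held
        System.out.println("released after first close: " + core.isLockReleased());
        closeIfOpen(core); // count is now 0: the lock is released
        System.out.println("released after guarded close: " + core.isLockReleased());
        closeIfOpen(core); // no-op: the core is already closed
    }
}
```

The guard keeps the extra close from driving the count below zero, which is the failure the original code comment warned about.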

(0001303)
BuBu (developer)
2016-09-22 00:22

Thanks for the feedback.

In the end I took a slightly different approach (without reverting the changes made earlier) to get the core closed (as you confirmed, that is the part that needed to be fixed).
As the comment mentioned above states, the intention was to close the core together with the instance, so the commit now does exactly that (it closes the embedded instance).

https://github.com/yacy/yacy_search_server/commit/330768c8a27f254cbd5706533d56af676d12ec67

- Issue History
Date Modified Username Field Change
2016-09-21 09:20 luc New Issue
2016-09-21 09:23 luc File Added: yacysearch_error_trace.txt
2016-09-21 11:53 luc Note Added: 0001296
2016-09-21 12:53 luc Note Added: 0001297
2016-09-21 20:25 BuBu Note Added: 0001299
2016-09-21 21:11 BuBu Note Added: 0001300
2016-09-21 21:15 BuBu Note Added: 0001301
2016-09-21 22:48 luc Note Added: 0001302
2016-09-21 22:49 luc Note Edited: 0001302
2016-09-22 00:22 BuBu Note Added: 0001303
2016-09-22 00:22 BuBu Status new => resolved
2016-09-22 00:22 BuBu Resolution open => fixed
2016-09-22 00:22 BuBu Assigned To => BuBu

