YaCy-Bugtracker - YaCy
View Issue Details
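ID: 0000686
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2016-09-21 09:20
Last Update: 2016-09-22 00:22
Reporter: luc
Assigned To: BuBu
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed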
none 
YaCy 1.8 
 
0000686: Broken peer after 2 mode switches
YaCy can be broken by two mode switches made through /ConfigBasic.html, and then needs a restart to recover.
- Start with a peer running in P2P mode, or a fresh install (no DATA folder)
- Go to the "Use case & account" menu (/ConfigBasic.html)
- Switch to "Intranet Indexing" or "Search portal" mode and save the configuration
- Switch back to "Community-based web search"
- /ConfigBasic.html now returns an HTTP 500 error
- Go back to the main web search page and search for something: this also returns an HTTP 500 error
- The configuration is now inconsistent: in /ConfigBasic.html the selected use case is "Community-based web search", but in /ConfigNetwork_p.html "Robinson mode" is still selected
- After restarting YaCy and setting "Peer-to-Peer Mode" in /ConfigNetwork_p.html, everything works fine again
Initially reported by vikozo on the YaCy forum: http://forum.yacy-websuche.de/viewtopic.php?f=18&t=5869

Workaround: restart YaCy after each mode switch
No tags attached.
txt yacysearch_error_trace.txt (11,764) 2016-09-21 09:23
http://mantis.tokeek.de/file_download.php?file_id=240&type=bug
Issue History
2016-09-21 09:20 | luc | New Issue
2016-09-21 09:23 | luc | File Added: yacysearch_error_trace.txt
2016-09-21 11:53 | luc | Note Added: 0001296
2016-09-21 12:53 | luc | Note Added: 0001297
2016-09-21 20:25 | BuBu | Note Added: 0001299
2016-09-21 21:11 | BuBu | Note Added: 0001300
2016-09-21 21:15 | BuBu | Note Added: 0001301
2016-09-21 22:48 | luc | Note Added: 0001302
2016-09-21 22:49 | luc | Note Edited: 0001302
2016-09-22 00:22 | BuBu | Note Added: 0001303
2016-09-22 00:22 | BuBu | Status: new => resolved
2016-09-22 00:22 | BuBu | Resolution: open => fixed
2016-09-22 00:22 | BuBu | Assigned To: => BuBu

Notes
(0001296)
luc   
2016-09-21 11:53   
The root cause is that the Solr index write lock is not correctly released at the first mode switch, and thus cannot be obtained when that core is opened again:
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: [install_path]/DATA/INDEX/freeworld/SEGMENTS/solr_5_5/collection1/data/index/write.lock

I don't know why this lock is not released. It should be done by Solr when the embedded Solr server is closed at this line:
https://github.com/yacy/yacy_search_server/blob/Release_1.90/source/net/yacy/search/Switchboard.java#L1333

Any ideas on how to solve this are welcome...
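
The "Lock held by this virtual machine" failure described above can be reproduced in miniature with plain JDK file locks. This is only a sketch under an assumption: that the native Solr/Lucene lock type ultimately behaves like an OS-level file lock comparable to java.nio's FileLock. The class WriteLockDemo and its methods are hypothetical names, not part of YaCy, Solr or Lucene. It shows that a second lock attempt from the same JVM is rejected as long as the first lock has not been released, and succeeds once it has.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of YaCy, Solr or Lucene.
final class WriteLockDemo {

    /** Creates a throwaway file standing in for the index write.lock. */
    static Path newTempLockFile() {
        try {
            Path p = Files.createTempFile("write", ".lock");
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static FileChannel open(Path lockFile) throws IOException {
        return FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /** True if a second lock attempt from this JVM is rejected while the first lock is still held. */
    static boolean secondLockRejectedWhileHeld(Path lockFile) {
        try (FileChannel first = open(lockFile); FileChannel second = open(lockFile)) {
            FileLock held = first.tryLock();
            if (held == null) throw new IllegalStateException("lock file is held by another process");
            try {
                FileLock again = second.tryLock();
                if (again != null) again.release();
                return false;
            } catch (OverlappingFileLockException e) {
                // Comparable to reopening the core while the old write.lock is still held.
                return true;
            } finally {
                held.release();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the lock can be obtained again once the first holder has released it. */
    static boolean lockObtainableAfterRelease(Path lockFile) {
        try (FileChannel first = open(lockFile); FileChannel second = open(lockFile)) {
            FileLock held = first.tryLock();
            if (held == null) throw new IllegalStateException("lock file is held by another process");
            held.release();
            FileLock again = second.tryLock();
            boolean obtained = again != null;
            if (again != null) again.release();
            return obtained;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is that the lock itself is not at fault: the second open only fails because the first holder never let go, which matches the symptom of the mode switch.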
(0001297)
luc   
2016-09-21 12:53   
I experimented with two Solr locking configuration options:
- unlockOnStartup set to true: doesn't fix the problem
- lockType set to single instead of native: this does fix the problem, and you can then apparently switch modes as often as you wish

The second solution may work, but should be considered with great care, because the native lock ensures that only one process has write access to the Solr index.

The question is: do we really need this kind of lock, or can we consider it the user's responsibility to ensure that only one YaCy process is launched on the same DATA folder?
(0001299)
BuBu   
2016-09-21 20:25   
From my investigation: look at EmbeddedSolrConnector.close() and its comment (I don't know the background of the comment).

A short test showed that enabling the last, commented-out line resolves the lock problem (500 status).

    @Override
    public synchronized void close() {
        if (this.core != null && !this.core.isClosed()) try {this.commit(false);} catch (final Throwable e) {ConcurrentLog.logException(e);}
        try {super.close();} catch (final Throwable e) {ConcurrentLog.logException(e);}
        // we do NOT close the core here because that is closed if the enclosing instance is closed
        // do NOT uncomment the following line, which caused a "org.apache.solr.core.SolrCore Too many close [count:-1] on org.apache.solr.core.SolrCore@51af7c57" error
        try {this.core.close();} catch (final Throwable e) {ConcurrentLog.logException(e);} // <--- line re-enabled for the test
    }

Maybe we can find a way around the issue described in that comment with something like
if (!this.core.isClosed()) ??
(0001300)
BuBu   
2016-09-21 21:11   
For easier verification, I added a test case simulating this issue:
https://github.com/yacy/yacy_search_server/commit/11786457b762dbb6eae5bbb17a20672d20323dc0
(0001301)
BuBu   
2016-09-21 21:15   
This does it for me... does it work for you too?
    @Override
    public synchronized void close() {
        if (this.core != null && !this.core.isClosed()) try {this.commit(false);} catch (final Throwable e) {ConcurrentLog.logException(e);}
        try {super.close();} catch (final Throwable e) {ConcurrentLog.logException(e);}
        // we do NOT close the core here because that is closed if the enclosing instance is closed
        // do NOT uncomment the following line, which caused a "org.apache.solr.core.SolrCore Too many close [count:-1] on org.apache.solr.core.SolrCore@51af7c57" error
        // try {this.core.close();} catch (final Throwable e) {ConcurrentLog.logException(e);}
        
        // added a this.core.isClosed() check, because otherwise on mode switches a reopen of this core fails, see http://mantis.tokeek.de/view.php?id=686
        try {if (!this.core.isClosed()) this.core.close();} catch (final Throwable e) {ConcurrentLog.logException(e);}
    }
(0001302)
luc   
2016-09-21 22:48   
(edited on: 2016-09-21 22:49)
Hello BuBu, thank you for your investigation.
I finally came to the same conclusion as you, and yes, this solution also works for me and looks far preferable to modifying the locking config.

I also found that super.close() did not release the lock because the SolrCore reference count is higher than 1 at that moment (see SolrCore.close(): https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.2/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1174).

Further investigation may help to understand more precisely why there is one excess reference (refCount is incremented in SolrCore.open()), but I think you can already safely commit the fix.
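
The reference counting described above can be illustrated with a small model. This is a simplified sketch loosely following the SolrCore.close() logic linked above, not the actual Solr code: RefCountedCore and its accessors are hypothetical names, and the "release" is reduced to a boolean flag. It shows why a single close() leaves the lock in place while one extra reference exists, why the guarded close from note 0001301 releases it, and why an unguarded extra close would lead to a "Too many close" error.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of a reference-counted core. Not the real SolrCore.
final class RefCountedCore {
    private final AtomicInteger refCount = new AtomicInteger(1); // the creator holds one reference
    private boolean lockReleased = false;
    private int tooManyCloseErrors = 0;

    /** Takes an additional reference on the core. */
    void open() {
        refCount.incrementAndGet();
    }

    /** Drops one reference; resources are only released when the count reaches zero. */
    void close() {
        int count = refCount.decrementAndGet();
        if (count > 0) return;            // still referenced elsewhere: the write lock stays held
        if (count < 0) {                  // closed more often than opened
            tooManyCloseErrors++;
            return;
        }
        lockReleased = true;              // last reference gone: release the write lock
    }

    boolean isClosed() {
        return refCount.get() <= 0;
    }

    boolean isLockReleased() {
        return lockReleased;
    }

    int tooManyCloseErrors() {
        return tooManyCloseErrors;
    }
}
```

With one extra open(), the first close() only drops the count to 1. A following `if (!core.isClosed()) core.close();` brings it to 0 and releases the lock, and repeating that guarded call is a no-op, whereas calling close() again without the guard pushes the count to -1.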

By the way, I also discovered another point to fix: ErrorCache holds a Fulltext reference which becomes stale after switching mode. I will commit a modification after further testing.
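
The stale reference problem mentioned above can be pictured with a generic sketch. All names here (StaleReferenceDemo, Index, Node, CapturingCache, LookupCache) are hypothetical and do not reflect YaCy's ErrorCache or Fulltext classes, nor the fix that was eventually committed. The sketch only contrasts a component that captures a reference once with one that looks the current object up on every use.

```java
import java.util.function.Supplier;

// Hypothetical illustration only. These classes do not exist in YaCy.
final class StaleReferenceDemo {

    /** Stands in for an index that is closed and replaced on a mode switch. */
    static final class Index {
        final String mode;
        boolean closed;
        Index(String mode) { this.mode = mode; }
    }

    /** Stands in for the component owning the current index. */
    static final class Node {
        Index index = new Index("p2p");

        void switchMode(String mode) {
            index.closed = true;       // the old index is shut down
            index = new Index(mode);   // and replaced by a new one
        }
    }

    /** Captures the index once: after a mode switch it still points at the closed one. */
    static final class CapturingCache {
        private final Index index;
        CapturingCache(Index index) { this.index = index; }
        boolean usesClosedIndex() { return index.closed; }
    }

    /** Resolves the index on each use: it always sees the current one. */
    static final class LookupCache {
        private final Supplier<Index> current;
        LookupCache(Supplier<Index> current) { this.current = current; }
        boolean usesClosedIndex() { return current.get().closed; }
    }
}
```

Passing a supplier such as `() -> node.index` instead of the object itself is one possible way to avoid the stale reference. Re-creating the dependent component on every mode switch would be another.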

(0001303)
BuBu   
2016-09-22 00:22   
Thanks for the feedback.

Finally, I applied a slightly different approach (without reverting the changes made earlier) to get the core closed (as you confirmed, that is the part that needed fixing).
As the comment mentioned above states, the intention was to close the core together with the instance, so the commit now does exactly that (it closes the embedded instance).

https://github.com/yacy/yacy_search_server/commit/330768c8a27f254cbd5706533d56af676d12ec67