YaCy-Bugtracker

ID: 0000612
Project: YaCy
Category: [All Projects] General
View Status: public
Date Submitted: 2015-10-27 17:37
Last Update: 2015-10-30 16:22
Reporter: Collision
Assigned To:
Priority: high
Severity: minor
Reproducibility: have not tried
Status: new
Resolution: open
ETA: none
Platform:
OS: Ubuntu
OS Version: 14.04 LTS
Product Version: YaCy 1.8
Target Version:
Fixed in Version:
Summary: 0000612: Solr generates too big files.
Description: The files that Solr generates are too big.
A single file can grow to 64GB.
As a result, reading the file takes a long time, and during that time I/O stays at maximum load.
I think the file size should be limited to something smaller (e.g. up to 4GB).
Additional Information:
Main Memory: 24GB of DDR3 SDRAM (20,000MiB allocated to YaCy)
HDD: 3TB (No RAID)
Java version: 1.7.0_79

I attached a screenshot image file.
Tags: No tags attached.
Attached Files: YaCy_Solr generated files_1.png (140,582 bytes) 2015-10-27 17:37

-  Notes
(0001123)
sixcooler (developer)
2015-10-27 20:32

Did you optimize your Solr index to a single segment at /IndexControlURLs_p.html?
(This will result in a single large file.)
By default, the optimize that is executed at the end of a crawl will merge 10M documents per segment, as far as I remember.
(0001124)
Collision (reporter)
2015-10-28 12:52

> Did You optimize your Solr-Index to a single segment at /IndexControlURLs_p.html?

I don't think I have done that.
(0001125)
sixcooler (developer)
2015-10-28 18:44

OK.
Then I think your segments got merged because Solr merges segments whenever there are 10 of (almost) the same size.

But I don't know a way to split segments or limit the segment size. Maybe others have an idea for this?
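
To illustrate the merge behaviour described in this note, here is a minimal sketch in Python. It is a simplified model only, not Solr's or Lucene's actual merge policy: it assumes every newly flushed segment has the same size, and it merges as soon as `merge_factor` segments of equal size exist. The function name `simulate_merges` and the 64 MB flush size are assumptions chosen for illustration; they do not come from YaCy or from the reporter's setup.

```python
def simulate_merges(num_flushes, flush_size_mb=64, merge_factor=10):
    """Simplified model of cascading segment merges.

    Each flush adds one segment at level 0. Whenever merge_factor segments
    sit on the same level, they are merged into one segment on the next
    level. Returns a dict mapping segment size (MB) to segment count.
    """
    levels = {}  # level -> number of segments currently on that level
    for _ in range(num_flushes):
        level = 0
        levels[level] = levels.get(level, 0) + 1
        # A merge can fill up the next level and trigger a further merge.
        while levels[level] == merge_factor:
            levels[level] = 0
            level += 1
            levels[level] = levels.get(level, 0) + 1
    return {
        flush_size_mb * merge_factor ** lvl: count
        for lvl, count in sorted(levels.items())
        if count
    }


if __name__ == "__main__":
    print(simulate_merges(25))    # {64: 5, 640: 2}
    print(simulate_merges(1000))  # {64000: 1}
```

Under these assumptions, 1000 flushes of 64 MB each end up in one segment of 64000 MB, which shows how repeated merges of ten similar-sized segments can produce a single very large file without an explicit optimize.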
(0001126)
Collision (reporter)
2015-10-30 07:14

I tried changing the following two files using gedit:
~/yacy/DATA/INDEX/freeworld/SEGMENTS/solr_5_2/collection1/conf/solrconfig.xml
~/yacy/DATA/INDEX/freeworld/SEGMENTS/solr_5_2/webgraph/conf/solrconfig.xml

    <!-- Merge Factor
         The merge factor controls how many segments will get merged at a time.
         For TieredMergePolicy, mergeFactor is a convenience parameter which
         will set both MaxMergeAtOnce and SegmentsPerTier at once.
         For LogByteSizeMergePolicy, mergeFactor decides how many new segments
         will be allowed before they are merged into one.
         Default is 10 for both merge policies.
      -->
    <!--
    <mergeFactor>100</mergeFactor>

I changed the 'mergeFactor' value from 10 to 100.
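
As a rough check of what a larger merge factor would change, the following Python sketch compares segment layouts for different values. It is a back-of-the-envelope model, not Solr's real merge logic: it assumes equal-sized flushes and that every `merge_factor` equal-sized segments are merged into one. The function name `segment_layout` and the 64 MB flush size are hypothetical.

```python
def segment_layout(num_flushes, merge_factor, flush_size_mb=64):
    """Return (segment_count, largest_segment_mb) in the simplified model.

    The layout corresponds to writing num_flushes in base merge_factor:
    each digit is the number of segments remaining on that merge level.
    """
    count, largest, level = 0, 0, 0
    remaining = num_flushes
    while remaining > 0:
        remaining, digit = divmod(remaining, merge_factor)
        if digit:
            count += digit
            largest = flush_size_mb * merge_factor ** level
        level += 1
    return count, largest


if __name__ == "__main__":
    for factor in (10, 100):
        print(factor, segment_layout(1000, factor))
    # 10 (1, 64000)
    # 100 (10, 6400)
    print(segment_layout(10000, 100))  # (1, 640000)
```

In this model, a merge factor of 100 leaves ten segments of 6400 MB after 1000 flushes instead of one segment of 64000 MB, but after 10000 flushes it still produces a single 640000 MB segment. A larger merge factor therefore delays large merges rather than capping the segment size.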

By the way, is this class relevant for splitting segments?
https://lucene.apache.org/core/5_2_0/misc/org/apache/lucene/index/MultiPassIndexSplitter.html
(0001127)
Collision (reporter)
2015-10-30 16:22

The two files I changed were reset to their original values after restarting YaCy.

- Issue History
Date Modified Username Field Change
2015-10-27 17:37 Collision New Issue
2015-10-27 17:37 Collision File Added: YaCy_Solr generated files_1.png
2015-10-27 20:32 sixcooler Note Added: 0001123
2015-10-28 12:52 Collision Note Added: 0001124
2015-10-28 18:44 sixcooler Note Added: 0001125
2015-10-30 07:14 Collision Note Added: 0001126
2015-10-30 16:22 Collision Note Added: 0001127

