YaCy-Bugtracker

View Issue Details
ID: 0000648
Project: YaCy
Category: Wishlist - Wunschliste
View Status: public
Date Submitted: 2016-03-24 13:36
Last Update: 2016-03-24 13:37
Reporter: b0b3r
Assigned To:
Priority: normal
Severity: feature
Reproducibility: N/A
Status: new
Resolution: open
ETA: none
Platform:
OS:
OS Version:
Product Version: YaCy 1.8
Target Version:
Fixed in Version:
Summary: 0000648: Black/white-list entries and priorities.
Description: For some use cases it would be useful to be able to whitelist entries and to prioritize rules. Each rule could be written as a line in CSV format:

<PRIO>,<B/W>,<RULE EXPRESSION TEXT>
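A minimal Java sketch of reading one line in this format follows. It is a hypothetical illustration, not YaCy code: the names `RuleLineParser`, `Verdict` and `parse` are made up here. It assumes the expression is the third field, optionally wrapped in one pair of double quotes as in the examples below, and it splits only on the first two commas so that a comma inside an expression survives. Full CSV quoting (doubled quotes inside a field) is not handled. The expression is kept as a plain string because the examples mix regex syntax with a host-wildcard style such as `*.pl/.*`.

```java
// Hypothetical parser for one "<PRIO>,<B/W>,<RULE EXPRESSION TEXT>" line (not part of YaCy).
final class RuleLineParser {

    enum Verdict { WHITE, BLACK }

    // The expression stays a plain string; how it is matched is a separate decision.
    record Rule(int priority, Verdict verdict, String expression) {}

    static Rule parse(String line) {
        // Limit 3: split on the first two commas only, so commas inside the expression survive.
        String[] fields = line.trim().split(",", 3);
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected <PRIO>,<B/W>,<RULE>: " + line);
        }
        // Throws NumberFormatException (an IllegalArgumentException) if PRIO is not an integer.
        int priority = Integer.parseInt(fields[0].trim());
        Verdict verdict = switch (fields[1].trim()) {
            case "W" -> Verdict.WHITE;
            case "B" -> Verdict.BLACK;
            default -> throw new IllegalArgumentException("expected B or W: " + fields[1]);
        };
        String expr = fields[2].trim();
        // The examples wrap the expression in double quotes; strip one surrounding pair if present.
        if (expr.length() >= 2 && expr.startsWith("\"") && expr.endsWith("\"")) {
            expr = expr.substring(1, expr.length() - 1);
        }
        return new Rule(priority, verdict, expr);
    }

    public static void main(String[] args) {
        String[] lines = { "10,B,\".*\"", "20,W,\"*.pl/.*\"", "30,B,\".*porn.*\\/.*\"" };
        for (String line : lines) {
            System.out.println(parse(line));
        }
    }
}
```

Rejecting a malformed line with an exception, rather than silently skipping it, is one possible choice; a rule file loader might instead log and ignore bad lines.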

The rules would be checked in descending order of priority; the first rule that matches wins and decides whether the page is whitelisted (W) or blacklisted (B).

Example scenario:
I want only pages in the ".pl" domain to be added to the index. So first I blacklist everything:

10,B,".*"

Then I unlock ".pl" domains with a whitelist rule that has a higher priority than the block-all rule:

20,W,"*.pl/.*"

But I do not want domains containing the substring 'porn', even within ".pl". So I blacklist them with a rule of higher priority:

30,B,".*porn.*\/.*"

However, I do want the index to include pages of the music band "Pidżama Porno", whose address "http://pidzamaporno.art.pl/" contains the substring 'porn'. So I just add a whitelist rule with an even higher priority:

40,W,"pidzamaporno.art.pl/.*"

This approach makes even the most complex scenarios easy to express. It should also be relatively cheap in terms of CPU usage: it only requires sorting the ruleset by numeric priority, and after the first hit there is no need to check the remaining rules.
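The evaluation order described above can be sketched in Java as follows. This is a hypothetical illustration, not YaCy code (`PriorityRuleList`, `evaluate`, `hostAndPath` and `scenario` are made-up names). It assumes each rule expression is a `java.util.regex` pattern that must match the whole `host/path` string of a URL, and that a URL matched by no rule is whitelisted; the proposal leaves that default open. Under these assumptions the `*.pl/.*` example, which appears to use YaCy's blacklist-style host wildcard, is rewritten as the regex `[^/]*\.pl/.*`, and the dots in the band's rule are escaped. The rules are sorted once in the constructor; `evaluate` then scans them in order and returns at the first match.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical first-match-wins evaluator for prioritized black/white rules.
final class PriorityRuleList {

    enum Verdict { WHITE, BLACK }

    record Rule(int priority, Verdict verdict, Pattern pattern) {}

    private final List<Rule> rules;        // sorted, highest priority first
    private final Verdict defaultVerdict;  // assumed result when no rule matches

    PriorityRuleList(List<Rule> input, Verdict defaultVerdict) {
        List<Rule> sorted = new ArrayList<>(input);
        // Sort once by descending priority; List.sort is stable, so ties keep input order.
        sorted.sort(Comparator.comparingInt(Rule::priority).reversed());
        this.rules = List.copyOf(sorted);
        this.defaultVerdict = defaultVerdict;
    }

    Verdict evaluate(String hostAndPath) {
        for (Rule rule : rules) {
            // matches() requires the whole host/path string to match the pattern.
            if (rule.pattern().matcher(hostAndPath).matches()) {
                return rule.verdict(); // first hit wins; lower-priority rules are never checked
            }
        }
        return defaultVerdict;
    }

    // Turns "http://host/path" into "host/path" (an empty path becomes "/").
    static String hostAndPath(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        return uri.getHost() + (path == null || path.isEmpty() ? "/" : path);
    }

    static Rule rule(int priority, Verdict verdict, String regex) {
        return new Rule(priority, verdict, Pattern.compile(regex));
    }

    // The scenario from the issue, with expressions written as Java regexes.
    static PriorityRuleList scenario() {
        return new PriorityRuleList(List.of(
                rule(10, Verdict.BLACK, ".*"),                         // block everything
                rule(20, Verdict.WHITE, "[^/]*\\.pl/.*"),              // hosts ending in .pl ("*.pl/.*")
                rule(30, Verdict.BLACK, ".*porn.*\\/.*"),              // the 'porn' rule, as written
                rule(40, Verdict.WHITE, "pidzamaporno\\.art\\.pl/.*")  // the band's site
        ), Verdict.WHITE);
    }

    public static void main(String[] args) {
        PriorityRuleList list = scenario();
        for (String url : List.of("http://www.example.com/", "http://www.example.pl/index.html",
                                  "http://porn.example.pl/", "http://pidzamaporno.art.pl/")) {
            System.out.println(url + " -> " + list.evaluate(hostAndPath(url)));
        }
    }
}
```

With these four rules, `www.example.com` falls through to the block-all rule, `www.example.pl` is whitelisted by rule 20, `porn.example.pl` is caught by rule 30, and `pidzamaporno.art.pl` is whitelisted by rule 40 before rule 30 is ever consulted. How equal priorities should be resolved is not specified in the proposal; this sketch simply keeps file order.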
Tags: No tags attached.

Notes
There are no notes attached to this issue.

Issue History
Date Modified Username Field Change
2016-03-24 13:36 b0b3r New Issue

