YaCy-Bugtracker - YaCy
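ID: 0000648 | Project: YaCy | Category: Wishlist | View status: public
Date submitted: 2016-03-24 13:36 | Last update: 2016-03-24 13:37
Reporter: b0b3r
Priority: normal | Severity: feature | Reproducibility: N/A
Status: new | Resolution: open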
Projection: none
Product version: YaCy 1.8
0000648: Black/white-list entries and priorities.
For some use cases it would be useful to be able to whitelist as well as blacklist, and to prioritize the rules. Each rule could be written in CSV format:

<PRIO>,<B/W>,<RULE EXPRESSION TEXT>
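A minimal Java (16+) sketch of reading one rule line in this format. It is an illustration under assumptions that the proposal does not fix: the names `RuleLineParser` and `parse` are hypothetical, the expression is compiled as a Java regular expression, and the first two fields are assumed to contain no commas or quotes (otherwise a full CSV parser would be needed). Splitting with a limit of 3 keeps any commas inside the expression intact.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical parser for one line of the proposed rule format:
 *   <PRIO>,<B/W>,<RULE EXPRESSION TEXT>
 * Assumption: the expression is a Java regular expression.
 */
public class RuleLineParser {

    /** B = blacklist (reject), W = whitelist (accept). */
    public enum Action { BLACKLIST, WHITELIST }

    public record Rule(int priority, Action action, Pattern expression) {}

    public static Rule parse(String line) {
        // At most 3 fields, so commas inside the expression are preserved.
        String[] fields = line.trim().split(",", 3);
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected <PRIO>,<B/W>,<RULE>: " + line);
        }
        int priority = Integer.parseInt(fields[0].trim());
        Action action = switch (fields[1].trim()) {
            case "B" -> Action.BLACKLIST;
            case "W" -> Action.WHITELIST;
            default -> throw new IllegalArgumentException("expected B or W: " + fields[1]);
        };
        String expr = fields[2].trim();
        // Strip one pair of surrounding double quotes, as in 10,B,".*"
        if (expr.length() >= 2 && expr.startsWith("\"") && expr.endsWith("\"")) {
            expr = expr.substring(1, expr.length() - 1);
        }
        return new Rule(priority, action, Pattern.compile(expr));
    }

    public static void main(String[] args) {
        // The Java literal below is the rule line 30,B,".*porn.*\/.*"
        Rule r = parse("30,B,\".*porn.*\\/.*\"");
        System.out.println(r.priority() + " " + r.action() + " " + r.expression().pattern());
    }
}
```

Running `main` prints `30 BLACKLIST .*porn.*\/.*`.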

The rules would be checked in descending order of priority; the first hit wins and decides whether the URL is whitelisted (W) or blacklisted (B).

Example scenario:
I want only pages in the ".pl" domain to be added to the index. So first I blacklist everything:

10,B,".*"

Then I unlock the ".pl" domains with a rule that whitelists them and has a higher priority than the block-all rule:

20,W,"*.pl/.*"

But I do not want domains containing the substring 'porn', even under ".pl". So I blacklist them with a rule of still higher priority:

30,B,".*porn.*\/.*"

But I also want the index to include pages of the music band "Pidżama Porno", whose address "http://pidzamaporno.art.pl/" contains the substring 'porn'. So I just add a whitelisting rule with an even higher priority:

40,W,"pidzamaporno.art.pl/.*"

This approach makes even the most complex scenarios easy to express. It should also be relatively inexpensive in terms of CPU usage: the ruleset only has to be sorted once by its numeric priority values, and after the first hit there is no need to check the rest of the ruleset.
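The first-hit-wins evaluation can be sketched in Java as below, using the four rules from the scenario. This is a hypothetical illustration, not YaCy's actual blacklist API: `PrioritizedUrlFilter` and `accepts` are invented names, expressions are assumed to be Java regexes matched against the whole "host/path" string (without the scheme), rules with equal priority are assumed to keep their input order, and a URL matching no rule is assumed to be accepted. Rule 20 is written as `.*\.pl/.*` because `*.pl/.*` is not a valid Java regex.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical first-match-wins evaluator for prioritized black/white-list rules.
 * Assumptions: Java regexes matched against "host/path", stable ordering for
 * equal priorities, and "accept" when no rule matches.
 */
public class PrioritizedUrlFilter {

    public record Rule(int priority, boolean whitelist, Pattern expression) {}

    private final List<Rule> rules;

    public PrioritizedUrlFilter(List<Rule> rules) {
        // Sort once, highest priority first; List.sort is stable, so ties keep their order.
        List<Rule> sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparingInt(Rule::priority).reversed());
        this.rules = List.copyOf(sorted);
    }

    /** Returns true if the URL may be added to the index. */
    public boolean accepts(String hostAndPath) {
        for (Rule rule : rules) {
            if (rule.expression().matcher(hostAndPath).matches()) {
                return rule.whitelist(); // first hit decides; remaining rules are skipped
            }
        }
        return true; // assumed default when no rule matches
    }

    private static Rule rule(int prio, char type, String regex) {
        return new Rule(prio, type == 'W', Pattern.compile(regex));
    }

    public static void main(String[] args) {
        // The scenario's rules, deliberately listed out of priority order.
        PrioritizedUrlFilter filter = new PrioritizedUrlFilter(List.of(
                rule(10, 'B', ".*"),
                rule(40, 'W', "pidzamaporno.art.pl/.*"),
                rule(20, 'W', ".*\\.pl/.*"),      // adapted from "*.pl/.*"
                rule(30, 'B', ".*porn.*\\/.*")));

        for (String url : List.of("example.com/", "wiki.example.pl/page",
                                  "someporn.pl/", "pidzamaporno.art.pl/")) {
            System.out.println(url + " -> " + (filter.accepts(url) ? "index" : "reject"));
        }
    }
}
```

Running `main` prints `example.com/ -> reject`, `wiki.example.pl/page -> index`, `someporn.pl/ -> reject` and `pidzamaporno.art.pl/ -> index`. Because sorting happens once in the constructor, each lookup is a single linear scan that stops at the first matching rule.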
No tags attached.
Issue History
2016-03-24 13:36 | b0b3r | New Issue

There are no notes attached to this issue.