Let’s call it Machine Learning. Today that mostly means neural networks. And mostly, that means black boxes.
There are ways around that. (e.g. Trees and other “open book” algorithms…)
John H. Holland proposed the algorithm with the Cognitive System One (CS-1) program (1976). Variations came later… Today we focus on Michigan-style LCS.
Have you ever found something that no one else has done?
Imagine you receive samples of data points (aka states) with their corresponding classes (aka actions). All your input entries (aka instances) form an environment.
In this case, we will accept binary string states for the environment instances (examples):
| State | Action/Class | From the… |
|---|---|---|
| 0010111100110010 | “setosa” | Iris dataset |
| 001010 | 1 | “Mux 6” |
Binary only? Well, for now… yes. There are alternatives. But for now…
Here, using 4 bits per variable with a Gray encoding of “double quartiles”, we can create a binary string:
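For illustration, a minimal sketch of such an encoder (my own toy code, not the post’s actual pipeline; the bin count and the function names are assumptions):

```r
## Toy encoder: bin a numeric variable into 2^bits quantile bins,
## then write each bin index as a reflected Gray code of `bits` bits.
to_gray <- function(n, bits) {
  g <- bitwXor(n, bitwShiftR(n, 1)) ## binary-reflected Gray code
  paste(rev(as.integer(bitwAnd(g, bitwShiftL(1, 0:(bits - 1))) > 0)),
        collapse = "")
}
encode_variable <- function(x, bits = 4) {
  ## assumes the quantile breaks are distinct
  breaks <- quantile(x, probs = seq(0, 1, length.out = 2^bits + 1))
  bin <- cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE) - 1
  vapply(bin, to_gray, character(1), bits = bits)
}
## Concatenating encode_variable() over the four iris measurements
## yields one 16-bit state string per flower, as in the table above.
```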
The key: “#” means “I don’t care”
Covering a state means creating a rule from it: copy the state, replace each bit by “#” with some probability, and attach the instance’s class/action. The result matches the original input, but could also match other, partially similar inputs:
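A minimal sketch of covering (hypothetical helper names; the actual package code may differ):

```r
## Create a rule from a state: each bit becomes "#" with probability
## wildcard_prob; the rule keeps the instance's class/action.
cover_state <- function(state, action, wildcard_prob = 0.33) {
  bits <- strsplit(state, "")[[1]]
  bits[runif(length(bits)) < wildcard_prob] <- "#"
  list(condition_string = paste(bits, collapse = ""),
       action = action,
       match_count = 1, correct_count = 1) ## it matches its own instance
}
## cover_state("001010", 1) could yield condition "0#10#0" for action 1
```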
You generate new rules (by covering) whenever you see an environment instance that matches no rule yet. The set of rules you accumulate is your population (of rules).
If one or more rules in your population match the new instance’s state -> increase the match count of those classifiers.
If a matching rule also proposes the instance’s class/action -> increase its correct count.
After a few epochs of exposing the LCS to your full environment, you will have a few rules that correctly match a given instance: the correct set.
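The matching test itself is simple; a sketch (hypothetical helper, reusing the “#” semantics from above):

```r
## A rule matches a state when every non-"#" position agrees.
rule_matches <- function(condition, state) {
  c_bits <- strsplit(condition, "")[[1]]
  s_bits <- strsplit(state, "")[[1]]
  all(c_bits == "#" | c_bits == s_bits)
}
rule_matches("0#10#0", "001010") ## TRUE  -> in the match set
rule_matches("0#10#0", "101010") ## FALSE -> not in the match set
```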
Match and correct counts indicate how good each rule is. But could there be better rules?
Take the whole correct set and apply a Genetic Algorithm to it, to generate new rules!
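As a sketch, under the assumptions the package itself lists later (one-point crossover, per-bit mutation over the {0, 1, #} alphabet), the discovery step could look like this (my illustration, not the package’s code):

```r
## One-point crossover: splice two parent conditions at a random cut.
one_point_crossover <- function(p1, p2) {
  cut <- sample(seq_len(nchar(p1) - 1), 1)
  paste0(substr(p1, 1, cut), substr(p2, cut + 1, nchar(p2)))
}
## Mutation: flip bits to/from "#", staying consistent with the state.
mutate_condition <- function(condition, state, p_mut = 0.04) {
  bits <- strsplit(condition, "")[[1]]
  s    <- strsplit(state, "")[[1]]
  flip <- runif(length(bits)) < p_mut
  bits[flip] <- ifelse(bits[flip] == "#", s[flip], "#")
  paste(bits, collapse = "")
}
```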
Matching must go through the whole population every time an environment instance is presented to the LCS.
\[O(\text{match}) = \text{epochs} \times N(\text{environment}) \times N(\text{population})\]
Where \(N()\) means “size of”.
\[\text{e.g. } 1{,}000 \times 5{,}000 \times 1{,}000 = 5{,}000{,}000{,}000\]
One option: Reduce the population of rules.
Subsumption: a “perfect” classifier (100% accuracy) may be more general (more “#” characters) than other classifiers making the same classification. Keep only the most general of them. (Implemented: accuracy-based subsumption)
Compaction: after a number of epochs, you can keep only the classifiers above some accuracy threshold (e.g. 60%+).
Deletion: you can also cap the population size, keeping only the best classifiers (e.g. the top 10,000).
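A sketch of the subsumption test (hypothetical helpers; the field names follow the classifiers shown later in this post):

```r
## A subsumes B when A is 100% accurate, proposes the same action,
## and is strictly more general (its non-"#" bits all agree with B's).
is_more_general <- function(a_cond, b_cond) {
  a <- strsplit(a_cond, "")[[1]]
  b <- strsplit(b_cond, "")[[1]]
  all(a == "#" | a == b) && sum(a == "#") > sum(b == "#")
}
subsumes <- function(a, b) {
  a$accuracy == 1 && identical(a$action, b$action) &&
    is_more_general(a$condition_string, b$condition_string)
}
## When subsumes(a, b) is TRUE, drop b and bump a's numerosity.
```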
Imagine a new sample/instance, never seen before. (Test environment)
Prediction means building the match set for that new instance.
The majority (possibly weighted by numerosity, accuracy…) among the proposed classes/actions is the prediction.
That’s it! It also means this is natively an ensemble learning algorithm.
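A sketch of that prediction step (reusing rule_matches() from above; the weighting scheme here is one plausible choice, not necessarily the package’s):

```r
## Weighted majority vote over the match set for a new state.
predict_state <- function(pop, state) {
  match_set <- Filter(\(cl) rule_matches(cl$condition_string, state), pop)
  if (length(match_set) == 0) return(NA) ## nothing matches: no prediction
  weights <- vapply(match_set, \(cl) cl$numerosity * cl$accuracy, numeric(1))
  actions <- vapply(match_set, \(cl) as.character(cl$action), character(1))
  votes <- tapply(weights, actions, sum)
  names(which.max(votes)) ## the winning class/action
}
```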
Why talk about “environment” and “action”? This comes from the world of Reinforcement Learning.
And because one can read the rules and “understand” the population, you can also use the LCS to interpret the results, and thus do data mining!
All with the same algorithm!
(Figure: visualizing one classifier, Iris dataset)
LCS can somehow recognize how two different attributes interact: epistasis. Aptly, the term comes from genetics (genes modified by other genes…). (e.g. XOR…)
Given that the rules are “expressive”, you can sometimes ask the LCS to find the rules present in your data, NOT to classify future samples, but to identify what is important in the different classes of your data.
REAL WORLD anecdote: inventory of 10K rows with 20 columns, each duly binary encoded. I learnt something about my inventory!
A first self-brainstorm on RL with LCS
A package to implement a simple version of the Learning Classifier System algorithm:
Binary alphabet, tournament selection / one-point crossover GA, accuracy-based, Michigan-style LCS
With examples, demonstrating the implementation for:
Data Mining
Supervised Learning
Reinforcement Learning
Lists. Lists, everywhere. Which might have been a bad idea… (data.table?)
From there, lapply() & co. are my best friends
Start small, grow fast (because I get obsessed)
then clean it
then clean it some more
ever postponing the move to a Package, though
Before
```r
lcs_res <- rlcs_meta_train(train_environment,
                           1, ## Warmup with just one epoch
                           wildcard_prob,
                           rd_trigger,
                           parents_selection_mode,
                           mutation_probability,
                           tournament_pressure,
                           deletion_trigger) ## Deletion won't be triggered
```
Too many parameters! (Uncle Bob wouldn’t like it)
After, using an object (reference class, “R5”, in this case)
Or, you know…
source("run_params/datamining_examples_recommended_hyperparameters_v001.R")
basic_hyperparameters <- RLCS_hyperparameters(
wildcard_prob = wildcard_prob,
## defaults for rd_trigger, mutation_probability,
## parents_selection_mode && tournament_pressure
n_epochs = n_epochs,
deletion_trigger = deletion_trigger,
deletion_threshold = deletion_threshold
)
## It makes it more readable here:
example_lcs <- rlcs_train(train_environment, basic_hyperparameters)
Before
```r
inc_match_count <- function(M_pop) { ## All versions
  lapply(M_pop, \(x) {
    x$match_count <- x$match_count + 1
    x
  })
}
inc_correct_count <- function(C_pop) { ## SL Specific
  lapply(C_pop, \(x) {
    x$correct_count <- x$correct_count + 1
    x
  })
}
inc_action_count <- function(A_pop) { ## RL Specific
  lapply(A_pop, \(x) {
    x$action_count <- x$action_count + 1
    x
  })
}
```
After, using a function factory
```r
## Function factory to increment a named counter on every classifier
inc_param_count <- function(param) {
  force(param) ## capture the name now, avoiding lazy-evaluation surprises
  function(pop) {
    lapply(pop, \(x) {
      x[[param]] <- x[[param]] + 1 ## [[ ]] wants a character, not as.name()
      x
    })
  }
}
inc_match_count   <- inc_param_count("match_count")
inc_correct_count <- inc_param_count("correct_count")
inc_action_count  <- inc_param_count("action_count")
```
Before
```r
## Support function for human-compatible printing:
make_pop_printable <- function(classifier) {
  df <- plyr::rbind.fill(lapply(1:length(classifier), \(i) {
    t_c <- classifier[[i]]
    data.frame(id = t_c$id,
               condition = t_c$condition_string,
               action = t_c$action,
               match_count = t_c$match_count,
               correct_count = t_c$correct_count,
               accuracy = t_c$accuracy,
               numerosity = t_c$numerosity,
               first_seen = t_c$first_seen)
  }))
  df[order(df$accuracy, df$numerosity, decreasing = T), ]
}
```
(Even the parameter name is wrong…)
After: an S3 method (new dependency! plyr::rbind.fill)
```r
print.rlcs_population <- function(x, ...) { ## S3 print methods take (x, ...)
  if (length(x) == 0) return(invisible(x))
  pop <- lcs_best_sort_sl(x)
  pop <- unclass(pop)
  df <- plyr::rbind.fill(lapply(seq_along(pop), \(i) {
    t_c <- pop[[i]]
    data.frame(condition = t_c$condition_string, action = t_c$action,
               match_count = t_c$match_count, correct_count = t_c$correct_count,
               accuracy = t_c$accuracy, numerosity = t_c$numerosity,
               first_seen = t_c$first_seen)
  }))
  print(df)
  invisible(x) ## a print method returns its argument invisibly
}
```
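With the S3 class attached to the population (something like `class(pop) <- c("rlcs_population", class(pop))`; I assume that’s roughly how the package tags it), simply typing the object at the console now dispatches to this method automatically.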
This is all work in progress.
I plan to make it into a CRAN Package.
So: document more, write more tests, reorganize functions…
THE BEST INTRO to the LCS algorithm out there (hands down!) (12’ video)
And for my first RevealJS Quarto, this blog entry (not mine)
Probably quite a few more, including the two in the first picture.
It’s not fast
There are many sequential steps, rather unavoidable ones at that
Not ideal to compete with a world of GPUs and parallel processing (yet ;))
It’s “complex”
Or so says the Wikipedia entry…
When it comes to “alphabets”, it does get messy, I’ll admit
For instance, this is a “slow” algorithm. Option: Rcpp for matching? (under testing right now! see the sketch after this list)
profvis, to profile and find the hot spots
Parallel computing? %dopar% was tested (it works, but…)
Split the dataset. Two options:
instance subsets (reduce the population covered per thread/core)
substrings of states (reduce the search space)
Both are “risky”.
Or: run fewer iterations (epochs) on the full dataset, but on several cores in parallel.
Depending on the dataset, it can be done… Or not…
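About that Rcpp option mentioned above: a minimal sketch of what an Rcpp matcher could look like (my illustration, not the code actually under testing):

```r
## Hypothetical Rcpp matcher: same logic as the R-level rule_matches(),
## assuming condition and state have equal length.
Rcpp::cppFunction("
bool rule_matches_cpp(std::string condition, std::string state) {
  for (size_t i = 0; i < condition.size(); ++i) {
    if (condition[i] != '#' && condition[i] != state[i]) return false;
  }
  return true;
}")
rule_matches_cpp("0#10#0", "001010") ## TRUE
```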
We’ve seen it works, but… How do you package an RL algorithm?
You must make assumptions about the “world” your agent is going to interact with. This makes things complicated:
What to include inside the package? What not?
What to expose from the package? What not?
And a few other such questions slow me down a bit…