Interpretable, Symbolic Machine Learning
Let’s call it Machine Learning. As of today:
Mostly neural networks
And mostly, that means black boxes
There are ways around that (e.g. decision trees and other “open book” algorithms…)
John H. Holland proposed the first such algorithm with the Cognitive System One (CS-1) program (1976). Later, people came up with variations… Today we focus on Michigan-style LCS (Learning Classifier Systems).
A Data Mining Exercise with RLCS
> demo_params <- RLCS_hyperparameters(n_epochs = 280, deletion_trigger = 40, deletion_threshold = 0.9)
> rlcs_model1 <- rlcs_train_sl(demo_env1, demo_params, NULL, F)
[1] "Epoch: 40 Progress Exposure: 1280 Classifiers Count: 14"
[1] "Epoch: 80 Progress Exposure: 2560 Classifiers Count: 8"
[1] "Epoch: 120 Progress Exposure: 3840 Classifiers Count: 2"
[1] "Epoch: 160 Progress Exposure: 5120 Classifiers Count: 2"
[1] "Epoch: 200 Progress Exposure: 6400 Classifiers Count: 2"
[1] "Epoch: 240 Progress Exposure: 7680 Classifiers Count: 2"
[1] "Epoch: 280 Progress Exposure: 8960 Classifiers Count: 2"
if A & NOT(B) then Class=X
if D then Class=Y
“Human-readable”, “interpretable”, good for:
Mitigating biases (in the training data, at least)
Increased trust (justifying decisions)
Learning about the data (data mining), better decisions, regulatory compliance, ethical/legal matters, possible robustness to adversarial attacks…
https://github.com/kaizen-R/RLCS
To get the package from GitHub:
Run your first test
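A minimal sketch of the installation step, assuming you have the remotes package (or devtools) available; RLCS is not on CRAN (yet):

```r
## Install RLCS from GitHub (assumes the 'remotes' package is installed)
# install.packages("remotes")
remotes::install_github("kaizen-R/RLCS")
library(RLCS)
```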
Before we continue:
Neural networks accept numerical vectors as inputs.
Other algorithms accept factors, or mixed inputs.
Well…
The RLCS package (in its current implementation) expects binary strings as input.
Rosetta Stone: “binning” numerical variables (2-bit example to explain the idea)
Rosetta Stone: 16 values, 4-bits, “double-quartiles” w/ Gray-binary encoding, per variable:
> head(iris, n=3)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
> rlcs_iris <- rlcs_rosetta_stone(iris, class_col=5) ## NOT part of LCS
> head(rlcs_iris$model, n=3)
rlcs_Sepal.Length rlcs_Sepal.Width rlcs_Petal.Length rlcs_Petal.Width class state
1 0010 1111 0011 0010 setosa 0010111100110010
2 0011 0101 0011 0010 setosa 0011010100110010
3 0000 1101 0000 0010 setosa 0000110100000010
Note: binning implies some information loss :S
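The “double-quartile” Gray-coded binning above can be sketched as follows. This is an illustrative reimplementation, not the package internals: `to_gray4` and `encode_var` are hypothetical names, and I assume 16 equal-frequency (quantile) bins per variable:

```r
## Map an integer 0..15 to its 4-bit Gray code as a string.
## Gray code: g = n XOR (n >> 1), so adjacent bins differ by one bit.
to_gray4 <- function(n) {
  g <- bitwXor(n, bitwShiftR(n, 1))
  paste(rev(as.integer(intToBits(g))[1:4]), collapse = "")
}

## Bin a numeric variable into (up to) 16 quantile bins, then Gray-encode.
encode_var <- function(x) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = 17)))
  bin <- cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE) - 1
  vapply(bin, to_gray4, character(1))
}

head(encode_var(iris$Sepal.Length), 3)
```

The Gray encoding matters because neighboring bins then share most bits, so a rule with a few “#” wildcards naturally covers a contiguous range of values.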
The key: “#” means “I don’t care”
Covering a state means creating a rule that matches the input state and its class/action, replacing each bit with “#” with some probability.
The resulting rule can then match other, (partially) similar inputs:
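Covering can be sketched like this (illustrative only; `cover_state` and `p_hash` are hypothetical names, not the RLCS API):

```r
## "Cover" an input state: copy it, then replace each bit with the
## wildcard '#' with probability p_hash. The new rule always matches
## the state it was created from.
cover_state <- function(state, p_hash = 0.33) {
  bits <- strsplit(state, "")[[1]]
  wild <- runif(length(bits)) < p_hash
  bits[wild] <- "#"
  paste(bits, collapse = "")
}

set.seed(42)
cover_state("0010111100110010")
```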
If one or more rules in your population match a new instance’s state -> increase the match count of the corresponding classifier(s).
If one or more rules in your population match the new instance’s state && class/action -> increase their correct count.
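The match test itself is simple; here is an illustrative version (not the package code, `rule_matches` is a hypothetical name):

```r
## A rule matches a binary state if every non-'#' position agrees.
rule_matches <- function(rule, state) {
  r <- strsplit(rule, "")[[1]]
  s <- strsplit(state, "")[[1]]
  all(r == "#" | r == s)
}

rule_matches("0#1#", "0110")  # TRUE: positions 1 and 3 agree
rule_matches("0#1#", "1110")  # FALSE: position 1 differs
```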
Now imagine a new sample/instance, never seen before (the test environment).
Prediction starts by building the match set for that new instance.
The prediction is then the majority (possibly weighted by numerosity, accuracy…) of the proposed classes/actions. That’s it! It also means this is natively an ensemble learning algorithm.
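A weighted majority vote over the match set can be sketched as follows (illustrative, assuming a toy match set; `predict_state` is a hypothetical name, not the RLCS API):

```r
## Pick the class with the largest total weight in the match set.
## 'weight' could be numerosity, accuracy, or a product of both.
predict_state <- function(match_set) {
  votes <- tapply(match_set$weight, match_set$class, sum)
  names(which.max(votes))
}

ms <- data.frame(class = c("X", "X", "Y"), weight = c(1, 2, 1))
predict_state(ms)  # "X" wins with total weight 3 vs 1
```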
Visualizing one classifier: iris
[1] "Accuracy: 0.98"
> table(test_mnist_bin01_49b[, c("class", "predicted")])
predicted
class 0 1 rlcs_no_match
0 1716 65 5
1 5 2008 0
>
> ## Training time on 800 samples:
> print(t_end - t_start)
Time difference of 1.937979 mins
!! Magic trick: parallelism, by splitting the training data and then consolidating the sub-models! (Take that, neural networks :D)
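The split-train-consolidate pattern might look like this sketch. Everything here is a stand-in: `train_one` is a dummy, not the package’s training function, and in practice `lapply` would become `parallel::mclapply` or `parLapply`:

```r
## Illustrative split-train-consolidate pattern (NOT the RLCS API).
## Dummy "training": one rule per class seen in the chunk.
train_one <- function(chunk) {
  data.frame(rule = paste0("covers_", sort(unique(as.character(chunk$Species)))))
}

## Split the data into 4 interleaved chunks...
chunks <- split(iris, rep(1:4, length.out = nrow(iris)))

## ...train a sub-model per chunk (swap lapply for parallel::mclapply
## with mc.cores = n to actually parallelize)...
sub_models <- lapply(chunks, train_one)

## ...then consolidate: stack the rule populations and de-duplicate.
consolidated <- unique(do.call(rbind, sub_models))
nrow(consolidated)
```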
This is all work in progress.
I plan to turn it into a CRAN package.
So: document more, write more tests, reorganize functions…