This algorithm is a more efficient analog of the algorithm that calculates delta for my Supervised Delta Algorithm. There are two versions: one that trains on the full dataset, and one that trains on a compressed dataset, which cuts the runtime roughly in half. The underlying algorithm is the same for both versions, and there are two command lines attached, one implementing each, both set up as functions for convenience. The accuracies are excellent, except on UCI Abalone, for some reason that isn't clear to me. The runtime is about 1 second (full dataset) and 0.5 seconds (compressed dataset) for every 2,500 rows. The accuracies below are averages over 100 iterations per dataset, with each iteration split into 85% training rows and 15% testing rows. As you can see, compression doesn't impact accuracy much (the drop ranges from 0.04 to 4.35 percentage points), but it is roughly twice as fast, which can make a big difference for the intended product, which is designed to handle a large number of rows run locally (i.e., on a desktop or laptop).
| Dataset | Accuracy (Full) | Accuracy (Compressed) |
| --- | --- | --- |
| UCI Abalone | 59.03% | 58.30% |
| UCI Credit | 83.08% | 81.20% |
| UCI Ion | 96.41% | 95.39% |
| UCI Iris | 98.80% | 96.15% |
| UCI Parkinsons | 93.30% | 93.26% |
| UCI Sonar | 92.47% | 88.12% |
| UCI Spam | 96.92% | 94.49% |
| UCI Wine | 97.22% | 96.85% |
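To make the testing protocol behind these numbers concrete, here is a minimal Python sketch of it: repeated random 85% / 15% splits, with the accuracy averaged over all iterations. This is not the attached code, and it does not implement the delta algorithm itself. The classifier `nearest_neighbor_predict` is a hypothetical stand-in, and the function names, the fixed seed, and the linear runtime extrapolation in `estimated_runtime_seconds` are all assumptions made for illustration.

```python
import random


def nearest_neighbor_predict(train_X, train_y, x):
    # Hypothetical stand-in classifier (NOT the delta algorithm from the post):
    # return the label of the training row closest to x in squared Euclidean distance.
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def average_accuracy(X, y, iterations=100, train_fraction=0.85, seed=0):
    # Average test accuracy over repeated random train/test splits,
    # mirroring the 100 iterations and 85% / 15% split described in the post.
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    n = len(X)
    n_train = int(round(train_fraction * n))
    total = 0.0
    for _ in range(iterations):
        idx = list(range(n))
        rng.shuffle(idx)
        train_idx, test_idx = idx[:n_train], idx[n_train:]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]
            for i in test_idx
        )
        total += correct / len(test_idx)
    return total / iterations


def estimated_runtime_seconds(rows, compressed=False):
    # Rough extrapolation from the quoted figures: about 1 second (full) or
    # 0.5 seconds (compressed) per 2,500 rows. Linear scaling is an assumption.
    seconds_per_block = 0.5 if compressed else 1.0
    return seconds_per_block * rows / 2500
```

Swapping in a different classifier only requires replacing `nearest_neighbor_predict`; the splitting and averaging logic stays the same. Under the linear-scaling assumption, `estimated_runtime_seconds(10000)` and `estimated_runtime_seconds(10000, compressed=True)` give a quick sense of the time saved by compression on a larger dataset.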