Projects that implement and employ Data Classification, Text Analytics, Information Retrieval and Information Extraction methods.
Easy-to-use library to determine the similarity between strings or sets of numbers using the Jaccard index, MinHash and Locality-Sensitive Hashing.
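A minimal sketch of how the three techniques fit together, assuming word-level sets as input. The function names (`jaccard`, `minhash_signature`, `estimate_jaccard`, `is_candidate`) are hypothetical and are not the library's actual API. Seeded SHA-1 hashes stand in for random permutations, and the 32-band by 4-row split is an illustrative choice.

```python
import hashlib


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard index |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(token: str, seed: int) -> int:
    # Seeded 64-bit hash derived from SHA-1; each seed approximates one random permutation.
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: set, num_perm: int = 128) -> list:
    """One minimum hash value per seeded hash function."""
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(str(x), seed) for x in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    # The fraction of positions where the minima agree estimates the Jaccard index.
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_buckets(sig: list, bands: int, rows: int) -> set:
    """Split a signature into bands; each (band index, band values) pair is a bucket key."""
    if bands * rows != len(sig):
        raise ValueError("bands * rows must equal the signature length")
    return {(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)}


def is_candidate(sig_a: list, sig_b: list, bands: int = 32, rows: int = 4) -> bool:
    # Two items become a candidate pair if they share at least one bucket.
    return bool(lsh_buckets(sig_a, bands, rows) & lsh_buckets(sig_b, bands, rows))


a = set("the quick brown fox jumps over the lazy dog".split())
b = set("the quick brown fox leaps over the lazy cat".split())
sig_a, sig_b = minhash_signature(a), minhash_signature(b)
print(jaccard(a, b))                  # exact value: 6 shared words / 10 distinct words
print(estimate_jaccard(sig_a, sig_b)) # MinHash estimate of the same value
print(is_candidate(sig_a, sig_b))     # LSH candidate check
```

With `b` bands of `r` rows, a pair whose Jaccard similarity is `s` becomes a candidate with probability 1 − (1 − s^r)^b, so raising `r` makes the filter stricter and raising `b` recovers recall.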
Architectural choices behind Vokter v0.2, a multilingual document store with built-in diff detection.
Multilingual parser & indexer that uses Locality-Sensitive Hashing, DiffMatchPatch, Bloom filters and cron jobs to detect keywords inserted into and removed from webpages.
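A rough sketch of the keyword-diff step, under stated assumptions: Python's standard `difflib` is swapped in for DiffMatchPatch, the diff runs on word tokens rather than characters, and a Bloom filter holds the keywords being watched. `BloomFilter`, `tokenize` and `keyword_changes` are hypothetical names, not Vokter's code; LSH and cron scheduling are left out.

```python
import difflib
import hashlib
import re


class BloomFilter:
    """Bit array with k hash functions: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from seeded SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def tokenize(text: str) -> list:
    # Lower-cased word tokens; \w is Unicode-aware in Python 3, so accented words survive.
    return re.findall(r"\w+", text.lower())


def keyword_changes(old: str, new: str, watched: BloomFilter):
    """Return (inserted, removed) tokens that may match a watched keyword."""
    old_tokens, new_tokens = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    inserted, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(t for t in old_tokens[i1:i2] if t in watched)
        if tag in ("insert", "replace"):
            inserted.extend(t for t in new_tokens[j1:j2] if t in watched)
    return inserted, removed


watched = BloomFilter()
for keyword in ("apples", "pears"):
    watched.add(keyword)
print(keyword_changes("The shop sells apples", "The shop sells pears", watched))
```

Because a Bloom filter can report false positives, a hit would normally be confirmed against an exact keyword store before notifying anyone; the filter's job is to make the common "not watched" answer cheap.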
A decision-tree classifier and a data management module that evaluate win/lose probabilities over the course of a Texas Hold'em poker game.