Analiticcl

Analiticcl is an approximate string matching or fuzzy-matching system that can be used for spelling correction or text normalisation (such as post-OCR correction or post-HTR correction). Texts can be checked against a validated or corpus-derived lexicon (with or without frequency information) and spelling variants will be returned.

The distinguishing feature of the system is its use of anagram hashing to drastically reduce the search space and make quick lookups possible even over larger edit distances. The underlying idea is largely derived from prior work on TICCL (Reynaert 2010; Reynaert 2004), which was implemented in ticcltools. Analiticcl attempts to re-implement the core of these ideas from scratch, and also introduces some novelties, such as the use of prime factors for improved anagram hashing. We aim for a high-performance implementation written in Rust.
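To make the idea of prime-factor anagram hashing concrete, here is a minimal sketch in Rust. It is not analiticcl's actual code or API: the 26-letter lowercase alphabet, the prime assignment, the `u128` value type, and the function names `anagram_value`, `build_index` and `candidates_after_one_deletion` are all assumptions made for illustration. Each letter is mapped to a prime and a word's anagram value is the product of its letters' primes, so all anagrams share one value, and removing a letter corresponds to dividing by that letter's prime.

```rust
use std::collections::HashMap;

/// The first 26 primes, one per lowercase ASCII letter.
/// (Assumed alphabet and assignment, for illustration only.)
const PRIMES: [u128; 26] = [
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
    73, 79, 83, 89, 97, 101,
];

/// Product of the primes of all letters in `word`.
/// Returns None for characters outside a-z or if the product overflows u128.
fn anagram_value(word: &str) -> Option<u128> {
    let mut value: u128 = 1;
    for c in word.chars() {
        if !c.is_ascii_lowercase() {
            return None;
        }
        let prime = PRIMES[(c as u8 - b'a') as usize];
        value = value.checked_mul(prime)?;
    }
    Some(value)
}

/// Index a lexicon by anagram value; anagrams end up in the same bucket.
/// Words that cannot be hashed in this sketch are skipped.
fn build_index(lexicon: &[&str]) -> HashMap<u128, Vec<String>> {
    let mut index: HashMap<u128, Vec<String>> = HashMap::new();
    for word in lexicon {
        if let Some(value) = anagram_value(word) {
            index.entry(value).or_default().push((*word).to_string());
        }
    }
    index
}

/// Lexicon entries whose letters equal those of `word` minus one letter.
/// Each deletion is a single division followed by one hash lookup.
fn candidates_after_one_deletion(
    index: &HashMap<u128, Vec<String>>,
    word: &str,
) -> Vec<String> {
    let mut found = Vec::new();
    let Some(value) = anagram_value(word) else {
        return found;
    };
    for &prime in PRIMES.iter() {
        // The prime divides the value exactly when its letter occurs in the word.
        if value % prime == 0 {
            if let Some(words) = index.get(&(value / prime)) {
                found.extend(words.iter().cloned());
            }
        }
    }
    found
}
```

With a lexicon containing "house" and "horse", looking up the misspelling "hoouse" via `candidates_after_one_deletion` finds "house" by dividing out the prime for one "o". Because the value ignores letter order, candidates found this way still need to be verified and ranked with an order-sensitive measure such as edit distance. A fixed-width integer also overflows on long words, which this sketch reports as `None`.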

Bram Buitendijk, Scientific Developer BE Specialist
Harm Nijboer, Senior Data Manager
Jirsi Reinders, Data Curator
Judith Brouwer, Data Curator
Kerim Meijer, Scientific Developer BE Specialist
Leon van Wissen, Scientific Programmer
Maarten van Gompel, NER Specialist
Martin Reynaert, Data analyst
Menzo Windhouwer, Scientific Developer BE Specialist
Rob Zeeman, Scientific Developer BE Specialist