Text classification

Google releases large n-gram dataset

Of all times, half a year after I could have used it, Philipp Lenssen writes that Google has announced the release of a large n-gram dataset: Google to Release N-gram Data. Sigh...
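
For anyone wondering what such a dataset boils down to: an n-gram is simply a run of n consecutive tokens, and an n-gram dataset pairs each run with how often it occurs. The following is only a minimal sketch of that idea in Python. The function name `ngram_counts` and the sample sentence are my own illustrations, and nothing here reflects the actual format or tokenization of Google's data.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    # zip over n shifted copies of the token list yields the n-grams as tuples
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Toy input; a real corpus would need proper tokenization first.
tokens = "to be or not to be".split()
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
```

Looking up `bigrams[("to", "be")]` then gives the frequency of that word pair in the toy sentence.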

Pattern Analysis Library | Project Info

Interesting library number 2:

The Pattern Analysis Library (PALib) is a pattern classification/recognition library for C++ programmers. The library consists of numerical and statistical routines which range from statistical decision theory, parametric and non-parametric learning algorithms, and linear classification to supervised and unsupervised machine learning methods. PALib is portable across all platforms which support the ANSI-compliant C++ language and STL.

Unfortunately, it comes without any demo applications.

The “Bow” Toolkit

Interesting library number 1:

Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow).

The timestamp on the website, however, is September 12, 1998...
