Traumwind - Automatic Document Classification

Automatic Document Classification With Perl, based on naive Bayes... this seems to be exactly what I was searching for.

AI::Categorize::NaiveBayes allows the user to feed it the text of several documents (the training set), which it will parse and add to the word frequency database.
<snip />
Once a sufficient number of training documents have been fed to the database and the needed probabilities have been calculated, we can start asking AI::Categorize::NaiveBayes to categorize new documents that it hasn't seen before. It returns to us an ordered list of the most probable categories for that document.
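The train-then-categorize workflow quoted above can be sketched in a few lines. This is a minimal illustration in Python, not the AI::Categorize::NaiveBayes API: the class name `NaiveBayesSketch`, its methods `add_document` and `categorize`, the whitespace tokenizer, the add-one smoothing, and the toy training texts are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayesSketch:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> training documents seen
        self.vocabulary = set()

    def add_document(self, category, text):
        """Training step: add one labelled document to the frequency database."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def categorize(self, text):
        """Return (category, log score) pairs, most probable category first."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[category] = score
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up training set, purely for illustration.
classifier = NaiveBayesSketch()
classifier.add_document("spam", "buy cheap pills now")
classifier.add_document("spam", "cheap offer buy now")
classifier.add_document("ham", "meeting notes for the perl project")
classifier.add_document("ham", "perl module release notes")

ranking = classifier.categorize("cheap pills offer")
print([category for category, _ in ranking])  # ['spam', 'ham']
```

Working in log space avoids numeric underflow on long documents, and the sorted result mirrors the "ordered list of the most probable categories" described in the quote.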

TeledyN: Comment on Graham's Plan for Spam
The specific filtering of spam also reminded me of the 1994 ACM project to produce a collaborative filter to rid the dying USENET of spam attacks; http://www.si.umich.edu/~presnick/papers/cscw94/GroupLens.htm chronicles the GroupLens project, and sure enough, there's the same Bayesian method at the root of it.

The ifile Web Site
ifile is a general mail filtering system that works with a mail client to intelligently filter mail according to the way the user tends to organize mail. ifile uses the machine learning algorithm Naive Bayes to classify e-mail documents.

Freely Available Filtering Systems, Information Filtering Resources

Personal WebWatcher, Project Page
Personal WebWatcher is a "personal" agent that accompanies you from page to page as you browse the web, highlighting hyperlinks that it believes will be of interest. Its strategy for giving advice is learned from feedback from earlier tours.

[ by Martin ]

similar entries (vs):

The `Bow' Toolkit (# 11%)
EZ Bayesy (# 11%)
libTextCat - Lightweight text categorization (# 9%)
freshmeat.net: Project details for Bayesian Pattern Filtering Library (# 9%)

similar entries (cg):

data mining and machine learning in automated text classification (# 9%)
Paul Perry - Automated Collaborative Filtering in SQL (# 8%)
just a test (# 7%)
EZ Bayesy (# 6%)
freshmeat.net: Project details for Bayesian Pattern Filtering Library (# 6%)
Dave Farquhar on the new naive bayesian spam filter in Mozilla (# 6%)
research (reading list) (# 5%)

Martin Spernau
© 1994-2003