They're here! Yes, I finally dug into the 'Contextual Network Graph' stuff (Search::ContextGraph) and came up with a replacement for the VectorSpace-based 'similar entries' engine that has been running here for some time now. It took a lot of parameter tweaking until it gave results similar to the VS setup I had...
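For anyone who hasn't met the vector-space approach: each post becomes a vector of term counts, and two posts count as 'similar' when the angle between their vectors is small (cosine similarity). Below is a minimal sketch in Python, assuming raw term counts and plain cosine similarity. The VectorSpace module used here may weight and tokenize terms differently, and `term_vector`, `cosine` and `similar_entries` are hypothetical helpers, not its API.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (lowercased alphabetic tokens only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_entries(target, posts, top_n=4):
    """Score every other post against `target` and return the best matches."""
    target_vec = term_vector(posts[target])
    scores = [
        (title, cosine(target_vec, term_vector(body)))
        for title, body in posts.items()
        if title != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]
```

Note that every post gets compared against every other post, so a full scan grows quadratically with the number of posts.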
The really fun part is this: for much the same result, it's blazingly FAST. Where my VS thingie would take 40+ minutes of CPU time to complete a scan, the CG thing takes about 23 seconds!! And that's without using the cool 'store' functionality that lets you store a ContextGraph for later use (something that isn't technically possible with the VS approach)... Also, Search::ContextGraph is a pure Perl module with very few dependencies, whereas VectorSpace needs a C-based extension (PDL).
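The contextual network graph idea, roughly: posts and terms become nodes in one bipartite graph, and 'energy' injected at a post spreads to its terms, then to other posts sharing those terms, thinning out at every hop. The posts that collect the most energy are the similar ones. Here is a simplified sketch in Python, assuming an even energy split among neighbours (edge weights ignored), a hypothetical `threshold` cutoff and a hypothetical `max_depth` hop limit. This is not Search::ContextGraph's API or its exact weighting.

```python
from collections import defaultdict


def build_graph(posts):
    """Bipartite graph: each post node links to the term nodes it contains."""
    graph = defaultdict(set)
    for title, body in posts.items():
        doc = ("doc", title)
        for term in set(body.lower().split()):
            term_node = ("term", term)
            graph[doc].add(term_node)
            graph[term_node].add(doc)
    return graph


def spread(graph, start, energy=1.0, threshold=0.01, max_depth=4):
    """Spreading activation from `start`.

    Each reached node adds the incoming energy to its score and splits that
    energy evenly among its neighbours. A path stops when the per-neighbour
    share drops below `threshold` or after `max_depth` hops (both values are
    illustrative assumptions).
    """
    scores = defaultdict(float)
    frontier = [(start, energy, 0)]
    while frontier:
        node, e, depth = frontier.pop()
        scores[node] += e
        neighbours = graph.get(node, ())
        if depth >= max_depth or not neighbours:
            continue
        share = e / len(neighbours)
        if share < threshold:
            continue
        for nb in neighbours:
            frontier.append((nb, share, depth + 1))
    return scores


def similar_entries_cg(target, posts, top_n=4):
    """Rank the other posts by the energy they receive from `target`."""
    graph = build_graph(posts)
    scores = spread(graph, ("doc", target))
    ranked = [
        (name, score)
        for (kind, name), score in scores.items()
        if kind == "doc" and name != target
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]
```

In this sketch the cutoff keeps the work small: only nodes that still receive a meaningful share of energy get visited, and posts sharing no terms with the target (like "c" in a toy example) are never touched at all.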
So much for now; more details when I've caught my breath ;)
[update:] Something is weird with it on this server... shouldn't it find VectorSpace? Posts with that word in them should be similar to this one, no? Back to the drawing board, I guess...
[update: 2003-07-23] ... it appears the results for VectorSpace and ContextGraph are quite similar as long as the posts are short... but as the posts get longer, CG fails... digging...
[by Martin]
similar entries (vs):
- HolaryHey! (# 23%)
- that reminds me (# 14%)
- similar entries now working (# 14%)
- Idle Words (# 13%)
similar entries (cg):