NCISC as a search engine: now it works too!

We have had mixed results so far when using the output of our algorithm for search-engine-like applications, i.e. finding the documents relevant to a given term. The symmetric neighbourhoods (documents similar to a document, or words similar to a word) have always given great results, but results in the asymmetric case (documents for a word, or words for a document) were much less relevant.

We have finally found the problem. Here are some results from an analysis of the English Wikipedia (70 features, no information added):

Wikipedia pages relevant to the word "turing" (titles are completely ignored, as well as page popularity):

  • Hypercomputation
  • Solomonoff’s theory of inductive inference
  • Unbounded nondeterminism
  • Super-recursive algorithm
  • Theory of computation
  • Automated theorem proving
  • List of important publications in theoretical computer science
  • Turing machine
  • Turing completeness
  • Sheila Greibach
  • Church–Turing thesis
  • Algorithmic information theory
  • Universal Turing machine
  • Wolfram’s 2-state 3-symbol Turing machine
  • Digital physics
  • Logical framework
  • Fuzzy logic
  • Denotational semantics
  • List of machine learning algorithms
  • Parallel computation thesis
  • Computational geometry
  • Algorithm
  • Logic in computer science
  • Operational semantics
  • Satisfiability Modulo Theories
  • Automated reasoning
  • Information theory
  • Denotational semantics of the Actor model
  • Oracle machine
  • Indeterminacy in concurrent computation
  • Baum–Welch algorithm
  • List of books in computational geometry
  • John V. Tucker
  • List of PSPACE-complete problems
  • Power domains
  • List of computability and complexity topics

Similarly, retrieving the most relevant words for a given page, "Turing machine", gives pretty good results:

  • turing
  • recursive
  • arithmetic
  • boolean
  • mappings
  • recursion
  • deterministic
  • automata
  • algorithmically
  • algorithm
  • melzak
  • iterating
  • computes
  • compute
  • leiserson
  • provably
  • iterate
  • cormen
  • computations
  • subproblems
  • definable
  • embedding
  • automaton
  • subtraction
  • satisfiability
  • undecidable
  • iteratively
  • constraint
  • deterministically
  • tuples
  • tuple
  • associative
  • pseudocode
  • pseudorandom
  • reachability
  • completeness
  • recursively
  • iterative
  • unary
  • logic
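
The two asymmetric queries above (pages for a word, words for a page) can be sketched as follows. This post does not describe how NCISC computes relevance, so everything here is an assumption made for illustration: that words and pages are represented as vectors in one shared feature space, that cosine similarity is the ranking score, and that three toy dimensions stand in for the 70 features. The names `cosine`, `rank`, `word_vecs` and `page_vecs` are hypothetical, and the vectors are made up, not taken from the Wikipedia analysis.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query, candidates, top=3):
    """Return the names of the `top` candidates most similar to `query`."""
    scored = sorted(candidates.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top]]

# Toy, made-up vectors (assumption: words and pages share one feature space).
word_vecs = {
    "turing":    [0.9, 0.1, 0.0],
    "recursive": [0.8, 0.3, 0.1],
    "guitar":    [0.0, 0.1, 0.9],
}
page_vecs = {
    "Turing machine":  [1.0, 0.2, 0.0],
    "Oracle machine":  [0.7, 0.4, 0.1],
    "Electric guitar": [0.1, 0.0, 1.0],
}

# Asymmetric query 1: pages relevant to a word.
pages_for_word = rank(word_vecs["turing"], page_vecs, top=2)

# Asymmetric query 2: words relevant to a page.
words_for_page = rank(page_vecs["Turing machine"], word_vecs, top=2)

print(pages_for_word)
print(words_for_page)
```

In this sketch the same `rank` function serves both directions; only the roles of query and candidate set are swapped. Whether NCISC treats the two directions this symmetrically is not stated in the post.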

 
