[agi] Intelligence: a pattern discovery algorithm of scalable complexity.

Boris Kazachenko
Sat, 29 Mar 2008 19:27:27 -0700

Here's another try:

I think the main reason for the failure of AI is that no existing approach is derived from a theoretically consistent definition of intelligence. Some, such as Algorithmic Information Theory, are close but not close enough. Scalable (general) intelligence must recursively self-improve: continuously develop new mechanisms.

These mechanisms must be selected according to a universal criterion, which can only be derived from a functional definition of intelligence.

I define intelligence as an ability to predict/plan by discovering & projecting patterns within an input flow. For an excellent high-level discussion see "On intelligence" by Jeff Hawkins.

We know of one mechanism that did produce an intelligence, although a pretty messed-up one: evolution. Initially very simple algorithmically, evolution changes heritable traits at random & evaluates the results for reproductive fitness. But biological evolution is ludicrously inefficient, because intelligence is only one element of reproductive fitness, & selection is extremely coarse-grained: on the level of a whole genome rather than of individual traits.

From my definition, a fitness function specific to intelligence is predictive correspondence of memory. Correspondence is a representational analog of reproduction, maximized by an internalized evolution:

- the heritable traits for evolving predictions are past inputs, &

- the fitness is their cumulative match to the following inputs.

Match (fitness) should be quantified on the lowest level of comparison; this makes selection more incremental & efficient. The lowest level is comparison between two single-variable inputs, & the match is a partial identity: the complement of the difference, or the smaller of the two variables. This is also a measure of analog compression: the sum of a bitwise AND between uncompressed comparands (represented by strings of ones).
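This lowest-level comparison can be sketched in a few lines of Python. This is only an illustration of the definitions above (match as the smaller comparand, miss as the difference, & the unary-AND equivalence), not an implementation of the full algorithm:

```python
# Illustrative sketch: match & miss between two single-variable,
# non-negative integer inputs, per the definitions in the text.

def match(a: int, b: int) -> int:
    # Partial identity: the smaller of the two comparands.
    return min(a, b)

def miss(a: int, b: int) -> int:
    # The difference: the complement of the match within the larger input.
    return abs(a - b)

def unary_and_match(a: int, b: int) -> int:
    # Same quantity via bitwise AND between unary encodings
    # ("strings of ones"): the overlap of a ones & b ones is min(a, b).
    return sum(x & y for x, y in zip([1] * a, [1] * b))

assert match(5, 3) == unary_and_match(5, 3) == 3
assert match(5, 3) + miss(5, 3) == max(5, 3)
```

Note that match + miss always reconstructs the larger comparand, so the pair is a lossless re-encoding of the two inputs.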

To speed this up, the search algorithm must incorporate increasingly complex shortcuts to discover better predictions (speed is what it's all about; otherwise we could just sit back & let biological evolution do the job).

These more complex predictions (patterns) & pattern discovery methods are derived from the past inputs of increasing comparison range & order: derivation depth.

The most basic shortcuts are based on the assumption that the environment is not random:
- Input patterns are decreasingly predictive with distance.
- A pattern is increasingly predictive with its accumulated match, & decreasingly so with the difference between its constituent inputs.
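These two assumptions can be combined into a single scoring function. The sketch below is mine, not from the text: the linear match-minus-difference value & the particular distance-decay form are illustrative assumptions; only the monotonic directions (up with match, down with difference & distance) come from the assumptions above:

```python
# Hedged sketch: scoring a pattern's projected (predictive) value
# under the two non-randomness assumptions. The decay form is an
# illustrative assumption.

def projected_match(accumulated_match: float,
                    accumulated_difference: float,
                    distance: float,
                    decay: float = 0.5) -> float:
    # Value grows with accumulated match, shrinks with accumulated
    # difference, & decays with projection distance.
    value = accumulated_match - accumulated_difference
    return value / (1.0 + decay * distance)

# A nearer projection of the same pattern is worth more:
assert projected_match(10, 2, 1) > projected_match(10, 2, 5)
```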

A core algorithm based on these assumptions would be an iterative step that selectively increases the range & complexity of patterns in proportion to their projected cumulative match:

The original inputs are single variables produced by the senses, such as pixels of visual perception. Their subsequent comparison by iterative subtraction adds new variable types: length & aggregate value of both partial match & miss (derivatives) for each variable of the comparands. The inputs are integrated into patterns (higher-level inputs) if the additional projected match is greater than the system's average for the computational resources necessary to record & compare the additional syntactic complexity. Each variable of a thus-formed pattern is compared on a higher level of search & can form its own pattern.
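The integration branch might look like the sketch below: adjacent single-variable inputs (say, pixels in a 1-D scan) are compared by subtraction, & consecutive inputs are grouped into a pattern while their match-minus-miss stays above a threshold standing in for the system's average. The grouping criterion & data layout are my illustrative assumptions, not the author's algorithm:

```python
# Illustrative sketch of the integration branch: group consecutive
# single-variable inputs into patterns while their pairwise projected
# match stays above `average` (a stand-in for the system's average).

def form_patterns(inputs, average):
    patterns = []
    current = [inputs[0]]   # inputs integrated into the current pattern
    cum_match = 0           # accumulated match within the pattern
    for prev, nxt in zip(inputs, inputs[1:]):
        m = min(prev, nxt)          # match: partial identity
        d = abs(prev - nxt)         # miss: difference (derivative)
        if m - d >= average:        # still above the system's average
            current.append(nxt)
            cum_match += m
        else:                       # terminate pattern, start a new one
            patterns.append((current, cum_match))
            current, cum_match = [nxt], 0
    patterns.append((current, cum_match))
    return patterns

# A run of similar values forms one pattern; outliers break it up:
print(form_patterns([5, 5, 4, 0, 1, 5, 5], average=0))
```

Each emitted pattern carries its accumulated match, which a higher level of search could then compare & evaluate in turn.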

On the other hand, if predictive value (projected match) falls below the system's average, the input pattern is aggregated with adjacent "subcritical" patterns by iterative addition, into a lower-resolution input. Aggregation results in a "fractional" projection range for constituent inputs, as opposed to the "multiple" range for matching inputs. By increasing the magnitude of the input, it increases the input's projected match: a subset of that magnitude. Aggregation also produces the averages that determine the resolution of future inputs & evaluate their matches.
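A matching sketch of the aggregation branch: runs of "subcritical" inputs (here simply those whose magnitude is below the average) are summed by iterative addition into a single lower-resolution input, tracked together with the span it covers (its fractional projection range). The subcriticality test & the (value, span) representation are my illustrative assumptions:

```python
# Illustrative sketch of the aggregation branch: sum adjacent
# subcritical inputs into one lower-resolution input, recorded as
# (aggregate value, number of original inputs covered).

def aggregate_subcritical(inputs, average):
    out = []
    total, span = 0, 0
    for x in inputs:
        if x < average:             # subcritical: aggregate by addition
            total += x
            span += 1
        else:                       # supercritical input passes through
            if span:
                out.append((total, span))
                total, span = 0, 0
            out.append((x, 1))
    if span:
        out.append((total, span))
    return out

# Two weak inputs merge into one higher-magnitude, lower-resolution input:
print(aggregate_subcritical([1, 2, 9, 1, 1], average=5))
```

The summed input has a larger magnitude than any of its parts, so its projected match (a subset of that magnitude) rises, as the paragraph above describes.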

So, the alternative integrated/aggregated representations of inputs are produced by iterative subtraction/addition (the neural analogs are inhibition & excitation), both determined by comparison among the respective inputs. It's a kind of evolution in which neither the traits nor their change are really produced at random. The inputs are inherently predictive on average, by virtue of their proximity, & change is introduced either by new inputs (proximity update) or as incremental syntax of the old inputs, produced by evaluating their individual predictiveness: comparison, selectively incremental in distance & derivation.

The biggest hangup people usually have is that this kind of algorithm is obviously very simple, while working intelligence is obviously very complex. But, as I tried to explain, additional complexity should only improve speed, rather than change the "direction" of cognition (although it may save a few zillion years). The main requirement for such an algorithm is that it maximize the ratio of benefit (predictive power) to cost (complexity).

I would summarize the algorithm as Comparison-Projection, a more constructive analog to Jeff Hawkins' Memory-Prediction.

Hope this makes sense. It's just a foundation of my approach, & I still have theoretical problems to solve. However, the core principles are formal & final, & that's more than I can say for any other approach.

Please let me know if you see any contradictions or have questions/comments.

