Fri, 03 Jan 2003 05:14:16 -0800
Just got on the list, would like to introduce my approach to general AI:
Intelligence is not an empirically specific ability: we can learn anything. Therefore its method can only be derived from a functional definition of intelligence, not for or from any specific property of inputs. Well, with one exception: the inputs can't be random within a given coordinate system; if they are, intelligence is useless.
The mainstream position of the 'artificial intelligentsia' seems to be that intelligence is a lot of different things, which I think is largely responsible for the pathetic state of AI. Any general term is applicable to different things, but there must be something in common, otherwise we wouldn't use it to describe them all.
Expert Systems simply prepackage & reuse human-generated knowledge, with no independent ability to generate new knowledge.
Neural Nets imitate insufficiently understood brain processes: the first functional intelligence produced by evolution, & as such, probably the least efficient possible.
Problem-solving algorithms first require manually describing a specific environment, which is what intelligence is really for. The cognitive part of problem solving is discovering compressive patterns to represent the subject/environment; the solution itself is a brute-force search of alternative projections of these patterns to maximize values.
Evolutionary algorithms utilize random changes of the code that could be useful to maximize a specific value. But I think the real problem is to define what it is that we want to maximize.
In pattern recognition there has been no theoretically consistent way to value & encode match & resulting patterns, making hierarchical schemes & scaling impractical.
Algorithmic/Computational learning theory, while conceptually very close to my approach, doesn't seem to connect the purpose with the method. 'Learning theorists', being mathematicians, consider all possible cases, & no method can serve them all. I'm only concerned with the real world, characterized by a spatio-temporal continuum. This 'specification' makes the whole idea meaningful.
I define intelligence as an ability to produce expectations of future inputs through recognition & interactive projection of past inputs' patterns. This includes planning, which technically is a self-prediction.
A pattern is a set of inputs, the record of which can be compressively replaced with a record of a vector: a value that converts new inputs into old ones by any operation: identity, addition, multiplication, & so on. Compression here is a reduction of recorded magnitude compared to the magnitude of restorable inputs.
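To make the definition concrete, here is a minimal Python sketch for the additive case only. It is an illustration under stated assumptions, not the method itself: the function name `additive_compression` is hypothetical, and "magnitude" is assumed to be the sum of absolute values, with the recorded form costing the first input, the step (the vector), & the count. A step of 0 corresponds to the identity operation.

```python
def additive_compression(inputs):
    """Return (first, step, count, compression) for a constant-step run, else None.

    Assumption: magnitude is the sum of absolute values; the recorded form
    stores the first input, the step (the 'vector'), and the count.
    """
    if len(inputs) < 2:
        return None
    step = inputs[1] - inputs[0]
    # Every adjacent pair must be related by the same additive vector.
    if any(b - a != step for a, b in zip(inputs, inputs[1:])):
        return None
    restorable = sum(abs(x) for x in inputs)                 # magnitude of the raw inputs
    recorded = abs(inputs[0]) + abs(step) + len(inputs)      # magnitude of the compact record
    return inputs[0], step, len(inputs), restorable - recorded


print(additive_compression([3, 5, 7, 9]))   # additive vector of 2
print(additive_compression([7, 7, 7, 7]))   # identity: step of 0
print(additive_compression([1, 2, 4]))      # no single additive vector
```

A positive last element means the compact record is smaller than the inputs it can restore; other operations (multiplication & so on) would need their own comparison.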
Compression accumulated over all inputs of a pattern determines its predictive value: the extent & the direction of its projection. Net value of a pattern is its predictive value minus the average predictive value produced by equivalent resources: memory & operations. Predictive power of a system is increased by deleting patterns of negative value.
Expanding the range of search for compressive vectors (matches) & increasing syntactic complexity of resulting patterns require resources affordable only to patterns of corresponding predictive value. Thus, recorded patterns should form a hierarchy of compression / search range & syntactic complexity, with each level divided into fixed-range search units.
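The net-value rule can be sketched in a few lines of Python. This is only one possible reading: the `Pattern` class, its `cost` field (memory plus operations, in arbitrary units) & the system-wide average of value per unit of cost are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str
    predictive_value: float  # accumulated compression, arbitrary units
    cost: float              # memory + operations consumed, arbitrary units


def prune(patterns):
    """Keep patterns whose net value is non-negative (a single pass)."""
    total_value = sum(p.predictive_value for p in patterns)
    total_cost = sum(p.cost for p in patterns)
    # Assumed baseline: average predictive value per unit of resources.
    avg_value_per_cost = total_value / total_cost if total_cost else 0.0
    kept = []
    for p in patterns:
        # Net value: own value minus what the same resources yield on average.
        net = p.predictive_value - avg_value_per_cost * p.cost
        if net >= 0:
            kept.append(p)
    return kept


patterns = [Pattern("a", 10, 2), Pattern("b", 1, 2), Pattern("c", 4, 1)]
print([p.name for p in prune(patterns)])
```

Here the average is 3 units of value per unit of cost, so "b" (net value -5) is deleted while "a" & "c" are kept. The average is computed once; re-running the pass on the survivors would raise the baseline.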
Higher levels have fewer units with greater range, ultimately leading to a single top-level unit with a global range of search. Patterns are elevated as long as their predictive value exceeds required resources of a destination level, & deleted when shift of inputs places them beyond their range of search.
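A rough Python sketch of the elevation & deletion rules follows. The per-level resource costs, the one-dimensional coordinate & both function names are hypothetical placeholders, chosen only to show the control flow.

```python
def highest_level(predictive_value, level_costs):
    """Return the index of the highest level a pattern reaches (0 = base level).

    level_costs[i] is the assumed resource requirement of level i; a pattern is
    elevated one level at a time while its value exceeds the destination's cost.
    """
    level = 0
    for dest in range(1, len(level_costs)):
        if predictive_value > level_costs[dest]:
            level = dest
        else:
            break
    return level


def out_of_range(pattern_coord, current_coord, search_range):
    """True when the shift of inputs has moved past the unit's search range."""
    return abs(current_coord - pattern_coord) > search_range


costs = [0, 5, 20, 80]           # hypothetical costs, growing with level
print(highest_level(25, costs))  # climbs while value exceeds the destination cost
print(out_of_range(10, 50, 30))  # pattern at 10, inputs now at 50, range 30
```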
A pattern can be predictive only if derived & projected within a given metric system. The only metric we can use a priori is Spatio-Temporal: in our environment physical causality is S-T continuous, therefore there is a strong compressibility between S-T-adjacent elements, which declines with the distance/delay.
That means search must proceed in S-T order, starting from minimal-complexity S-T-adjacent inputs, such as raw multimedia: single-variable inputs (pixels of a video), ordered initially in one dimension (a scan line). Within an S-T unit a search can be re-ordered by variable types of sufficient predictive value.
This may seem primitive, but that's precisely the point: all the complexity must be learned. Generalization is a reduction, & an intelligent system that starts with higher-level, especially language-level, data can't recreate the semantic context, that is, the generalized real-world 4-dimensional patterns that most words in the language stand for.
I don't believe in combining different methods, because cognition deals with the unknown: we can't a priori split it into different areas, except to the extent that they're sensor/hardware specific, or into levels, except that syntactic complexity of inputs should be sequentially increased.
Any methodological differentiation should be not in kind but in degree, which must be determined by an intelligent system rather than programmed into it. The 'types' of inputs are ultimately the types of empirical objects they represent; an intelligence should be able to learn them on its own.
Scalability is of the essence, & a truly scalable method should start from the beginning: the limit of resolution of raw sensory data, from which all higher data types are ultimately derived. I currently work on a procedure that would initiate cognition by processing digitized video; here is a (very) rough outline:
Original inputs representing pixels of a horizontal scan line consist of 2 variables: brightness (B) & coordinate (C), indicating the order of input. Each input is compared only with the previous input of the same line, because range & dimensionality of search should expand in proportion with a pattern's previous recurrence, which originally is 1 & 0D.
After comparison two more variables are added: length (L) & compression (R), indicating cumulative match of, respectively, dC & B. In case of a match the difference (dB) is preserved & compared with that from the previous comparison, forming higher derivatives. A multi-derivative pattern results: C,L,B,R (C,L,dB,R (C,L,ddB,R ...)). In case of a miss the pattern is terminated & a contrast (a negative pattern) is formed.
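The first-level pass over one scan line might look like the Python sketch below. Several details are assumptions not fixed by the outline: a comparison counts as a match when |dB| stays within a hypothetical `max_diff` threshold, the match added to R is taken as the smaller of the two compared brightness values, & B is accumulated as a sum. Higher derivatives (ddB & so on) & the formation of contrast patterns are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Pattern1D:
    C: int                                   # coordinate of the first pixel
    L: int = 1                               # length: number of pixels included
    B: int = 0                               # summed brightness of those pixels
    R: int = 0                               # cumulative match (assumed: min of compared values)
    dB: list = field(default_factory=list)   # preserved differences between neighbours


def scan_line_patterns(line, max_diff=10):
    """Split one scan line into 1D patterns by comparing each pixel to the previous one."""
    if not line:
        return []
    patterns = []
    current = Pattern1D(C=0, B=line[0])
    for c in range(1, len(line)):
        prev, b = line[c - 1], line[c]
        d = b - prev
        if abs(d) <= max_diff:
            # Match: extend the current pattern and keep the difference.
            current.L += 1
            current.B += b
            current.R += min(b, prev)
            current.dB.append(d)
        else:
            # Miss: terminate the pattern and start a new one at this pixel.
            patterns.append(current)
            current = Pattern1D(C=c, B=b)
    patterns.append(current)
    return patterns


for p in scan_line_patterns([10, 12, 11, 50, 52, 90]):
    print(p)
```

With the default threshold the example line splits at the two large brightness jumps, giving three patterns that start at coordinates 0, 3 & 5.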
On the next level 1-dimensional patterns (Ps) are compared to those of the previous scan line with overlapping horizontal C+L. The same process of encoding repeats here: 2D Ps are formed by adding a vertical coordinate & length, as well as R & derivatives for each variable of the 1D P.
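Selecting which 1D patterns of two adjacent lines get compared comes down to a span-overlap test, sketched below in Python. Each pattern is reduced to a hypothetical `(C, L)` tuple covering coordinates C through C+L-1; the comparison & 2D encoding themselves are not shown.

```python
def overlap(c1, l1, c2, l2):
    """Number of coordinates shared by spans [c1, c1+l1) and [c2, c2+l2)."""
    return max(0, min(c1 + l1, c2 + l2) - max(c1, c2))


def overlapping_pairs(prev_line, curr_line):
    """Return (index_in_prev, index_in_curr, overlap) for every overlapping pair.

    prev_line and curr_line are lists of (C, L) spans from adjacent scan lines.
    """
    pairs = []
    for i, (pc, pl) in enumerate(prev_line):
        for j, (cc, cl) in enumerate(curr_line):
            shared = overlap(pc, pl, cc, cl)
            if shared > 0:
                pairs.append((i, j, shared))
    return pairs


prev_line = [(0, 3), (3, 2), (5, 1)]
curr_line = [(0, 4), (4, 2)]
print(overlapping_pairs(prev_line, curr_line))
```

The nested loop is quadratic in the number of patterns per line; since spans within a line are ordered by C, a two-pointer sweep would do the same job in linear time.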
Next, 2D Ps are compared with those of adjacent confocal 2D Frames of a given Base & Angle with overlapping Horiz & Vert C+L, forming 3D Ps with a new set of Depth C & L, R, & derivative patterns.
Continuous search is completed by forming Temporal 4D Ps from matching 3D Ps of adjacent frames of video with overlapping Horiz, Vert, & Depth C+L.
Subsequently, completed 4D Ps with above-average compression are compared over discontinuity on higher levels of search with expanding range of Coords, increasing degree of compression & syntactic/semantic complexity, eventually achieving & surpassing that of natural languages.
Would appreciate any comments.