The problem of false input in language acquisition is well known and is considered critical for NLP applications. For instance, automated acquisition of language from data on the Internet will have to sift through a large proportion of false data. Statistical models try to overcome this difficulty by assuming that most of the data is correct, so that the ‘right’ knowledge outweighs the ‘wrong’ and practically drowns it in the statistical pool.
Humans also use this approach to sift out erroneous data when learning language. However, humans enlist other factors in assessing the correctness of the input. Since a person knows quite a bit about their sources, they can judge each source's credibility and take it into account while processing the data that originates from it. Our perception of people as credible or not is not binary but fuzzy, and it develops constantly. A quick introspection reveals the complexity of the credibility system we maintain and how it dynamically contributes to our linguistic learning over time.
Therefore, when building a language acquisition model, it seems we should give the issue of source credibility its well-deserved attention. Language acquisition in humans is linked to the sources of the data and to our ever-developing attitude towards them. Also, humans tend to acquire their first linguistic knowledge in a credibility-pure environment (all sources are 100% credible), so such an environment might be a good starting point for computer language acquisition as well.
Just a few points regarding this issue to highlight its complexity:
1. Not all sources are identical and a source’s properties dynamically change over time.
2. A given input usually carries more weight when it comes from many sources. It seems that credibility is in some way cumulative over sources and time.
3. In some cases, even when data is cast aside as false because of a credibility problem, it does not disappear; it might be reinstated and admitted into our linguistic system if and when the source’s credibility is updated. However, this usually happens only if the falseness of the data is ambiguous and the change in credibility occurs shortly after the input is received.
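
To make the three points above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not an established model: the class name CredibilityLearner, the fuzzy credibility scale from 0.0 to 1.0, the default credibility of 0.5 for unknown sources, and the acceptance threshold of 1.0 are all hypothetical. The sketch keeps every observation (even those from distrusted sources), sums credibility over the sources that reported a claim, and re-evaluates claims whenever a source's credibility changes. It deliberately leaves out the time limit on reinstatement mentioned in point 3.

```python
class CredibilityLearner:
    """Toy model: a claim is accepted once the summed credibility
    of the sources reporting it reaches a threshold (an assumption)."""

    def __init__(self, default_credibility=0.5, threshold=1.0):
        self.default_credibility = default_credibility
        self.threshold = threshold
        self.credibility = {}    # source -> fuzzy score in [0.0, 1.0]
        self.observations = {}   # claim -> list of sources that reported it

    def set_credibility(self, source, value):
        # Point 1: a source's credibility can be revised at any time.
        self.credibility[source] = min(1.0, max(0.0, value))

    def observe(self, source, claim):
        # Point 3: the input is stored even if it is not yet believed.
        self.observations.setdefault(claim, []).append(source)

    def support(self, claim):
        # Point 2: credibility accumulates over sources and repetitions.
        sources = self.observations.get(claim, [])
        return sum(self.credibility.get(s, self.default_credibility)
                   for s in sources)

    def accepted(self, claim):
        # Evaluated with current credibility, so a rejected claim can be
        # reinstated after a source's credibility is updated.
        return self.support(claim) >= self.threshold


learner = CredibilityLearner()
learner.set_credibility("parent", 1.0)
learner.set_credibility("stranger", 0.2)
learner.set_credibility("forum", 0.3)

learner.observe("parent", "'went' is the past tense of 'go'")
learner.observe("stranger", "'snuck' is a valid past tense")
learner.observe("forum", "'snuck' is a valid past tense")

print(learner.accepted("'went' is the past tense of 'go'"))
print(learner.accepted("'snuck' is a valid past tense"))

# The stranger turns out to be reliable; the stored claim is re-evaluated.
learner.set_credibility("stranger", 0.8)
print(learner.accepted("'snuck' is a valid past tense"))
```

In this toy setup a single fully credible source is enough for acceptance, which loosely mirrors the credibility-pure environment of early acquisition, while low-credibility sources need either corroboration or a later upgrade in credibility before their input is admitted.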