<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Yoav Artzi</title>
	<atom:link href="http://cs.tinyways.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://cs.tinyways.com</link>
	<description></description>
	<pubDate>Mon, 29 Dec 2008 21:14:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
	<language>en</language>
			<item>
		<title>Creating Entities Through Parts-of-Speech Tags and Dependency Structures</title>
		<link>http://cs.tinyways.com/creating-entities-through-part-of-speech-and-dependency-structures/</link>
		<comments>http://cs.tinyways.com/creating-entities-through-part-of-speech-and-dependency-structures/#comments</comments>
		<pubDate>Wed, 24 Dec 2008 19:21:21 +0000</pubDate>
		<dc:creator>Yoav</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[dependency structure]]></category>

		<category><![CDATA[machine translation]]></category>

		<category><![CDATA[mt]]></category>

		<category><![CDATA[natural language]]></category>

		<category><![CDATA[nlp]]></category>

		<category><![CDATA[parts of speech]]></category>

		<category><![CDATA[pos]]></category>

		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://cs.tinyways.com/?p=213</guid>
		<description><![CDATA[<p>[<em>This idea is a very rough sketch. Therefore, it's quite messy and requires some work and learning.</em>]</p>
<p>The combination of dependency structures and part-of-speech (POS) tagging supplies us with the following information:<br />
1. The linguistic category to which the word belongs<br />
2. The words around it which are related to it<br />
3. The type of that relation</p>
<p><strong>An Idea</strong><br />
Merge many dependency structures into one big graph by merging identical vertices. For example, the node &#8216;world&#8217; across all the dependency graphs will be merged into a single vertex.<!--more--> This will create a fairly large graph with complex connections. In this graph we&#8217;ll focus on the nouns and extract them as entities. By looking at the other vertices they are connected to (both other nouns and nodes with other parts of speech), we can learn about the entity that each noun represents. Since the dependency edges are labeled as well, we can even say something about the nature of these connections. It might be better to define a threshold on the number of edges required before we deduce a link. This way we can overcome accidental usages of a word, its rare meanings and simple errors.</p>
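<p>A minimal sketch of the merge and the edge threshold, in Python. The tuple layout, the POS tag NOUN, the relation labels and the function names merge_parses and noun_links are all assumptions made for illustration; they do not come from any particular parser.</p>

```python
from collections import Counter

# Each parse is a list of (head, head_pos, relation, dependent, dependent_pos)
# tuples. The tag and relation names used below are illustrative assumptions.
Edge = tuple[str, str, str, str, str]


def merge_parses(parses: list[list[Edge]]) -> Counter:
    """Merge parses into one graph; identical words share a single vertex."""
    merged = Counter()
    for parse in parses:
        for head, head_pos, rel, dep, dep_pos in parse:
            # Vertices are keyed by (word, POS); the counter holds edge weights.
            merged[((head.lower(), head_pos), rel, (dep.lower(), dep_pos))] += 1
    return merged


def noun_links(merged: Counter, min_count: int = 2) -> dict:
    """Collect, for each noun, the labeled links seen at least min_count times."""
    links = {}
    for (head, rel, dep), count in merged.items():
        if count >= min_count:
            for noun, other in ((head, dep), (dep, head)):
                if noun[1] == "NOUN":
                    links.setdefault(noun[0], set()).add((rel, other[0]))
    return links


parses = [
    [("rules", "VERB", "nsubj", "king", "NOUN"), ("rules", "VERB", "obj", "world", "NOUN")],
    [("rules", "VERB", "nsubj", "King", "NOUN"), ("rules", "VERB", "obj", "land", "NOUN")],
    [("saw", "VERB", "obj", "world", "NOUN")],
]
graph = merge_parses(parses)
entities = noun_links(graph, min_count=2)
```

<p>With a threshold of two, only the king/rules link survives in this toy input; the single occurrences are treated as possibly accidental.</p>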
<p>For example, one thing we can do with this graph is to look only at the nouns and search for cycles in the graph. Such cycles can provide information about some kind of close relationship between the nouns in that cycle. Maybe it&#8217;s even possible to try to assess the nature of the relationship by looking at the path between two nouns. </p>
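<p>One possible way to search for such cycles, sketched in Python under the assumption that the merged graph is available as a list of undirected word pairs plus a set of nouns. The function name find_noun_cycle and the sample words are hypothetical.</p>

```python
def find_noun_cycle(edges, nouns):
    """Return one cycle (as a list of nouns) in the noun-only subgraph, or None."""
    # Build an undirected adjacency map restricted to noun vertices.
    adjacency = {}
    for a, b in edges:
        if a in nouns and b in nouns and a != b:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    visited = set()

    def dfs(node, parent, path):
        visited.add(node)
        path.append(node)
        for neighbour in sorted(adjacency[node]):
            if neighbour == parent:
                continue
            if neighbour in path:
                # Back edge: the cycle is the path from neighbour to node.
                return path[path.index(neighbour):]
            if neighbour not in visited:
                cycle = dfs(neighbour, node, path)
                if cycle:
                    return cycle
        path.pop()
        return None

    for start in sorted(adjacency):
        if start not in visited:
            cycle = dfs(start, None, [])
            if cycle:
                return cycle
    return None


edges = [("king", "crown"), ("crown", "throne"), ("throne", "king"),
         ("king", "rules"), ("world", "map")]
nouns = {"king", "crown", "throne", "world", "map"}
cycle = find_noun_cycle(edges, nouns)
```

<p>This only reports the first cycle found by a depth-first search. Assessing the nature of the relationship would additionally require keeping the edge labels along the path.</p>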
<p>Using this method we can build a huge database of entities with interesting practical applications. For example, search engines can use such a database to improve search and the presentation of search results to the user. A more far-fetched, but much more interesting, possibility is to actually learn something about the nature of languages this way. Furthermore, since the structure is a graph, we can combine the graphs of two languages by connecting their entity nodes and use the resulting web.</p>
]]></description>
		<wfw:commentRss>http://cs.tinyways.com/creating-entities-through-part-of-speech-and-dependency-structures/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Human Assisted Entity Harvesting</title>
		<link>http://cs.tinyways.com/human-assisted-entity-harvesting/</link>
		<comments>http://cs.tinyways.com/human-assisted-entity-harvesting/#comments</comments>
		<pubDate>Mon, 22 Dec 2008 19:42:06 +0000</pubDate>
		<dc:creator>Yoav</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[database]]></category>

		<category><![CDATA[entity]]></category>

		<category><![CDATA[information extraction]]></category>

		<category><![CDATA[machine translation]]></category>

		<category><![CDATA[nlp]]></category>

		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://cs.tinyways.com/?p=224</guid>
		<description><![CDATA[<p>[<em>This post is a very quick sketch.</em>]</p>
<p>Most information-rich websites, such as IMDB and AllMusic, display data in a highly structured template. This structure can be enriched by meta-tagging the fields and the relations between them, similar to what you have in relational databases but much richer and semantically oriented. If such meta-data is shared with search engines, it can be used to create databases of entities, their attributes and the relations between them.<!--more--> For example, if AllMusic provides a link between an artist, tagged as a &#8216;human&#8217;, and his &#8216;records&#8217; using the connection &#8216;created&#8217;, search engines will be able to read this data and use it to display results better or even harvest the web with partial semantic awareness.</p>
<p>In fact, I am quite sure search engines are already doing so to a certain extent. However, I am not familiar with an open and well-defined scheme that can describe this meta-data and allow easy sharing of it. It has to be a flexible scheme that allows defining various relationships (verbs).</p>
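<p>To make the idea concrete, here is a rough Python sketch of how tagged entities and verb-like relations could be represented once harvested. The class names Entity and Relation, the helper related and the placeholder artist and album names are all hypothetical; this is not an existing scheme.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "human" or "record"; the tag vocabulary is hypothetical


@dataclass(frozen=True)
class Relation:
    subject: Entity
    verb: str  # the relationship, e.g. "created"
    obj: Entity


def related(relations, subject_name, verb):
    """Names of all entities linked from subject_name by the given verb."""
    return [r.obj.name for r in relations
            if r.subject.name == subject_name and r.verb == verb]


# Placeholder data standing in for what a site could expose.
artist = Entity("Example Artist", "human")
relations = [
    Relation(artist, "created", Entity("First Album", "record")),
    Relation(artist, "created", Entity("Second Album", "record")),
]
albums = related(relations, "Example Artist", "created")
```

<p>Keeping the verb as free text is what makes the scheme flexible, at the cost of requiring sites to agree on a shared vocabulary.</p>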
<p>The obvious use for this method is in the field of information extraction. However, it can also prove useful in machine translation, since such mapped websites will be much easier to translate. It can also help us build a database of entities for research purposes, which can prove very useful in knowledge representation research. Therefore, for research, harvesting a few large sites and building entity databases from them can prove very helpful and is quite feasible.</p>
]]></description>
		<wfw:commentRss>http://cs.tinyways.com/human-assisted-entity-harvesting/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Copying Human Tactics in Developing NLP Models</title>
		<link>http://cs.tinyways.com/copying-human-tactics-in-developing-nlp-models/</link>
		<comments>http://cs.tinyways.com/copying-human-tactics-in-developing-nlp-models/#comments</comments>
		<pubDate>Sat, 29 Nov 2008 08:26:17 +0000</pubDate>
		<dc:creator>Yoav</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[ambiguity]]></category>

		<category><![CDATA[CHILDES]]></category>

		<category><![CDATA[cognitive science]]></category>

		<category><![CDATA[cogsci]]></category>

		<category><![CDATA[language]]></category>

		<category><![CDATA[learning]]></category>

		<category><![CDATA[nlp]]></category>

		<category><![CDATA[statistical models]]></category>

		<guid isPermaLink="false">http://cs.tinyways.com/?p=137</guid>
		<description><![CDATA[<p>The question of how humans handle language is of great value and deserves quite a bit of deep research on its own. However, its value lies beyond the natural boundaries of cognitive science. Since the human mind presents a working language model, we can copy from it while building computerized language models. Even a shallow introspection can lead to some quick conclusions which might be of great aid.<!--more--></p>
<p>One of the interesting conclusions such introspection can lead us to is that the probabilistic approach, although not the only mechanism, is probably a step in the right direction. Consider the way humans make mistakes at times, their fast deduction process, the way its result is sometimes altered after another second of thought, and their tactics in handling ambiguity. All of these, and many more linguistic behaviors, point to a probabilistic approach in humans. However, as the more accurate second-thought process suggests, this approach is not the only mechanism and might be employed in an iterative fashion (answer quickly and then use a slower process to validate the answer).</p>
<p>One project which is really interesting in this context is the CHILDES corpus developed at CMU. Children don&#8217;t develop their language model by reading the WSJ10 corpus. Instead, they start with very simple input and slowly develop their ability to handle more complex input. Since we know that language acquisition in children works quite well, we might be able to deduce from it a good starting point for machine learning. If it&#8217;s simpler for children to acquire language in a gradual manner, it might also be the right way for computers.</p>
<p>The point is not to solve the human questions, but to use the information and logic revealed by a shallow examination. Later on, when more information is available, there will be more to copy.</p>
]]></description>
		<wfw:commentRss>http://cs.tinyways.com/copying-human-tactics-in-developing-nlp-models/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Dealing with Ambiguousness the Way Humans Do</title>
		<link>http://cs.tinyways.com/dealing-with-ambiguousness-the-way-humans-do/</link>
		<comments>http://cs.tinyways.com/dealing-with-ambiguousness-the-way-humans-do/#comments</comments>
		<pubDate>Sun, 26 Oct 2008 20:47:55 +0000</pubDate>
		<dc:creator>Yoav</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[ambiguity]]></category>

		<category><![CDATA[ambiguous]]></category>

		<category><![CDATA[language]]></category>

		<category><![CDATA[nlp]]></category>

		<guid isPermaLink="false">http://cs.tinyways.com/?p=94</guid>
		<description><![CDATA[<p>Here I wish to outline (and sort out a bit) two principles I have in mind for dealing with ambiguity in linguistic input. My point is to suggest ways that might reflect the way the human mind deals with this issue. The human mind resolves ambiguity seamlessly, probably with very little intervention of higher logic and conscious thinking. This is shown by the fact that ambiguous input is not perceived as such in most cases. Its ambiguous nature is revealed only after a second look and a guided search. This suggests that the issue is dealt with by simple, low-level methods.<!--more--></p>
<p>The first principle is frequency. When faced with ambiguous input, the subject should choose a meaning based on the past frequency of the various options: the option with the highest frequency wins. To optimize this principle one should pick the time range properly, meaning the amount of time into the past that is used when calculating the past frequencies of meanings. I believe this parameter, as used by the human mind, is dynamic and at times results in a segment of time that is not even continuous. For example, one might understand a lecture better by remembering what was said in the previous lecture two weeks ago rather than what was said in the time between the lectures.</p>
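<p>The frequency principle with a time range can be sketched roughly as follows in Python. The function name most_frequent_sense, the (time, sense) observation format and the sample senses are assumptions for illustration, and the sketch uses a single continuous window, which is a simplification of the non-continuous segments mentioned above.</p>

```python
from collections import Counter


def most_frequent_sense(observations, now, window):
    """Pick the sense seen most often within the last `window` time units."""
    counts = Counter()
    for time, sense in observations:
        age = now - time
        if age >= 0 and window >= age:  # keep only observations inside the window
            counts[sense] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# (time, sense) pairs for the ambiguous word "bank"; the data is made up.
observations = [
    (1, "river bank"), (2, "river bank"), (3, "river bank"),
    (8, "money bank"), (9, "money bank"),
]
recent = most_frequent_sense(observations, now=10, window=3)
overall = most_frequent_sense(observations, now=10, window=10)
```

<p>The same history gives different answers depending on the window, which is why the choice of time range matters so much for this principle.</p>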
<p>In <a href="http://cs.tinyways.com/nautral-language-representation/">one of my other posts</a> here I talked about the idea of representing language through objects. Such objects, like linguistic items, will have to be linked, and these links will have to reflect the ambiguity of the input. This gives us another option for overcoming ambiguity: take the hottest links first. That is, one should choose the most recently used (MRU) path when following links between objects. This method reflects the importance of the current situation in interpreting the input. It&#8217;s this second principle which I believe is actually of higher importance in daily life.</p>
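<p>A small Python sketch of the MRU principle, assuming each link is a (word, meaning) pair stamped with a logical clock whenever it is followed. The class name LinkMemory and its methods touch and resolve are hypothetical.</p>

```python
class LinkMemory:
    """Tracks when each (word, meaning) link was last followed."""

    def __init__(self):
        self._clock = 0
        self._last_used = {}

    def touch(self, word, meaning):
        """Record that the link from word to meaning was just used."""
        self._clock += 1
        self._last_used[(word, meaning)] = self._clock

    def resolve(self, word, candidates):
        """Return the most recently used meaning, or None if none was ever used."""
        used = [m for m in candidates if (word, m) in self._last_used]
        if not used:
            return None
        return max(used, key=lambda m: self._last_used[(word, m)])


memory = LinkMemory()
memory.touch("bank", "money")
memory.touch("bank", "river")
first_guess = memory.resolve("bank", ["money", "river"])
memory.touch("bank", "money")
second_guess = memory.resolve("bank", ["money", "river"])
```

<p>Unlike the frequency principle, a single recent use is enough to change the interpretation here, which is how the current situation gets to dominate.</p>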
]]></description>
		<wfw:commentRss>http://cs.tinyways.com/dealing-with-ambiguousness-the-way-humans-do/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Natural Language Representation</title>
		<link>http://cs.tinyways.com/nautral-language-representation/</link>
		<comments>http://cs.tinyways.com/nautral-language-representation/#comments</comments>
		<pubDate>Sat, 25 Oct 2008 19:45:02 +0000</pubDate>
		<dc:creator>Yoav</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[language]]></category>

		<category><![CDATA[nlp]]></category>

		<category><![CDATA[object oriented]]></category>

		<category><![CDATA[presentation]]></category>

		<guid isPermaLink="false">http://cs.tinyways.com/?p=91</guid>
		<description><![CDATA[<p>Natural language is both a human creation and used only by humans. This suggests a strong connection between the traits of the language itself and the inner workings of the human brain. It seems to me that the structure of the brain is well optimized for the task and, at the same time, the structure of natural languages is optimized for the human brain. This, of course, hints at Chomsky&#8217;s universal grammar idea. However, it can also mean that structures that are &#8216;natural&#8217; to humans might underlie the logical structure of languages.<!--more--></p>
<p>Our way of thinking is motivated by objects. It&#8217;s this understanding that led to the creation of object-oriented programming in the 1960s. Therefore, it&#8217;s somewhat natural to deduce that the underlying data structures used by the brain for language processing might also consist of an object-oriented representation of our world. This, of course, ignores the more immediate problem of parsing and decoding. However, I still believe this to be theoretically interesting. I wish to suggest a very rough draft for such a system.</p>
<p>Let&#8217;s have, for every type of object in the world, a &#8216;class&#8217;, which is essentially the data type of this object. This is very similar to Aristotelian ideas (universal forms). This will allow the mind to hold data types with attributes and methods for each object in the universe. In a way, it&#8217;s a high-level object-oriented programming language. Each class will have components, which are basically members of other classes. For verbs we will create VerbTypes, which will be methods. Such methods will take parameters such as the predicate and various optional arguments. For each verb we will have a VerbType, and each class will implement all of them. Such an implementation in a class basically shows what happens when you apply this verb to an object of this class. We will also have AdjTypes for each type of adjective (color, taste, smell, etc.). Again, each class will have an attribute for each AdjType, and these may or may not be initialized. Also, a class will extend higher-level classes.</p>
<p>To summarize, a class will look something like this:<br />
Class Apple extends Fruit {<br />
// Methods of type VerbType:<br />
eat();<br />
throw();<br />
burn();<br />
&#8230;</p>
<p>// Attributes of type AdjType:<br />
color = &#8230;<br />
taste = &#8230;<br />
smell = &#8230;<br />
&#8230;</p>
<p>// All sorts of components:<br />
Seeds<br />
Peel<br />
&#8230;<br />
}</p>
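<p>For readers who prefer something runnable, the outline above might translate to Python roughly like this. It is only one possible reading: the state attribute, the agent and target parameters, and the split of verbs between Fruit and Apple are assumptions that the outline does not specify.</p>

```python
class Fruit:
    """Higher-level class; Apple inherits its verb methods and attributes."""

    components = ()

    def __init__(self, color=None, taste=None, smell=None):
        # AdjType attributes; None means "not initialized".
        self.color = color
        self.taste = taste
        self.smell = smell
        self.state = "whole"

    # VerbType methods: what happens when the verb is applied to the object.
    def eat(self, agent):
        self.state = "eaten"
        return f"{agent} eats the {type(self).__name__.lower()}"

    def throw(self, agent, target=None):
        suffix = f" at {target}" if target else ""
        return f"{agent} throws the {type(self).__name__.lower()}{suffix}"


class Apple(Fruit):
    # Components would be members of other classes; strings keep the sketch short.
    components = ("seeds", "peel")

    def burn(self, agent):
        self.state = "burnt"
        return f"{agent} burns the apple"


apple = Apple(color="red", taste="sweet")
sentence = apple.eat("the child")
```

<p>Returning a sentence from each verb method hints at how sentences could be populated from the relations between objects, as discussed next.</p>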
<p>When perceiving a real-world situation, the subject creates an inner representation of it. Also, once the grammatical structure has been chosen, it&#8217;s possible to populate a sentence with objects through the relations created by the methods and their arguments. Thus we can construct sentences describing what happens inside the virtual representation of reality in our minds.</p>
<p>This is, of course, only a rough sketch, and it only aims to outline an attempt to deal with the problem of the inner representation of language and real-life situations. Naturally, it is far from complete and neglects many critical problems.</p>
]]></description>
		<wfw:commentRss>http://cs.tinyways.com/nautral-language-representation/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Semi-hardwiring of Language Specific Attributes</title>
		<link>http://cs.tinyways.com/semi-hardwiring-of-langauge-specific-attributes/</link>
		<comments>http://cs.tinyways.com/semi-hardwiring-of-langauge-specific-attributes/#comments</comments>
		<pubDate>Sat, 25 Oct 2008 15:12:13 +0000</pubDate>
		<dc:creator>Yoav</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[brain]]></category>

		<category><![CDATA[language acquisition]]></category>

		<category><![CDATA[learning]]></category>

		<category><![CDATA[nlp]]></category>

		<guid isPermaLink="false">http://cs.tinyways.com/?p=87</guid>
		<description><![CDATA[<p>Kids have an easier time acquiring a new language than adults do. The difference is not only in the ease of acquisition, but also in the higher quality of the result. It seems that a child&#8217;s brain is more receptive and ready to acquire a new language. It&#8217;s known that the human brain is massively hard-wired during the first years of a child&#8217;s life. It can be hypothesized that certain attributes of the language learnt at such early stages are hard-wired into the infant&#8217;s brain.<!--more--></p>
<p>It&#8217;s not the whole language that is hard-wired, as the bulk of the learning (the vocabulary) happens gradually over the years until the child reaches adulthood. However, it seems that the connections that form during this period enable better language acquisition and facilitate better learning of a certain language.</p>
<p>This system has two further effects:<br />
1. It will make it relatively harder to acquire new languages later on, especially when they differ greatly from the subject&#8217;s native language or even contradict it.<br />
2. It will probably make it relatively easy to acquire languages based on similar founding principles. For example, it seems that German speakers have a relatively easy time learning English and French speakers acquire Spanish with relative ease.</p>
<p>The idea that different languages lead to different wiring casts doubt upon the idea of universal grammar as advocated by Chomsky. Of course, it doesn&#8217;t mean that such a universal underlying structure doesn&#8217;t exist. It just means that if it does exist, it&#8217;s at a very low level and the brain is actually wired according to a higher-level logic. It also hints at gradual learning of a language, something many statistical models, which confront the learning machine with tons of data, somehow seem to miss.</p>
<p>It&#8217;s hard to say at what level the hard-wired logic is. It&#8217;s definitely not the vocabulary of the language, probably not even its most basic parts. However, it&#8217;s something language-specific. Maybe it&#8217;s the division into parts of speech and the relations between them, or certain very basic grammatical structures and building blocks.</p>
<p>A further note supporting this thesis:<br />
In the few recorded cases of children who were isolated from society and weren&#8217;t given the opportunity to acquire a language, &#8220;post-liberation&#8221; language acquisition was never complete and their abilities didn&#8217;t reach those of normal children.</p>
]]></description>
		<wfw:commentRss>http://cs.tinyways.com/semi-hardwiring-of-langauge-specific-attributes/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Source Credibility in Language Acquisition</title>
		<link>http://cs.tinyways.com/credibility-in-language-acquisition/</link>
		<comments>http://cs.tinyways.com/credibility-in-language-acquisition/#comments</comments>
		<pubDate>Sun, 31 Aug 2008 19:09:43 +0000</pubDate>
		<dc:creator>Yoav</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[credibility]]></category>

		<category><![CDATA[language acquisition]]></category>

		<category><![CDATA[learning]]></category>

		<category><![CDATA[nlp]]></category>

		<category><![CDATA[statistical models]]></category>

		<guid isPermaLink="false">http://cs.tinyways.com/?p=79</guid>
		<description><![CDATA[<p>The issue of false inputs in language acquisition is a well-known one and is considered critical to NLP applications. For instance, automated acquisition of language from data on the Internet will require sifting through a large percentage of false data. Statistical models try to overcome this difficulty by assuming that most of the data is correct, and therefore the &#8216;right&#8217; knowledge will overcome the &#8216;wrong&#8217; knowledge and practically drown it in the statistical pool created.<!--more--></p>
<p>This approach to sifting out erroneous data is also employed by humans when learning language. However, humans enlist other factors in assessing the correctness of the input. Since a person has quite a bit of knowledge about his sources, he can judge their credibility and take it into consideration while processing the data originating from these sources. Our perception of people as credible or not is not binary (but fuzzy) and constantly develops. A quick introspection will reveal the complexity of the credibility system that we maintain and how it dynamically contributes to our linguistic learning over time.</p>
<p>Therefore, when building a language acquisition model, it seems we should give the issue of source credibility its well-deserved attention. Language acquisition in humans is linked to the sources of the data and to our ever-developing attitude towards them. Also, humans tend to acquire their first linguistic knowledge in a credibility-pure environment (all sources are 100% credible), so it might be a good starting point for computer language acquisition as well.</p>
<p>Just a few points regarding this issue to highlight its complexity:<br />
1. Not all sources are identical and a source&#8217;s properties dynamically change over time.<br />
2. A certain input usually carries more weight when it comes from many sources. It seems that credibility is in some way cumulative over sources and time.<br />
3. In some cases, even when the data is cast aside as false due to a credibility problem, it still doesn&#8217;t disappear and might be reinstated and admitted into our linguistic system if and when the source&#8217;s credibility is updated. However, this will usually happen only if the falseness of the data is ambiguous and the change in credibility happens shortly after the input is received.</p>
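<p>One way these points could be turned into a model is to weight each input by the credibility of its source instead of counting raw frequency. The Python sketch below assumes fuzzy credibility scores between 0 and 1, a default of 0.5 for unknown sources and an acceptance threshold of 1.0; all of these numbers, the function names and the sample data are made up for illustration.</p>

```python
from collections import defaultdict


def weighted_votes(observations, credibility):
    """Sum source credibility per usage instead of counting raw frequency."""
    scores = defaultdict(float)
    for source, usage in observations:
        scores[usage] += credibility.get(source, 0.5)  # unknown sources get 0.5
    return dict(scores)


def accepted(scores, threshold=1.0):
    """Usages whose accumulated credibility reaches the threshold."""
    return {usage for usage, score in scores.items() if score >= threshold}


credibility = {"parent": 1.0, "teacher": 0.9, "forum": 0.2}
# Rejected inputs stay in the list, so they can be re-scored later.
observations = [
    ("parent", "went"), ("teacher", "went"),
    ("forum", "goed"), ("forum", "goed"), ("forum", "goed"),
]
before = accepted(weighted_votes(observations, credibility))

credibility["forum"] = 0.4  # the source's credibility is updated
after = accepted(weighted_votes(observations, credibility))
```

<p>By raw frequency the forum usage would win three to two. With weighting it is set aside at first, and it is reinstated only once the credibility of its source rises.</p>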
]]></description>
		<wfw:commentRss>http://cs.tinyways.com/credibility-in-language-acquisition/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
