Interview With Barney Pell and Ramez Naam About Microsoft's Powerset Acquisition: Integration By End Of Year
Now if you think about it, what could you do if you had a system that could understand language? What if it could read? What if it's already read everything in the document collection you're interested in, whether that's a smallish collection like Wikipedia or potentially the whole Web? How could that actually help you? Well, it could help in many ways. One is that you could just use more natural queries, stating your intent as you actually mean it, whether that's a full sentence or a question, or just a little bit of a linguistic phrase, or just some person's name. It could understand that better, and it could figure out what you want to do with this and how it can help. And then on the content side, if it could really read, then it could do a much better job matching the meaning of your query to the actual meaning that's there in the documents. Moreover, it could present the results better. You often have a challenge when you're looking at search results: you see a little bit of a snippet, about two lines' worth of characters, and you have to figure out from that, is that what I actually wanted? Because the systems we have today don't actually understand the queries and don't actually understand the documents, all they can really show you is where the keywords you asked for are matched, approximately in the right regions. But if they actually could understand both the documents and your query, then they could present results with, first of all, a better two lines, or potentially a whole new kind of presentation.
MA: Just to cut in for one second: the way you have described this before, when I have heard you talk about it, is that Google and other search engines look for keyword matches and then present results ranked according to some sort of algorithm that determines how important a page is. You've said before that what Powerset does is pre-read the content. It uses artificial intelligence to actually try to understand what sentences mean. And in the Live Search blog post today, which was effectively the Microsoft announcement of the deal, they talked about a couple of examples: that a shrub and a tree are similar concepts, or that the word cancer could mean a disease or a horoscope sign. How does that actually happen? Ramez, maybe you want to jump in here too. When a computer receives a sentence, when your server sees a sentence, how does it actually start to parse that? Again, as non-technically as you can describe it.
BP: Okay, I'll take that, and Ramez, you can jump in on your examples.
RN: Great
BP: I guess one way to think about it is like when you are learning how to diagram sentences in elementary school. You draw these trees of a sentence and find, here is the noun phrase, and a noun phrase has a determiner like "the" and then it has a noun like "dog," and here is a verb phrase, and it might have a verb like "barks." And then what does it mean for that word? "Bark" is a verb and it has an "s" at the end, and the way that works, which we call morphology, is that that's the present tense of that verb. And then the whole sentence is composed of those pieces, and so the meaning is built out of those. So you draw these diagrams when you are learning how to do it. And the kind of knowledge that a natural language processing system like Powerset is using is sort of like that. It's basically extracting the surface structure, that kind of tree structure of a sentence, and then it's converting that into a series of different representations, ultimately into one which expresses that thing as a fact. So it will basically say that there is a kind of activity here, and it is a barking activity, and the thing that is doing that activity, the subject of that activity, is a dog. OK. So it is going from that sort of surface structure of the language that you are seeing and converting it into a semantic fact representation. In addition, it is then able to draw on the individual meanings of words and the relationships between them. So if you saw that the sentence said "The poodle barks," then the system knows, if it can draw upon other knowledge about the relationships between words, as Powerset does, that poodles are a kind of dog. So if you as the user were to say, "I want dogs barking," then it can match the concept of dog to the concept of poodle, and it is matching barking to barking, and it is then doing this sort of semantic match for you, which matches your query against documents that use words you are not even using in your query.
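[Editor's illustration] The dog and poodle example can be sketched in a few lines of Python. This is only a toy under stated assumptions, not Powerset's implementation: the `IS_A` lexicon, the `Fact` record, and the helpers `base_form`, `is_kind_of`, `make_fact`, and `matches` are all hypothetical names invented here, and the "parser" is skipped entirely by passing in the subject and verb directly.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: each word maps to the broader concept it is a kind of.
IS_A = {"poodle": "dog", "dog": "animal", "shrub": "plant", "tree": "plant"}


@dataclass(frozen=True)
class Fact:
    action: str   # base form of the verb, e.g. "bark"
    subject: str  # singular noun doing the action, e.g. "dog"


def base_form(word: str) -> str:
    """Toy morphology: strip "-ing" or a final "-s" to reach a base form."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def is_kind_of(word, concept: str) -> bool:
    """Walk up the IS_A chain from word and check whether it reaches concept."""
    while word is not None:
        if word == concept:
            return True
        word = IS_A.get(word)
    return False


def make_fact(subject: str, verb: str) -> Fact:
    """Stand-in for parsing: normalize a subject and verb into a Fact."""
    return Fact(action=base_form(verb), subject=base_form(subject))


def matches(query: Fact, document: Fact) -> bool:
    """Same activity, and the document's subject is a kind of the query's subject."""
    return query.action == document.action and is_kind_of(
        document.subject, query.subject
    )


doc_fact = make_fact("poodle", "barks")    # from "The poodle barks."
query_fact = make_fact("dogs", "barking")  # from "I want dogs barking"
print(matches(query_fact, doc_fact))       # True
```

The query never mentions "poodle," yet it matches, because the comparison happens on normalized facts and walks the kind-of relationship rather than comparing keywords.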
RN: I think everything that Barney said was right on. I think you see that search engines, including Live Search and also Google and Yahoo, are starting to do more work on matching things other than exactly what the user entered, but it is usually limited to very simple things. So now all of us do some expansion of abbreviations or acronyms. If you type "NYC" in a search engine these days, in the last couple of years, it understands that it means the same thing as New York. These are very, very simple rule-based things, and no one understands that bark has one meaning if it is about a tree and a different meaning if it is about a dog. Or an example that someone gave the other day was the question "Was so-and-so framed?" Framed could mean a framed picture, or it could mean set up for criminal activity that did not occur, and so on. And you have to actually understand something: if it is a person's name, then it applies to one sense of the word framed; if it is not, then it doesn't. So one of the things that Powerset brings is the ability to apply their technology to the query, to the user's search, in ways that go beyond just simple pluralization or adding an "-ing." What is unique is that Powerset also looks at the document, at the words that are on a web page, and this is actually very important. If you look at just the user's query, what you have available to figure out what they are talking about is three words, four words, five words, maybe even fewer. That can give you certain hints. If you look at a web page that has hundreds or thousands of words on it, you have a lot more information you can use, if you understand it linguistically, to tell what it's about, what kind of queries it should match, and what kind of queries it shouldn't match.
And Powerset is fairly unique in applying this technology in the index on a fairly large scale already, and with Microsoft's investment and long-term commitment we can scale this out even further and apply it to even more of the web, not just the Wikipedia content they have thus far.
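[Editor's illustration] The point about a page carrying far more context than a short query can be shown with a toy word-sense guesser. This is a minimal sketch that simply counts overlapping cue words; it is not how Powerset or Live Search works. The `SENSE_CUES` table, its sense labels, and the functions `tokenize` and `guess_sense` are hypothetical and hand-written for this example.

```python
import re
from typing import Optional, Set

# Hypothetical hand-written cue words for each sense of an ambiguous word.
# A real system would draw on far richer linguistic knowledge.
SENSE_CUES = {
    "bark": {
        "dog sound": {"dog", "poodle", "puppy", "growl", "loud"},
        "tree covering": {"tree", "trunk", "wood", "branch", "oak"},
    },
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def guess_sense(word: str, context: str) -> Optional[str]:
    """Pick the sense whose cue words overlap most with the context.

    Returns None when the context contains no cue word at all.
    """
    words = tokenize(context)
    scores = {
        sense: len(cues & words) for sense, cues in SENSE_CUES[word].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


query = "bark"
tree_page = "The bark of an old oak tree protects the trunk from insects and fire."
dog_page = "My poodle will bark at every other dog in the park."

print(guess_sense("bark", query))      # None: the query alone gives no hint
print(guess_sense("bark", tree_page))  # tree covering
print(guess_sense("bark", dog_page))   # dog sound
```

The one-word query gives the guesser nothing to work with, while each page supplies enough surrounding words to settle on a sense.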
MA: Ramez, how much work has Microsoft done in this area before today? Is it something that has been simmering, that you guys have been interested in? Do you have a number of people on staff who are experts in this area, who have built technology around this? It would be interesting to know what you have done to date in this area.
RN: Well, Microsoft has some leading people in natural language processing. We have applied the idea in machine translation, translating from one language to another, and in other areas of natural language. Even things like the grammar checker in Microsoft Word come out of our natural language work in some ways, and that is very exciting. The thing about the Powerset team is that it is purely additive; the people inside Microsoft Research I have talked to about this are extremely excited. They see the Powerset team in San Francisco as great collaborators and see this as a great chance to exchange data, ideas, tools, and so on. All of this is going to help us directly. Also, this is the first time we have had a focused team working just on natural language applied to search specifically, and not a broader area. With this kind of focused effort and the great technology that the Powerset team has built, we'll be able to make really rapid progress.
MA: Where are your search engineers today? Are they in Washington, or in your Mountain View office?
RN: The bulk of our team is in Redmond, and we have a small team that is in Mountain View, as well.
MA: For now, is Powerset staying in their San Francisco offices?
RN: Powerset is absolutely staying in San Francisco. They have a fantastic office. I plan on staying down there a couple days a week myself. It is a fantastic location, and we want to grow the team so we are looking for more and more qualified search engineers and more and more computational linguists to join the team at Powerset, and keep scaling up.


