WIKIPEDIA API and ChatBots

I have been trying to use ontology-based knowledge sources in my chatbot to add interactivity that can move the conversation forward.

What I have so far:


  1. Every user query is parsed by an NLP engine that checks whether its intent matches one of a set of preset intents, such as greetings, compliments, and tasks assigned in the Talentify for Business app.
  2. If no intent matches, the query is passed to several search engines in sequence, and their result pages are scraped for the metadata response each search engine generates. This response is usually a representation of that engine's internal knowledge graph.
  3. If none of the search engines can identify the query, it is passed to Wolfram Alpha, a knowledge engine that can answer most queries, even fairly complex ones like "what is the distance between Chennai and Bangalore in kilometres".
  4. If even that fails, a random default message is returned (quirky movie dialogues, for now).
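The fallback order above can be sketched as a chain of stages, each of which either returns a reply or `None` so the next stage gets a turn. This is a minimal sketch under stated assumptions: `match_intent`, `search_engines` and `wolfram_alpha` are hypothetical placeholders for the NLP engine, the search-engine scraping and the Wolfram Alpha call, and `QUIRKY_LINES` stands in for the movie dialogues.

```python
import random
from typing import Callable, List, Optional

# Each stage takes the user query and returns a reply, or None if it cannot answer.
Stage = Callable[[str], Optional[str]]

# Hypothetical stand-ins for the "quirky movie dialogues" used as a last resort.
QUIRKY_LINES = [
    "I'll be back.",
    "May the Force be with you.",
    "Houston, we have a problem.",
]


def match_intent(query: str) -> Optional[str]:
    """Hypothetical placeholder for the NLP intent classifier (stage 1)."""
    if query.strip().lower() in {"hi", "hello", "hey"}:
        return "Hello! How can I help?"
    return None


def search_engines(query: str) -> Optional[str]:
    """Hypothetical placeholder for scraping search-engine metadata (stage 2)."""
    return None  # assume no knowledge-graph panel was found


def wolfram_alpha(query: str) -> Optional[str]:
    """Hypothetical placeholder for a Wolfram Alpha lookup (stage 3)."""
    return None  # assume Wolfram Alpha had no answer


def answer(query: str, stages: List[Stage], rng: random.Random) -> str:
    # Try each stage in order; the first non-None reply wins.
    for stage in stages:
        reply = stage(query)
        if reply is not None:
            return reply
    # Stage 4: nothing matched, so fall back to a random quirky line.
    return rng.choice(QUIRKY_LINES)


PIPELINE: List[Stage] = [match_intent, search_engines, wolfram_alpha]
```

For example, `answer("hello", PIPELINE, random.Random(0))` is answered by the first stage, while an unmatched query falls through every stage to a random quirky line. Giving every stage the same signature makes it easy to reorder stages or add another search engine.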
Now, the goal is to add some interactivity, such as showing buttons related to the query the user asked. This can be done fairly easily using the Wikipedia API. The steps are as follows:
  1. The response that comes back from the search engines is passed to an NER (named entity recognition) engine; I chose ParallelDots for this purpose.
  2. The recognised named entities are then passed to the Wikipedia search API to get a list of page IDs whose pages match each entity.
  3. Next, the page ID corresponding to each entity is passed to this API, and voilà, we get all the links present on that page.
  4. Another way is to pass the named entity directly to Wikipedia to get its links. However, this is less accurate, because many named entities do not match a page title exactly, so it is better to search Wikipedia for page IDs first. The URL for this direct recognition.
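Steps 2 to 4 map onto two MediaWiki Action API queries: `list=search` to find page IDs matching an entity, and `prop=links` to list the links on a page. A minimal sketch, assuming the named entity string has already been extracted (the ParallelDots call is not shown); the helper names and the `User-Agent` string are illustrative, not part of any library. For the direct lookup in step 4, `titles=<entity>` can be sent in place of `pageids`.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Optional

API = "https://en.wikipedia.org/w/api.php"


def search_url(entity: str, limit: int = 5) -> str:
    """Step 2: full-text search for pages matching a named entity."""
    params = {"action": "query", "list": "search", "srsearch": entity,
              "srlimit": limit, "format": "json"}
    return API + "?" + urllib.parse.urlencode(params)


def links_url(pageid: int, cont: Optional[Dict[str, str]] = None) -> str:
    """Step 3: links on one page; pllimit=max allows up to 500 per request."""
    params: Dict[str, Any] = {"action": "query", "prop": "links",
                              "pageids": pageid, "pllimit": "max",
                              "format": "json"}
    params.update(cont or {})  # continuation keys from a previous response
    return API + "?" + urllib.parse.urlencode(params)


def page_ids(search_response: dict) -> List[int]:
    """Extract page IDs from a list=search response."""
    return [hit["pageid"] for hit in search_response["query"]["search"]]


def link_titles(links_response: dict) -> List[str]:
    """Flatten the link titles of every page in a prop=links response."""
    pages = links_response.get("query", {}).get("pages", {})
    return [link["title"]
            for page in pages.values()
            for link in page.get("links", [])]


def fetch_json(url: str) -> dict:
    # Wikipedia asks API clients to send a descriptive User-Agent.
    req = urllib.request.Request(
        url, headers={"User-Agent": "chatbot-demo/0.1 (example)"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def all_links(pageid: int) -> List[str]:
    """Follow the API's 'continue' object until every link is collected."""
    titles: List[str] = []
    cont: Optional[Dict[str, str]] = None
    while True:
        data = fetch_json(links_url(pageid, cont))
        titles.extend(link_titles(data))
        cont = data.get("continue")
        if not cont:
            return titles
```

Usage would be `ids = page_ids(fetch_json(search_url("Michael Jordan")))` followed by `all_links(ids[0])`. Because one request returns at most 500 links, `all_links` sends the `continue` object from each response back unchanged, which is how the API expects clients to page through long results.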
However, there is a catch. For a typical query like Michael Jordan, Wikipedia can return up to 1500 links, far too many to turn them all into buttons. We can always pick 5 of them at random to come up with at least something. Google indexes this information much better and can surface entities that are more relevant to the query, but Google being Google, it doesn't expose this through any API. The only way to see it in action is to say "OK Google" (or long-press the middle icon) and speak to Google.
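Picking 5 random links, as suggested above, is short once the links are filtered. A minimal sketch, assuming the input is the list of `{"ns": ..., "title": ...}` link objects that `prop=links` returns; `pick_buttons` is a hypothetical helper. Keeping only namespace 0 (articles) drops category, template and help pages, which rarely make useful buttons; the `plnamespace=0` parameter can also do this filtering on the server.

```python
import random
from typing import Dict, List, Optional


def pick_buttons(links: List[Dict], k: int = 5,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Choose up to k distinct article titles to show as buttons."""
    rng = rng or random.Random()
    # ns 0 is the main (article) namespace; dedupe, then sort so a seeded
    # rng gives reproducible picks.
    titles = sorted({link["title"] for link in links if link.get("ns") == 0})
    # random.sample never repeats an item; cap k when few links remain.
    return rng.sample(titles, min(k, len(titles)))
```

Random choice guarantees some buttons but not relevant ones; ranking by a relevance signal would be the natural next step if one were available.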
