Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than by computer systems or sensors, and are thus especially valuable for discovering knowledge about people’s opinions and preferences, in addition to many other kinds of knowledge that we encode in text.

This course will cover search engine technologies, which play an important role in any data mining application involving text data, for two reasons. First, while the raw data may be large for any particular problem, it is often a relatively small subset of the data that is relevant, and a search engine is an essential tool for quickly discovering that small subset of relevant text data in a large text collection. Second, search engines are needed to help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of any discovered pattern.

You will learn the basic concepts, principles, and major techniques in text retrieval, which is the underlying science of search engines. The goal of information retrieval is to find all documents relevant to a user query in a collection of documents. The PageRank algorithm assigns a score to each document independent of a specific query. This has the advantage that the link analysis is performed once and can then be used to rank the results of all subsequent queries.
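To make the query-independent nature of PageRank concrete, here is a minimal sketch, assuming the common textbook formulation with a damping factor of 0.85 and power iteration. The function names `pagerank` and `rank_matches` and the toy link graph are hypothetical illustrations, not material taken from the course.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Compute a query-independent score for every page.

    links: dict mapping a page to the list of pages it links to.
    The damping value 0.85 is an assumed, commonly used default.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Score held by pages without outgoing links is spread uniformly.
        dangling = sum(scores[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its score along every out-link.
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * scores[p] / len(targets)
                for t in targets:
                    new[t] += share

        delta = sum(abs(new[p] - scores[p]) for p in pages)
        scores = new
        if delta < tol:
            break
    return scores


def rank_matches(matching_docs, scores):
    """Order documents that matched some query by their precomputed score."""
    return sorted(matching_docs, key=lambda d: scores[d], reverse=True)


# Hypothetical toy link graph: page -> pages it links to.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

# Link analysis runs once, before any query arrives.
scores = pagerank(graph)

# The same scores are reused for every later query's matching set.
print(rank_matches(["d", "b", "a"], scores))
print(rank_matches(["c", "d"], scores))
```

The call to `pagerank` happens once, and each later query only sorts its own matching documents by the stored scores. A real search engine would typically combine such a score with a query-dependent relevance score rather than use it alone.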