TF/IDF-based Text Mining Using Python

The IT Department, through its Staff Development Committee (SDC), conducted a webinar entitled “TF/IDF-based Text Mining Using Python” on 10th November 2022 at 12 PM. It was delivered by Mr Mangesh Wanjari, an IT Lecturer. Before the presentation began, Ms Roselle, a member of the SDC, introduced the speaker, and Mr Mangesh then opened his talk by outlining the agenda.

The presenter first defined the technical term used in the presentation, TF/IDF. He explained that TF (term frequency) captures the frequency of a term or word in a given document relative to the total number of words in that document. After the definition, Mr Mangesh presented a sample case and noted that TF/IDF is considered one of the most popular methods of feature selection for text. He also presented the applications of TF/IDF scores, such as clustering and classification of documents, topic extraction and keyword finding from articles, unifying data received from multiple data sources, text mining, spam filtering, search engines and information retrieval.

As to the pros and cons, TF/IDF is easy to compute, provides a basic metric for extracting the most descriptive terms in a document, and makes it easy to compute the similarity between two documents. Its main disadvantage is that it is based on the bag-of-words (BoW) model, so it does not capture the position of words in the text, semantics, or co-occurrences in different documents. For this reason, TF/IDF is useful only as a lexical-level feature.

Once the basic concepts had been covered, Mr Mangesh demonstrated the application of TF/IDF using Python. After the practical demonstration, Ms Roselle, as moderator, opened the question and answer session, in which some attendees raised their queries. Afterwards, a feedback link was sent so that participants could rate the presenter on various parameters. The presenter received a score of 4.35 on a scale of 1 to 5, where 5 is the highest. Twenty-five (25) participants attended and benefited from the program.
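
The code used in the live demonstration is not included in this report. As an illustration only, the ideas summarized above (TF as a relative frequency, TF/IDF weights, and similarity between two documents) might be sketched in plain Python as follows. The function names, the sample sentences, and the choice of IDF as log(N / df) are assumptions made for this sketch; other IDF variants are in common use, and this is not necessarily what the speaker showed.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """TF: count of each term divided by the total number of words in the document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(tokenized_docs):
    """IDF (assumed variant): log(N / df), df = number of documents containing the term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dictionary per document (simple whitespace tokenizer)."""
    tokenized = [doc.lower().split() for doc in docs]
    idf = inverse_document_frequency(tokenized)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in tokenized
    ]


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse TF/IDF vectors stored as dictionaries."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, invented for this sketch.
docs = [
    "python makes text mining easy",
    "text mining finds keywords in documents",
    "spam filtering uses keywords",
]
vectors = tfidf_vectors(docs)

# Terms that occur in fewer documents receive higher weights.
top_terms = sorted(vectors[0], key=vectors[0].get, reverse=True)[:3]
print(top_terms)
print(cosine_similarity(vectors[0], vectors[1]))
```

Because the vectors are built from word counts alone, reordering the words of a document leaves its vector unchanged, which is the bag-of-words limitation mentioned in the talk.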

Wednesday, 14 December 2022 00:00 Written by IT Department in IT