In many data sources, relevant information is conveyed by free text, for instance in patient records, scientific publications, and social media. Unlike programming languages, human language is not formal, so computer-based extraction of structured information from natural language text is challenged by the high variation in expression and by the importance of context for correct interpretation. Natural Language Processing aims to design methods that address these challenges, using human knowledge or data-driven methods. This course aims to bring participants to the level where they can independently perform text classification and extract data from text for further data processing and analysis.

The course provides an introduction to Natural Language Processing, including how to handle language units such as words, phrases, and sentences, as well as additional information such as part-of-speech and syntactic structure. The most common applications of supervised machine learning to text analytics will be introduced: text classification, sequence labelling for information extraction (focusing on entity recognition and classification), and the creation and use of word embeddings and neural classifiers. The course uses biomedical text for illustration, supported by a short introduction to the representation and processing of biomedical terminology.

Content structure:

  • Introduction to Natural Language Processing
  • Basic Natural Language Processing tools
  • Machine learning for text classification
  • Sequence labelling for information extraction
  • Biomedical terminology for entity recognition
  • Word embeddings and neural classifiers for entity recognition

  • Type of course: This is an on-campus course.
  • Dates & times: Monday April 11, Tuesday April 12 and Wednesday April 13, 2022, from 9 am to 12 pm and from 1 pm to 4 pm
  • Venue: UGent, Faculty of Sciences, Campus Sterre, Krijgslaan 281, building S9, 9000 Gent
  • Target audience: This course is aimed at professionals and investigators from diverse areas who need to analyze information conveyed by texts. It is of particular interest to researchers, graduate students, and postdocs in health-related specialities who work with patient records, scientific publications, social media, etc.
  • Exam/certificate: Participants who attend all classes receive a certificate of attendance via e-mail at the end of the course. Additionally, participants can, if they wish, take part in an exam. Those who pass receive a certificate from Ghent University. The exam consists of a take-home project assignment, for which students are required to write a report by a set deadline.
  • Course prerequisites: Participants are expected to be familiar with the Python programming language at a level equivalent to Module 4 - Getting Started with Python for Data Scientists and Module 13 - Upgrade your Python Skills: Data Wrangling & Plotting of this year's program. Some knowledge of supervised machine learning is considered a plus.
  • Funding: => Our academy is recognised as a service provider for the 'KMO-portefeuille'. Small and medium-sized businesses located in the Flanders region can thereby save up to 30% on the registration fee for our courses. You can request this subsidy up until 14 calendar days after the course has started. => UGent PhD students can apply for a full refund from their Doctoral School.
  • Reduction: => If two or more employees from the same company enrol simultaneously for this course, a reduction of 20% on the module price applies starting from the second enrolment. => Reduced prices apply to employees of governmental institutions, non-profit organisations, and higher education, as well as to students and the unemployed.
