Duc Phat Nguyen

A student looking for opportunities in Data Science & AI

Project Privacy Policies Thumbnail

Privacy Policies AI System

An AI System for Classifying, Evaluating and Interpreting Privacy Policies and Terms and Conditions.

View on GitHub

Built With

Python, NumPy, pandas, scikit-learn, Keras, Flask, Streamlit

Privacy policies and terms and conditions are crucial documents that outline how companies handle user data. However, these documents are often lengthy, complex, and difficult for users to understand. This project aimed to develop an AI-powered web application that helps users evaluate whether a policy document is acceptable, identify potential issues in it, and read a concise summary.

The project involved four main tasks:

  1. Develop a web application that uses natural language processing (NLP) and machine learning techniques to classify and evaluate privacy policies and terms and conditions.
  2. Highlight problematic phrases within the documents to help users identify potential issues.
  3. Generate concise summaries of the documents using automated text summarization techniques.
  4. Provide a user-friendly interface for accessing these functionalities, making the tool useful for content moderation, information retrieval, and document summarization.
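Task 2 can be illustrated with a minimal sketch. It assumes problematic phrases are matched from a fixed list and wrapped in an HTML `<span>` with a red background, in line with the highlighting step described under Workflow. The phrase list, the `highlight` function name, and the exact inline style are hypothetical; the project's own matching logic may differ.

```python
import html
import re

# Hypothetical phrase list; the project's real list is not given in the text.
PROBLEMATIC_PHRASES = ["share your data with third parties", "without notice"]


def highlight(text, phrases=PROBLEMATIC_PHRASES):
    """Return HTML in which each listed phrase is wrapped in a red-background span."""
    if not phrases:
        return html.escape(text)
    # Try longer phrases first so they win over any shorter phrase they contain.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered), flags=re.IGNORECASE)

    parts, last = [], 0
    for match in pattern.finditer(text):
        # Escape the untouched text so user input cannot inject markup.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(
            '<span style="background-color: red;">'
            + html.escape(match.group(0))
            + "</span>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Escaping the surrounding text before inserting the tags matters in a web application, since policy text pasted by a user is rendered back as HTML on the result page.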

Visualisation

Main Page

Result Page

Main Page variation on Streamlit

Workflow

  1. Data Preparation:
    • Preprocessed text data by normalizing case, removing stopwords, tokenizing, and lemmatizing.
    • Used Word2Vec embeddings trained on Google News data to transform text into numerical vectors that capture semantic relationships between words.
  2. Machine Learning Models:
    • Implemented various models, including Decision Tree, Random Forest, Support Vector Machine (SVM), Logistic Regression, and Convolutional Neural Network (CNN), for text classification.
    • Conducted hyperparameter tuning to enhance model performance.
  3. Problematic Phrases Identification:
    • Developed a mechanism to highlight problematic phrases in text using HTML tags with a red background.
  4. Summarization:
    • Used the BART model from Hugging Face to generate concise summaries of the documents automatically.
  5. Web Application Development:
    • Created a user-friendly web application that integrates the NLP and machine learning models, providing an intuitive interface for users to input policy documents and receive evaluations, highlighted issues, and summaries.
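The data preparation step (step 1) can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: the stopword list is a tiny stand-in, lemmatization is left out because it needs a lexical resource, the toy two-dimensional embeddings stand in for the pretrained Word2Vec vectors, and averaging word vectors into one document vector is an assumed strategy that the text does not confirm. The names `preprocess` and `document_vector` are hypothetical.

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "we", "may", "your", "with", "is", "are"}


def preprocess(text):
    """Lowercase the text, tokenize on runs of letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def document_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; tokens without a vector are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No token has an embedding, so fall back to a zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 2-dimensional embeddings standing in for pretrained Word2Vec vectors.
toy_embeddings = {"share": [1.0, 0.0], "data": [0.0, 1.0]}

tokens = preprocess("We may share your data with partners.")
vector = document_vector(tokens, toy_embeddings, dim=2)
print(tokens)  # ['share', 'data', 'partners']
print(vector)  # [0.5, 0.5]
```

A fixed-length averaged vector suits classical models such as Logistic Regression or SVM; a CNN would more typically take the per-token vectors as a sequence instead.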

Results

The Privacy Policies AI System project delivered a web application that assists users in evaluating privacy policies and terms and conditions. Key results include:

Visit the GitHub Repository

For the full code and documentation, visit the GitHub repository.