Algorithmic Transparency

Data & ML Methodology

Political Prisoner Watch goes far beyond simple data aggregation. Our platform runs a dedicated Python ML microservice with country-specific models and 50+ API endpoints to analyze risk, forecast repression trends, detect coordinated campaigns, and generate legal evidence for both Russia and Belarus.

Complete Pipeline

From Raw Data to Actionable Intelligence

Every case enters a multi-stage pipeline.

Stage 01

Automated Data Ingestion

Real-Time Synchronization

Our system continuously ingests data from verified human rights sources. Automated synchronization scripts keep our PostgreSQL database current with the latest arrests, prosecutions, sentences, and prisoner locations in Russia and Belarus.

Primary sources: OVD-Info and Memorial Human Rights Center (Russia); Viasna (Belarus)
Secondary sources: court record aggregators and direct reports from families and legal counsel
Structured database with normalized schema for prisoners, cases, articles, locations, and outcomes
Automated sync runs continuously to detect new arrests, status changes, and releases
Geolocation enrichment for coordinate standardization and region mapping
Native-to-English machine translation (Russian & Belarusian) for international accessibility
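
As a rough illustration of how a sync run might detect new arrests, status changes, and releases, the sketch below compares incoming source records with stored ones. The `PrisonerRecord` fields, the status strings, and the `diff_sync` helper are hypothetical; the actual schema and sync scripts are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrisonerRecord:
    # Hypothetical minimal schema, for illustration only.
    source_id: str
    status: str  # e.g. "detained", "sentenced", "released"


def diff_sync(stored, incoming):
    """Classify incoming records against what is already stored."""
    changes = {"new": [], "status_changed": [], "released": []}
    for source_id, record in incoming.items():
        previous = stored.get(source_id)
        if previous is None:
            changes["new"].append(source_id)
        elif previous.status != record.status:
            changes["status_changed"].append(source_id)
            if record.status == "released":
                changes["released"].append(source_id)
    return changes


stored = {
    "a1": PrisonerRecord("a1", "detained"),
    "a2": PrisonerRecord("a2", "sentenced"),
}
incoming = {
    "a1": PrisonerRecord("a1", "released"),
    "a2": PrisonerRecord("a2", "sentenced"),
    "a3": PrisonerRecord("a3", "detained"),
}
changes = diff_sync(stored, incoming)
print(changes)
```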

Stage 02

Named Entity Recognition

Legal Actor Extraction

We apply custom NER models to extract structured legal entities from unstructured case summaries: the judges, prosecutors, investigators, lawyers, and courts involved in each case. The extracted entities feed downstream analytics on the judicial system itself, such as per-judge sentencing statistics.

Custom NER models trained on multilingual legal text to extract: judges, prosecutors, lawyers, courts, and investigators
Entity extraction runs on all case summaries and stores results for downstream analytics
Judge analytics computes per-judge statistics: average sentence length, case count, deviation from global average, and harshness classification
Enables identification of systematically harsh judges and patterns of judicial behavior
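
The per-judge statistics listed above could be computed along the following lines. This is a minimal sketch: the input format, the `judge_stats` name, and the `harsh_ratio` threshold used for the harshness classification are assumptions, not the platform's actual rule.

```python
from collections import defaultdict
from statistics import mean


def judge_stats(cases, harsh_ratio=1.25):
    """cases: list of (judge_name, sentence_months) pairs."""
    by_judge = defaultdict(list)
    for judge, months in cases:
        by_judge[judge].append(months)
    global_avg = mean(months for _, months in cases)
    stats = {}
    for judge, sentences in by_judge.items():
        avg = mean(sentences)
        stats[judge] = {
            "case_count": len(sentences),
            "avg_sentence_months": avg,
            "deviation_from_global": avg - global_avg,
            # Hypothetical rule: "harsh" if the judge's average exceeds
            # the global average by the chosen ratio.
            "harsh": avg > harsh_ratio * global_avg,
        }
    return stats


# Made-up example data.
cases = [
    ("Judge A", 60), ("Judge A", 84),
    ("Judge B", 24), ("Judge B", 36),
    ("Judge C", 36),
]
stats = judge_stats(cases)
print(stats["Judge A"])
```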

Stage 03

XGBoost Risk Classification

Urgency & Torture Prediction

Our core ML models use XGBoost gradient-boosted trees to predict two critical risk probabilities for each case: the urgency level (whether immediate advocacy action is required) and the risk of torture while in custody. We train completely separate models for Russia and Belarus to capture distinct legal and repressive patterns.

Gradient-boosted tree classifiers with logistic objective, trained separately for Russia and Belarus
Feature inputs: age, gender, arrest location, case category, and criminal articles (multi-label encoded)
Preprocessing pipeline: median imputation for numerics, one-hot encoding for categoricals with unknown-category handling
Class imbalance correction for minority outcomes (e.g., torture)
Trained on historical outcomes from thousands of documented cases across both countries
Outputs: urgency probability (0–1), torture probability (0–1), and feature importance rankings
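
To make the preprocessing steps concrete, here is a plain-Python sketch of median imputation, one-hot encoding with unknown-category handling, multi-label article encoding, and a class-imbalance weight. It is not the production pipeline. The field names and sample rows are invented, and the placeholder article codes `art-A` and `art-B` are hypothetical. `binary:logistic` and `scale_pos_weight` are real XGBoost parameter names, but whether the service uses them is an assumption.

```python
from statistics import median


def fit_preprocessor(rows):
    """Learn the imputation value and category vocabularies from training rows."""
    ages = [r["age"] for r in rows if r["age"] is not None]
    return {
        "median_age": median(ages),
        "categories": sorted({r["category"] for r in rows}),
        "articles": sorted({a for r in rows for a in r["articles"]}),
    }


def transform(row, prep):
    """Turn one case into a flat numeric feature vector."""
    age = row["age"] if row["age"] is not None else prep["median_age"]
    # One-hot: a category unseen in training yields an all-zero block.
    one_hot = [1.0 if row["category"] == c else 0.0 for c in prep["categories"]]
    # Multi-label: one indicator per known article, several may be set.
    multi_hot = [1.0 if a in row["articles"] else 0.0 for a in prep["articles"]]
    return [float(age)] + one_hot + multi_hot


def scale_pos_weight(labels):
    """Negative-to-positive ratio, a commonly suggested imbalance weight."""
    positives = sum(labels)
    return (len(labels) - positives) / positives


train = [
    {"age": 30, "category": "anti-war", "articles": ["207.3"]},
    {"age": None, "category": "religion", "articles": ["art-A"]},
    {"age": 50, "category": "anti-war", "articles": ["207.3", "art-B"]},
]
prep = fit_preprocessor(train)
vector = transform({"age": None, "category": "journalism", "articles": ["art-B"]}, prep)
weight = scale_pos_weight([0, 0, 0, 1])
# Assumed parameter choices for a binary XGBoost classifier.
params = {"objective": "binary:logistic", "scale_pos_weight": weight}
print(vector, params)
```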

Stage 04

Surveillance Tech Attribution

How They Were Caught

A dedicated classifier predicts the likely surveillance technology used to identify and detain a political prisoner — revealing patterns in state surveillance infrastructure across regions and case types.

Classifier trained on cases with known surveillance methods
Features: arrest location, case category, and temporal features
Predicts categories like: social media monitoring, CCTV/facial recognition, informant reports, phone interception, etc.
Exposes regional surveillance deployment patterns (e.g., Moscow facial recognition vs. regional signals intelligence)

Stage 05

Charge Inflation Detection

Gap Between Act & Charge

This model detects prosecutorial overreach by measuring the gap between what a person actually did (based on case summaries) and the charges filed against them. We maintain distinct severity mappings for Russian Criminal Code and Belarusian Criminal Code articles.

Proprietary severity mapping: Criminal articles scored on a 1–10 scale across both Russian and Belarusian legal frameworks
AI-powered summarization analyzes the description of the actual act committed
Inflation Score = (charged severity − actual severity) ÷ charged severity, expressed on a 0–100 scale
Leaderboard ranks the most inflated cases for each respective country framework
Example: a person posts an anti-war Instagram story (actual severity ~2) but is charged under Art. 207.3 "Fakes about army" (severity 8), giving an inflation score of (8 − 2) ÷ 8 = 75
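
The score can be sketched as a small function. Dividing the gap by the charged severity is the normalization that reproduces the example's 75; treat it as an assumption about the production formula. The function name and the clamping of negative gaps to zero are also illustrative choices.

```python
def inflation_score(charged_severity, actual_severity):
    """Gap between charge and act on a 0-100 scale (assumed normalization)."""
    for value in (charged_severity, actual_severity):
        if not 1 <= value <= 10:
            raise ValueError("severity must be on the 1-10 scale")
    # A charge milder than the act is treated as zero inflation.
    gap = max(charged_severity - actual_severity, 0)
    return 100.0 * gap / charged_severity


score = inflation_score(charged_severity=8, actual_severity=2)
print(score)  # 75.0
```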

Stage 06

Outcome Prediction

Expected Sentence Forecasting

Given a prisoner's case features, this model predicts the likely sentence length. It fits a regression on log-transformed sentence lengths, so back-transformed predictions are always positive, and it reports confidence intervals so that legal professionals can anticipate outcomes.

Log-transformed regression model ensuring all predictions are positive (sentence months)
Features: criminal articles, case category, gender, location, age, and charge severity
Outputs: predicted sentence length, confidence interval, and comparison to average for similar cases
Anomaly detection layer flags cases where the actual sentence deviates significantly from the prediction, which may indicate political interference
Database-wide outlier detection identifies disproportionate punishments
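
The idea behind the log transform and the anomaly flag can be shown with a one-feature least-squares fit. This is a simplified stand-in for the actual model: it uses only charge severity as a feature, the training numbers are made up, and the factor-of-two anomaly threshold is an assumption.

```python
import math


def fit_log_linear(severities, sentence_months):
    """Ordinary least squares of log(months) on a single feature."""
    logs = [math.log(m) for m in sentence_months]
    n = len(severities)
    mean_x = sum(severities) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in severities)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(severities, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_months(model, severity):
    """Back-transform with exp, so the prediction is always positive."""
    slope, intercept = model
    return math.exp(intercept + slope * severity)


def is_anomalous(model, severity, actual_months, log_threshold=math.log(2)):
    """Flag sentences more than a factor of two away from the prediction."""
    predicted = predict_months(model, severity)
    return abs(math.log(actual_months) - math.log(predicted)) > log_threshold


# Made-up training data: (charge severity, sentence in months).
model = fit_log_linear([2, 4, 6, 8], [6, 12, 24, 48])
print(round(predict_months(model, 10), 2))
print(is_anomalous(model, 6, 120))
```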

Stage 07

Prophet Trend Forecasting

90-Day Arrest Predictions

We use Prophet (originally released by Facebook) time-series models to forecast arrest trends up to 90 days into the future, with isolated models for the combined dataset, for Russia, and for Belarus.

Time-series models trained on weekly arrest counts with automatic changepoint detection
Isolated predictive models for Russia vs. Belarus to prevent cross-contamination of historical trends
Article-specific models for charges most commonly used in political persecution
Repression wave detection via statistical anomaly detection over rolling windows, which raises an alert when a location or article shows an unusual spike
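
The rolling-window spike check (not the Prophet forecast itself) might look like the following sketch. It uses a z-score against the preceding weeks; the window length, the threshold, and the function name are assumptions, and the counts are invented.

```python
from statistics import mean, stdev


def detect_spikes(weekly_counts, window=8, z_threshold=3.0):
    """Return indices of weeks that are unusually high versus the prior window."""
    spikes = []
    for i in range(window, len(weekly_counts)):
        history = weekly_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            sigma = 1.0  # avoid division by zero on a flat history
        if (weekly_counts[i] - mu) / sigma > z_threshold:
            spikes.append(i)
    return spikes


# Made-up weekly arrest counts for one location.
counts = [10, 12, 11, 9, 10, 13, 11, 10, 12, 45, 11]
spikes = detect_spikes(counts)
print(spikes)
```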

Stage 08

Topic & Campaign Detection

LDA + BERTopic Semantic Analysis

Dual topic modeling identifies thematic clusters across cases — from anti-war speech patterns to religious persecution campaigns. This reveals coordinated repression strategies invisible in individual case analysis.

Statistical topic modeling discovers latent thematic clusters from case text
Transformer-based semantic model captures nuanced meaning beyond simple keyword matching
Prosecution template detection finds groups of cases with suspiciously similar summary text, indicating copy-paste or template-based prosecutions
Monthly topic tracking reveals how repression focus shifts over time (e.g., war-related charges spike after 2022)
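
One simple way to surface template-based prosecutions is to compare case summaries by overlapping word sequences. The sketch below uses Jaccard similarity over three-word shingles; the similarity measure, the threshold, and the sample summaries are assumptions made for illustration, not a description of the deployed detector.

```python
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping k-word sequences in a summary."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two summaries have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def template_pairs(summaries, threshold=0.6):
    """Pairs of case ids whose summaries are suspiciously similar."""
    sets = {cid: shingles(text) for cid, text in summaries.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(sets), 2)
        if jaccard(sets[x], sets[y]) >= threshold
    ]


# Invented summaries.
summaries = {
    "c1": "the defendant publicly distributed knowingly false information "
          "about the armed forces in region A",
    "c2": "the defendant publicly distributed knowingly false information "
          "about the armed forces in region B",
    "c3": "she was detained at a picket after holding a blank sheet of paper",
}
pairs = template_pairs(summaries)
print(pairs)
```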

Stage 09

Case Network Analysis

Graph Detection of Coordinated Repression

We build similarity networks linking cases by shared charges, location, timing, and tactics for each supported country. Community detection algorithms then identify clusters that reveal coordinated repression campaigns — groups of people targeted simultaneously.

Network graphs built individually for Russian and Belarusian prisoners to map distinct domestic networks
Weighted edge scoring combines all similarity dimensions into a single 0–1 score
Community detection algorithms identify tightly connected case clusters
Each community is profiled: dominant charges, geographic center, time span, and a descriptive label
Enables visualization and discovery of coordinated repression campaigns
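
The edge scoring and clustering steps can be sketched as follows. The weights, the 30-day timing decay, the threshold, and the case fields are all hypothetical. Connected components of the thresholded graph are used here as a simple stand-in for a proper community detection algorithm.

```python
from collections import defaultdict

# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"charges": 0.4, "location": 0.2, "timing": 0.2, "tactics": 0.2}


def edge_score(a, b):
    """Weighted similarity in [0, 1] between two cases."""
    union = a["charges"] | b["charges"]
    charges = len(a["charges"] & b["charges"]) / len(union) if union else 0.0
    location = 1.0 if a["region"] == b["region"] else 0.0
    # Timing similarity decays linearly to zero over 30 days.
    timing = max(0.0, 1.0 - abs(a["day"] - b["day"]) / 30)
    tactics = 1.0 if a["tactic"] == b["tactic"] else 0.0
    parts = {"charges": charges, "location": location,
             "timing": timing, "tactics": tactics}
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)


def clusters(cases, threshold=0.7):
    """Connected components of the graph of edges scoring at least threshold."""
    ids = sorted(cases)
    neighbours = defaultdict(set)
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            if edge_score(cases[x], cases[y]) >= threshold:
                neighbours[x].add(y)
                neighbours[y].add(x)
    seen, result = set(), []
    for start in ids:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Invented cases; "day" is days since an arbitrary reference date.
cases = {
    "c1": {"charges": {"207.3"}, "region": "R1", "day": 0, "tactic": "search"},
    "c2": {"charges": {"207.3"}, "region": "R1", "day": 3, "tactic": "search"},
    "c3": {"charges": {"other"}, "region": "R2", "day": 200, "tactic": "summons"},
}
groups = clusters(cases)
print(groups)
```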

Stage 10

Legal Evidence & Asylum Tools

AI-Powered Legal Support

Our most impactful outputs: statistical persecution evidence, personal risk scores for asylum seekers, comparative case finding, and generative document drafting — all designed to support real legal proceedings.

Persecution Evidence Score: computes statistical evidence of pattern-based persecution (by category, location, or article) with confidence intervals and comparison to baselines
Personal Risk Score: 0–100% individualized risk assessment for asylum applicants based on their specific profile
Comparative Case Finder: identifies the most similar documented cases for precedent-building, ranked by multi-dimensional similarity
Affidavit Generator: AI-powered synthesis of prisoner data and country condition reports into legal support documents for asylum cases
All tools designed to produce court-admissible statistical evidence
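
As an illustration of comparing a group's outcome rate to a baseline with a confidence interval, the sketch below uses the Wilson score interval for a proportion. The statistical method actually used by the Persecution Evidence Score is not stated here, so the choice of interval, the function names, and the counts are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def exceeds_baseline(successes, n, baseline_rate):
    """Report whether the whole interval lies above the baseline rate."""
    low, high = wilson_interval(successes, n)
    return {
        "rate": successes / n,
        "ci": (low, high),
        "above_baseline": low > baseline_rate,
    }


# Invented counts: 40 of 100 cases in a group show the outcome,
# compared with an assumed baseline rate of 20%.
result = exceeds_baseline(40, 100, 0.20)
print(result)
```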

Architecture

System Architecture

Data Sources: OVD-Info API, Memorial HRC, Court Records, Direct Reports
Backend (Node.js): PostgreSQL DB, REST API, Data Sync Jobs, Auth / Admin
ML Microservice (Python): XGBoost Models, Prophet Forecasts, NER & BERTopic, Risk & Outcome
Frontend (Next.js): Interactive Map, Analytics Dashboard, Legal Tools, Risk Radar

A Note on Predictive Models

Our risk scores and forecasts are probabilistic tools derived from historical data. They are designed to aid researchers and legal professionals in prioritization, not to replace human judgment. A “High Risk” score indicates a statistical resemblance to past cases involving torture or harsh sentencing, but specific outcomes may vary. All models are retrained as new data becomes available.
