Data & ML Methodology
Political Prisoner Watch goes far beyond simple data aggregation. Our platform runs a dedicated Python ML microservice with country-specific models and 50+ API endpoints to analyze risk, forecast repression trends, detect coordinated campaigns, and generate legal evidence for both Russia and Belarus.
From Raw Data to Actionable Intelligence
Every case enters a multi-stage pipeline.
Automated Data Ingestion
Real-Time Synchronization
Our system continuously ingests data from verified human rights sources. Automated synchronization scripts keep our PostgreSQL database current with the latest arrests, prosecutions, sentences, and prisoner locations.
- Primary sources: OVD-Info, Memorial (Russia), and Viasna (Belarus)
- Secondary sources: Memorial Human Rights Center, court record aggregators, direct family/legal counsel reports
- Structured database with normalized schema for prisoners, cases, articles, locations, and outcomes
- Automated sync runs continuously to detect new arrests, status changes, and releases
- Geolocation enrichment for coordinate standardization and region mapping
- Native-to-English machine translation (Russian & Belarusian) for international accessibility
Named Entity Recognition
Legal Actor Extraction
We apply custom NER models to extract structured legal entities from unstructured case summaries — identifying judges, prosecutors, investigators, lawyers, and courts involved in each case, enabling powerful analytics on the judicial system itself.
- Custom NER models trained on multi-lingual legal text to extract: judges, prosecutors, lawyers, courts, and investigators
- Entity extraction runs on all case summaries and stores results for downstream analytics
- Judge analytics computes per-judge statistics: average sentence length, case count, deviation from global average, and harshness classification
- Enables identification of systematically harsh judges and patterns of judicial behavior
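The per-judge statistics listed above can be illustrated with a minimal sketch. The records, the `judge_stats` helper, and the 25% harshness threshold are all assumptions made for illustration; the actual classification rule is not described here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (judge name, sentence length in months).
cases = [
    ("Judge A", 84), ("Judge A", 96), ("Judge A", 72),
    ("Judge B", 24), ("Judge B", 36),
    ("Judge C", 60),
]

def judge_stats(cases, harsh_ratio=1.25):
    """Return per-judge case count, mean sentence, and deviation from the global mean."""
    global_avg = mean(months for _, months in cases)
    by_judge = defaultdict(list)
    for judge, months in cases:
        by_judge[judge].append(months)

    stats = {}
    for judge, sentences in by_judge.items():
        avg = mean(sentences)
        stats[judge] = {
            "case_count": len(sentences),
            "avg_sentence_months": avg,
            "deviation_from_global": avg - global_avg,
            # Assumed rule: "harsh" if the judge's mean exceeds the global mean by 25%.
            "harsh": avg > harsh_ratio * global_avg,
        }
    return stats

stats = judge_stats(cases)
```

With this toy data the global average is 62 months, so only Judge A (average 84 months) crosses the assumed threshold.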
XGBoost Risk Classification
Urgency & Torture Prediction
Our core ML models use XGBoost gradient-boosted trees to predict two critical risk probabilities for each case: the urgency level (whether immediate advocacy action is required) and the risk of torture while in custody. We train completely separate models for Russia and Belarus to capture distinct legal and repressive patterns.
- Gradient-boosted tree classifiers with logistic objective, trained separately for Russia and Belarus
- Feature inputs: age, gender, arrest location, case category, and criminal articles (multi-label encoded)
- Preprocessing pipeline: median imputation for numerics, one-hot encoding for categoricals with unknown-category handling
- Class imbalance correction for minority outcomes (e.g., torture)
- Trained on historical outcomes from thousands of documented cases across both countries
- Outputs: urgency probability (0–1), torture probability (0–1), and feature importance rankings
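The preprocessing steps above (median imputation, one-hot encoding with unknown-category handling, multi-label article encoding) can be sketched in plain Python. The field names, the `fit` and `transform` helpers, and the sample cases are hypothetical. The negative-to-positive ratio at the end is one common way to correct class imbalance, usually passed to XGBoost as `scale_pos_weight`; whether the production models use it is an assumption.

```python
from statistics import median

# Hypothetical raw cases; None marks a missing value.
train = [
    {"age": 34, "gender": "F", "category": "anti-war", "articles": ["207.3"]},
    {"age": None, "gender": "M", "category": "religion", "articles": ["282.2", "205.5"]},
    {"age": 52, "gender": "M", "category": "anti-war", "articles": ["280.3", "207.3"]},
]
torture_labels = [0, 0, 1]  # hypothetical outcomes, 1 = torture documented

def fit(rows, categoricals=("gender", "category")):
    """Learn the median age, the category vocabularies, and the article vocabulary."""
    ages = [r["age"] for r in rows if r["age"] is not None]
    return {
        "age_median": median(ages),
        "vocab": {c: sorted({r[c] for r in rows}) for c in categoricals},
        "articles": sorted({a for r in rows for a in r["articles"]}),
    }

def transform(row, params):
    """Encode one case as a flat numeric feature vector."""
    # Median imputation for the numeric feature.
    vec = [row["age"] if row["age"] is not None else params["age_median"]]
    for col, values in params["vocab"].items():
        # One-hot encoding; a category unseen in training yields an all-zero block.
        vec += [1 if row[col] == v else 0 for v in values]
    # Multi-label (multi-hot) encoding of criminal articles.
    vec += [1 if a in row["articles"] else 0 for a in params["articles"]]
    return vec

params = fit(train)
X = [transform(r, params) for r in train]
unseen = transform(
    {"age": None, "gender": "M", "category": "journalism", "articles": []}, params
)

# Ratio of negative to positive examples, used to up-weight the minority class.
pos_weight = torture_labels.count(0) / torture_labels.count(1)
```

Fitting the vocabularies on training data only, then reusing `params` at prediction time, keeps the feature layout identical for every case, including ones with categories the model has never seen.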
Surveillance Tech Attribution
How They Were Caught
A dedicated classifier predicts the likely surveillance technology used to identify and detain a political prisoner — revealing patterns in state surveillance infrastructure across regions and case types.
- Classifier trained on cases with known surveillance methods
- Features: arrest location, case category, and temporal features
- Predicts categories like: social media monitoring, CCTV/facial recognition, informant reports, phone interception, etc.
- Exposes regional surveillance deployment patterns (e.g., Moscow facial recognition vs. regional signals intelligence)
Charge Inflation Detection
Gap Between Act & Charge
This model detects prosecutorial overreach by measuring the gap between what a person actually did (based on case summaries) and the charges filed against them. We maintain distinct severity mappings for Russian Criminal Code and Belarusian Criminal Code articles.
- Proprietary severity mapping: Criminal articles scored on a 1–10 scale across both Russian and Belarusian legal frameworks
- AI-powered summarization analyzes the description of the actual act committed
- Inflation Score = charged severity − actual severity, normalized to a 0–100 scale
- Leaderboard ranks the most inflated cases for each respective country framework
- Example: a person posts an anti-war Instagram story (actual severity ~2) but is charged under Art. 207.3 "Fakes about army" (severity 8) → inflation score ≈ 75
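The normalization step is not spelled out above, so the sketch below makes an assumption: the severity gap is divided by the charged severity. That choice reproduces the figure of 75 in the example (a gap of 6 over a charged severity of 8). Dividing by the maximum possible gap on a 1–10 scale (9) would instead give about 67, so treat the function as illustrative only.

```python
def inflation_score(charged: float, actual: float) -> float:
    """Severity gap as a percentage of the charged severity (assumed normalization)."""
    if not (1 <= actual <= 10 and 1 <= charged <= 10):
        raise ValueError("severities are expected on the 1-10 scale")
    gap = max(charged - actual, 0)  # no inflation when the charge is not more severe
    return 100 * gap / charged

score = inflation_score(charged=8, actual=2)  # 75.0
```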
Outcome Prediction
Expected Sentence Forecasting
Given a prisoner's case features, this model predicts the likely sentence length. It uses log-transformed regression to guarantee positive predictions and provides confidence intervals, enabling legal professionals to anticipate outcomes.
- Log-transformed regression model ensuring all predictions are positive (sentence months)
- Features: criminal articles, case category, gender, location, age, and charge severity
- Outputs: predicted sentence length, confidence interval, and comparison to average for similar cases
- Anomaly detection layer flags cases where actual sentences deviate significantly from predicted — indicating potential political interference
- Outlier detection identifies disproportionate punishments across the database
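To show why a log transform guarantees positive predictions, here is a minimal sketch: an ordinary least-squares fit on log(months) with a single feature, mapped back with `exp()`. The data, the single-feature model, and the rough interval based on residual spread are assumptions for illustration; the production model uses more features and its interval method is not described here.

```python
import math
from statistics import mean, stdev

# Hypothetical training data: charge severity (1-10) and sentence in months.
severity = [2, 3, 5, 6, 8, 9]
months = [6, 12, 30, 48, 84, 120]

# Fit ordinary least squares on log(months) rather than on months.
log_months = [math.log(m) for m in months]
x_bar, y_bar = mean(severity), mean(log_months)
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(severity, log_months))
    / sum((x - x_bar) ** 2 for x in severity)
)
intercept = y_bar - slope * x_bar

# Spread of the residuals in log space, used for a rough interval.
residuals = [y - (intercept + slope * x) for x, y in zip(severity, log_months)]
sigma = stdev(residuals)

def predict(sev, z=1.96):
    """Return (low, point, high) in months; exp() keeps every value positive."""
    log_pred = intercept + slope * sev
    return (
        math.exp(log_pred - z * sigma),
        math.exp(log_pred),
        math.exp(log_pred + z * sigma),
    )

def is_anomalous(sev, actual_months):
    """Flag a sentence that falls outside the rough interval for its severity."""
    low, _, high = predict(sev)
    return not (low <= actual_months <= high)

low, point, high = predict(7)
```

Because the model is linear in log space, even an extreme input maps back to a positive number of months, and the interval is asymmetric around the point estimate.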
Prophet Trend Forecasting
90-Day Arrest Predictions
We use Facebook Prophet time-series models to forecast arrest trends up to 90 days ahead, with separate models for global trends, Russia-specific trends, and Belarus-specific trends.
- Time-series models trained on weekly arrest counts with automatic changepoint detection
- Isolated predictive models for Russia vs. Belarus to prevent cross-contamination of historical trends
- Article-specific models for charges most commonly used in political persecution
- Repression wave detection via statistical anomaly on rolling windows — alerts when a location or article shows unusual spikes
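The rolling-window anomaly check in the last bullet can be sketched as a z-score test against the preceding weeks. The window length, the threshold of 3 standard deviations, and the sample counts are assumptions; the exact statistic used in production is not specified here.

```python
from statistics import mean, stdev

def detect_waves(weekly_counts, window=8, threshold=3.0):
    """Return indices of weeks whose count is a z-score outlier vs. the preceding window."""
    spikes = []
    for i in range(window, len(weekly_counts)):
        history = weekly_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: z-score undefined, skip this week
        if (weekly_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes

# Hypothetical weekly arrest counts for one location or article.
counts = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15, 58, 17]
spikes = detect_waves(counts)  # [10]
```

Comparing each week only against earlier weeks means a spike cannot mask itself, though it does inflate the baseline for the weeks that follow it.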
Topic & Campaign Detection
LDA + BERTopic Semantic Analysis
Dual topic modeling identifies thematic clusters across cases — from anti-war speech patterns to religious persecution campaigns. This reveals coordinated repression strategies invisible in individual case analysis.
- Statistical topic modeling discovers latent thematic clusters from case text
- Transformer-based semantic model captures nuanced meaning beyond simple keyword matching
- Prosecution template detection finds groups of cases with suspiciously similar summary text, indicating copy-paste or template-based prosecutions
- Monthly topic tracking reveals how repression focus shifts over time (e.g., war-related charges spike after 2022)
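Prosecution template detection can be illustrated without any topic model. The sketch below swaps in a simpler technique, Jaccard similarity over three-word shingles, to flag near-duplicate summaries. It is not the LDA or BERTopic pipeline described above, and the summaries, helper names, and 0.6 threshold are hypothetical.

```python
from itertools import combinations

def shingles(text, k=3):
    """Set of k-word shingles from lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Share of shingles the two sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def template_pairs(summaries, threshold=0.6):
    """Return (id_a, id_b, similarity) for suspiciously similar summaries."""
    sets = {cid: shingles(text) for cid, text in summaries.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        sim = jaccard(sets[a], sets[b])
        if sim >= threshold:
            pairs.append((a, b, round(sim, 2)))
    return pairs

summaries = {
    "case-1": "the defendant knowingly spread false information about the armed forces via social media",
    "case-2": "the defendant knowingly spread false information about the armed forces via a messenger",
    "case-3": "detained at a peaceful picket while holding a blank sheet of paper",
}
pairs = template_pairs(summaries)  # [("case-1", "case-2", 0.69)]
```

Comparing every pair is quadratic in the number of cases, so a larger database would need an approximate method such as MinHash.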
Case Network Analysis
Graph Detection of Coordinated Repression
We build similarity networks linking cases by shared charges, location, timing, and tactics for each supported country. Community detection algorithms then identify clusters that reveal coordinated repression campaigns — groups of people targeted simultaneously.
- Network graphs built individually for Russian and Belarusian prisoners to map distinct domestic networks
- Weighted edge scoring combines all similarity dimensions into a single 0–1 score
- Community detection algorithms identify tightly-connected case clusters
- Each community is profiled: dominant charges, geographic center, time span, and a descriptive label
- Enables visualization and discovery of coordinated repression campaigns
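The weighted edge score and the clustering step can be sketched as follows. The weights, the threshold, and the sample cases are hypothetical, and the "tactics" dimension is omitted for brevity. Connected components over thresholded edges stand in for community detection here; production systems typically use a dedicated algorithm such as Louvain, and which one is used is not stated above.

```python
from itertools import combinations

# Hypothetical cases: articles charged, region, and arrest week index.
cases = {
    "A": {"articles": {"207.3"}, "region": "Moscow", "week": 10},
    "B": {"articles": {"207.3", "280.3"}, "region": "Moscow", "week": 10},
    "C": {"articles": {"207.3"}, "region": "Moscow", "week": 11},
    "D": {"articles": {"282.2"}, "region": "Kazan", "week": 40},
    "E": {"articles": {"282.2"}, "region": "Kazan", "week": 41},
}

# Assumed weights; they sum to 1 so the edge score stays within 0-1.
WEIGHTS = {"articles": 0.5, "region": 0.3, "timing": 0.2}

def edge_score(a, b, max_week_gap=8):
    """Combine charge overlap, shared region, and arrest proximity into one score."""
    shared = len(a["articles"] & b["articles"]) / len(a["articles"] | b["articles"])
    same_region = 1.0 if a["region"] == b["region"] else 0.0
    timing = max(0.0, 1 - abs(a["week"] - b["week"]) / max_week_gap)
    return (
        WEIGHTS["articles"] * shared
        + WEIGHTS["region"] * same_region
        + WEIGHTS["timing"] * timing
    )

def communities(cases, threshold=0.6):
    """Group cases linked by edges at or above the threshold (connected components)."""
    neighbours = {cid: set() for cid in cases}
    for a, b in combinations(cases, 2):
        if edge_score(cases[a], cases[b]) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, groups = set(), []
    for start in cases:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over the thresholded graph
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

groups = communities(cases)  # [["A", "B", "C"], ["D", "E"]]
```

Each resulting group could then be profiled by its most frequent article, region, and week range.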
Legal Evidence & Asylum Tools
AI-Powered Legal Support
Our most impactful outputs: statistical persecution evidence, personal risk scores for asylum seekers, comparative case finding, and generative document drafting — all designed to support real legal proceedings.
- Persecution Evidence Score: quantifies statistical evidence of pattern-based persecution (by category, location, or article) with confidence intervals and comparison to baselines
- Personal Risk Score: 0–100% individualized risk assessment for asylum applicants based on their specific profile
- Comparative Case Finder: identifies the most similar documented cases for precedent-building, ranked by multi-dimensional similarity
- Affidavit Generator: AI-powered synthesis of prisoner data and country condition reports into legal support documents for asylum cases
- All tools designed to produce court-admissible statistical evidence
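One way to compare a group's rate against a baseline with confidence intervals is the Wilson score interval. The sketch below is an assumption about how such a comparison could be made, not a description of the Persecution Evidence Score itself, and the counts are invented.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical counts: custodial sentences within one case category vs. all cases.
group_low, group_high = wilson_interval(successes=45, n=60)
base_low, base_high = wilson_interval(successes=300, n=1000)

# Non-overlapping intervals suggest the group rate is genuinely above the baseline.
elevated = group_low > base_high
```

The Wilson interval behaves better than the plain normal approximation for small groups, which matters when a category or location has only a few dozen cases.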
System Architecture
- Data Sources
- Backend (Node.js)
- ML Microservice (Python)
- Frontend (Next.js)
A Note on Predictive Models
Our risk scores and forecasts are probabilistic tools derived from historical data. They are designed to aid researchers and legal professionals in prioritization, not to replace human judgment. A “High Risk” score indicates a statistical resemblance to past cases involving torture or harsh sentencing, but specific outcomes may vary. All models are retrained as new data becomes available.