Algorithmic Transparency

Data & ML Methodology

Political Prisoner Watch goes far beyond simple data aggregation. Our platform runs a dedicated Python ML microservice with country-specific models and 50+ API endpoints to analyze risk, forecast repression trends, detect coordinated campaigns, and generate legal evidence for both Russia and Belarus.

Complete Pipeline

From Raw Data to Actionable Intelligence

Every case enters a multi-stage pipeline.

Stage 01

Automated Data Ingestion

Real-Time Synchronization

Our system continuously ingests data from verified human rights sources. Automated synchronization scripts keep our PostgreSQL database current with the latest arrests, prosecutions, sentences, and prisoner locations in Russia and Belarus.

  • Primary sources: OVD-Info, Memorial (Russia), and Viasna (Belarus)
  • Secondary sources: Memorial Human Rights Center, court record aggregators, direct family/legal counsel reports
  • Structured database with normalized schema for prisoners, cases, articles, locations, and outcomes
  • Automated sync runs continuously to detect new arrests, status changes, and releases
  • Geolocation enrichment for coordinate standardization and region mapping
  • Native-to-English machine translation (Russian & Belarusian) for international accessibility
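The change-detection step of the sync job can be sketched as follows. This is a minimal sketch that assumes each source assigns a stable record identifier and exposes a status field. The `PrisonerRecord` type, its fields, and the status values are all hypothetical. In the real pipeline the comparison would presumably run against PostgreSQL (for example through an `INSERT ... ON CONFLICT` upsert) rather than in-memory dicts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrisonerRecord:
    """Hypothetical normalized record; the platform's real schema is not documented."""
    source_id: str                  # stable ID assigned by the upstream source
    name: str
    status: str                     # e.g. "detained", "sentenced", "released"
    location: Optional[str] = None  # current place of detention, if known


def diff_snapshot(stored: dict[str, PrisonerRecord],
                  incoming: list[PrisonerRecord]) -> dict[str, list[str]]:
    """Compare a fresh source snapshot with the stored state and classify the changes."""
    changes: dict[str, list[str]] = {
        "new_arrests": [], "status_changes": [], "releases": [], "transfers": [],
    }
    for rec in incoming:
        old = stored.get(rec.source_id)
        if old is None:
            changes["new_arrests"].append(rec.source_id)  # never seen before
        elif old.status != rec.status:
            key = "releases" if rec.status == "released" else "status_changes"
            changes[key].append(rec.source_id)
        elif old.location != rec.location:
            changes["transfers"].append(rec.source_id)    # same status, new facility
    return changes
```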
Stage 02

Named Entity Recognition

Legal Actor Extraction

We apply custom NER models to extract structured legal entities from unstructured case summaries — identifying judges, prosecutors, investigators, lawyers, and courts involved in each case, enabling powerful analytics on the judicial system itself.

  • Custom NER models trained on multi-lingual legal text to extract: judges, prosecutors, lawyers, courts, and investigators
  • Entity extraction runs on all case summaries and stores results for downstream analytics
  • Judge analytics computes per-judge statistics: average sentence length, case count, deviation from global average, and harshness classification
  • Enables identification of systematically harsh judges and patterns of judicial behavior
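The per-judge statistics listed above can be sketched like this, assuming the NER step has already produced (judge, sentence in months) pairs. The ±25% harshness threshold and the five-case minimum are illustrative assumptions, not the platform's published cut-offs.

```python
from collections import defaultdict
from statistics import mean


def judge_stats(cases, harsh_threshold=0.25, min_cases=5):
    """Per-judge sentencing statistics.

    cases: iterable of (judge, sentence_months) pairs from the NER step.
    harsh_threshold: relative deviation from the global mean counted as
    harsh (or lenient); an illustrative assumption.
    """
    cases = list(cases)
    global_avg = mean(months for _, months in cases)
    by_judge = defaultdict(list)
    for judge, months in cases:
        by_judge[judge].append(months)

    stats = {}
    for judge, sentences in by_judge.items():
        avg = mean(sentences)
        deviation = (avg - global_avg) / global_avg  # relative to the global average
        if len(sentences) < min_cases:
            label = "insufficient data"  # too few cases to classify reliably
        elif deviation >= harsh_threshold:
            label = "harsh"
        elif deviation <= -harsh_threshold:
            label = "lenient"
        else:
            label = "typical"
        stats[judge] = {
            "case_count": len(sentences),
            "avg_sentence_months": avg,
            "deviation_pct": 100 * deviation,
            "classification": label,
        }
    return stats
```

The minimum-case rule keeps a judge with one or two unusual verdicts from being labeled harsh on thin evidence.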
Stage 03

XGBoost Risk Classification

Urgency & Torture Prediction

Our core ML models use XGBoost gradient-boosted trees to predict two critical risk probabilities for each case: the urgency level (whether immediate advocacy action is required) and the risk of torture while in custody. We train completely separate models for Russia and Belarus to capture distinct legal and repressive patterns.

  • Gradient-boosted tree classifiers with logistic objective, trained separately for Russia and Belarus
  • Feature inputs: age, gender, arrest location, case category, and criminal articles (multi-label encoded)
  • Preprocessing pipeline: median imputation for numerics, one-hot encoding for categoricals with unknown-category handling
  • Class imbalance correction for minority outcomes (e.g., torture)
  • Trained on historical outcomes from thousands of documented cases across both countries
  • Outputs: urgency probability (0–1), torture probability (0–1), and feature importance rankings
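The preprocessing steps above can be sketched in plain Python. This covers median imputation, one-hot encoding that tolerates unknown categories, and multi-label encoding of criminal articles. The sketch assumes that class imbalance is corrected with a negative-to-positive weight ratio. The source does not say which method it uses; this ratio is what XGBoost's `scale_pos_weight` parameter expects. The resulting vectors could then be passed to `xgboost.XGBClassifier(objective="binary:logistic", scale_pos_weight=...)`, one model per country. Field names and article codes in the example are hypothetical.

```python
from statistics import median


def fit_preprocessor(rows, numeric, categorical, multilabel):
    """Learn medians, category vocabularies and article vocabularies from training rows."""
    medians = {c: median(r[c] for r in rows if r.get(c) is not None) for c in numeric}
    vocab = {c: sorted({r[c] for r in rows if r.get(c) is not None}) for c in categorical}
    articles = {c: sorted({a for r in rows for a in (r.get(c) or [])}) for c in multilabel}
    return medians, vocab, articles


def transform(row, medians, vocab, articles):
    """Turn one case (a dict) into a flat numeric feature vector."""
    vec = []
    for c, med in medians.items():
        value = row.get(c)
        vec.append(float(med if value is None else value))       # median imputation
    for c, cats in vocab.items():
        # Unseen or missing categories become an all-zero block instead of an error.
        vec.extend(1.0 if row.get(c) == cat else 0.0 for cat in cats)
    for c, known in articles.items():
        present = set(row.get(c) or [])
        vec.extend(1.0 if a in present else 0.0 for a in known)  # multi-label encoding
    return vec


def scale_pos_weight(labels):
    """Negative/positive ratio, usable as XGBoost's scale_pos_weight for rare outcomes."""
    pos = sum(labels)
    neg = len(labels) - pos
    return neg / pos if pos else 1.0
```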
Stage 04

Surveillance Tech Attribution

How They Were Caught

A dedicated classifier predicts the likely surveillance technology used to identify and detain a political prisoner — revealing patterns in state surveillance infrastructure across regions and case types.

  • Classifier trained on cases with known surveillance methods
  • Features: arrest location, case category, and temporal features
  • Predicts categories like: social media monitoring, CCTV/facial recognition, informant reports, phone interception, etc.
  • Exposes regional surveillance deployment patterns (e.g., Moscow facial recognition vs. regional signals intelligence)
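The source does not name the algorithm behind the surveillance classifier. The sketch below therefore substitutes a categorical naive Bayes model with Laplace smoothing, a simple baseline for this kind of problem. It uses region, case category, and arrest year as the temporal feature. The feature choice and the label strings are assumptions.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        self.value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.values_seen = defaultdict(set)       # feature index -> distinct values
        for row, label in zip(X, y):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values_seen[i].add(value)
        return self

    def predict_proba(self, row):
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / self.n)  # log prior
            for i, value in enumerate(row):
                k = len(self.values_seen[i]) + 1  # +1 leaves mass for unseen values
                num = self.value_counts[(i, label)][value] + self.alpha
                score += math.log(num / (count + self.alpha * k))
            log_scores[label] = score
        top = max(log_scores.values())  # subtract the max before exp for stability
        weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        total = sum(weights.values())
        return {lab: w / total for lab, w in weights.items()}
```

Usage: `CategoricalNaiveBayes().fit(X, y).predict_proba(("Moscow", "protest", 2022))` returns a probability for each surveillance category seen in training.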
Stage 05

Charge Inflation Detection

Gap Between Act & Charge

This model detects prosecutorial overreach by measuring the gap between what a person actually did (based on case summaries) and the charges filed against them. We maintain distinct severity mappings for Russian Criminal Code and Belarusian Criminal Code articles.

  • Proprietary severity mapping: Criminal articles scored on a 1–10 scale across both Russian and Belarusian legal frameworks
  • AI-powered summarization analyzes the case description to assess the severity of the act actually committed
  • Inflation Score = charged severity − actual severity, normalized to a 0–100 scale
  • Leaderboard ranks the most inflated cases for each respective country framework
  • Example: a person posts an anti-war Instagram story (actual severity ~2) but is charged under Art. 207.3, "fakes about the army" (severity 8) → inflation score ≈ 75
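The source does not give the exact normalization. One choice that reproduces the example is to divide the gap by the largest gap still possible above the actual act: (charged − actual) / (10 − actual) × 100, so (8 − 2) / (10 − 2) × 100 = 75. The sketch below assumes that normalization. It also assumes that a case's charged severity is that of its most severe article. The severity table is a hypothetical one-entry excerpt.

```python
# Hypothetical one-entry excerpt of a severity mapping (1-10 scale).
SEVERITY = {("RU", "207.3"): 8}
MAX_SEVERITY = 10


def inflation_score(actual: float, charged: float, max_severity: int = MAX_SEVERITY) -> float:
    """0-100 score: share of the possible escalation above the actual act that the charge uses."""
    for s in (actual, charged):
        if not 1 <= s <= max_severity:
            raise ValueError(f"severity {s} outside 1..{max_severity}")
    gap = charged - actual
    if gap <= 0:
        return 0.0  # charge no harsher than the act: no inflation
    return 100.0 * gap / (max_severity - actual)


def case_inflation(actual: float, country: str, articles: list[str]) -> float:
    """Score a case using its most severe charged article (an assumption)."""
    charged = max(SEVERITY[(country, art)] for art in articles)
    return inflation_score(actual, charged)
```

Under this normalization the maximum score of 100 goes to the most extreme mismatch: the mildest act charged under the most severe article.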
Stage 06

Outcome Prediction

Expected Sentence Forecasting

Given a prisoner's case features, this model predicts the likely sentence length. It uses log-transformed regression to guarantee positive predictions and provides confidence intervals, enabling legal professionals to anticipate outcomes.

  • Log-transformed regression model ensuring all predictions are positive (sentence months)
  • Features: criminal articles, case category, gender, location, age, and charge severity
  • Outputs: predicted sentence length, confidence interval, and comparison to average for similar cases
  • Anomaly detection layer flags cases where actual sentences deviate significantly from predicted — indicating potential political interference
  • Outlier detection identifies disproportionate punishments across the database
Stage 07

Prophet Trend Forecasting

90-Day Arrest Predictions

We use Facebook Prophet time-series models to forecast arrest trends up to 90 days into the future — generating isolated models for global trends, Russia-specific trends, and Belarus-specific trends.

  • Time-series models trained on weekly arrest counts with automatic changepoint detection
  • Isolated predictive models for Russia vs. Belarus to prevent cross-contamination of historical trends
  • Article-specific models for charges most commonly used in political persecution
  • Repression wave detection via statistical anomaly on rolling windows — alerts when a location or article shows unusual spikes
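The weekly aggregation and the rolling-window wave detector can be sketched as follows. The detector uses a trailing z-score, one common reading of "statistical anomaly on rolling windows". The 8-week window, the z ≥ 3 threshold, and the standard-deviation floor are assumptions. In practice the detector would run separately for each location or article. The same weekly series, converted to a DataFrame with `ds` and `y` columns (the input format Prophet expects), could feed the forecasting models.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, pstdev


def weekly_counts(arrest_dates: list[date]) -> list[tuple[date, int]]:
    """Arrests per week, keyed by the Monday starting each week; empty weeks get 0."""
    if not arrest_dates:
        return []
    counts = Counter(d - timedelta(days=d.weekday()) for d in arrest_dates)
    week, last = min(counts), max(counts)
    series = []
    while week <= last:
        series.append((week, counts.get(week, 0)))
        week += timedelta(weeks=1)
    return series


def detect_waves(series, window=8, z_threshold=3.0, min_sd=1.0):
    """Flag weeks whose count sits at least z_threshold SDs above the trailing window."""
    counts = [c for _, c in series]
    alerts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        # Floor the SD so a flat history does not turn tiny changes into alerts.
        sd = max(pstdev(history), min_sd)
        z = (counts[i] - mean(history)) / sd
        if z >= z_threshold:
            alerts.append((series[i][0], counts[i], z))
    return alerts
```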
Stage 08

Topic & Campaign Detection

LDA + BERTopic Semantic Analysis

Dual topic modeling identifies thematic clusters across cases — from anti-war speech patterns to religious persecution campaigns. This reveals coordinated repression strategies invisible in individual case analysis.

  • Statistical topic modeling discovers latent thematic clusters from case text
  • Transformer-based semantic model captures nuanced meaning beyond simple keyword matching
  • Prosecution template detection finds groups of cases with suspiciously similar summary text, indicating copy-paste or template-based prosecutions
  • Monthly topic tracking reveals how repression focus shifts over time (e.g., war-related charges spike after 2022)
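Prosecution-template detection can be approximated with word shingles and Jaccard similarity. The source does not describe its method, so this is an assumed stand-in. Cases whose summaries overlap above a threshold are merged into groups with a union-find structure. Python's `\w` matches Cyrillic letters, so the same code works on Russian or Belarusian text. The shingle size and the 0.6 threshold are illustrative. The pairwise loop is quadratic, so a large corpus would call for locality-sensitive hashing such as MinHash.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 5) -> set:
    """Set of k-word sequences; short texts fall back to a single shingle."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def template_groups(summaries: dict, k: int = 5, threshold: float = 0.6) -> list:
    """Group case IDs whose summaries are near-duplicates of each other."""
    sh = {cid: shingles(text, k) for cid, text in summaries.items()}
    parent = {cid: cid for cid in summaries}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(summaries, 2):
        if jaccard(sh[a], sh[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for cid in summaries:
        groups.setdefault(find(cid), []).append(cid)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```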
Stage 09

Case Network Analysis

Graph Detection of Coordinated Repression

We build similarity networks linking cases by shared charges, location, timing, and tactics for each supported country. Community detection algorithms then identify clusters that reveal coordinated repression campaigns — groups of people targeted simultaneously.

  • Network graphs built individually for Russian and Belarusian prisoners to map distinct domestic networks
  • Weighted edge scoring combines all similarity dimensions into a single 0–1 score
  • Community detection algorithms identify tightly-connected case clusters
  • Each community is profiled: dominant charges, geographic center, time span, and a descriptive label
  • Enables visualization and discovery of coordinated repression campaigns
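The edge scoring and community detection steps can be sketched as follows. The dimension weights are hypothetical but sum to 1, so the combined score stays between 0 and 1, as described above. The source does not name its community detection algorithm. This sketch uses weighted label propagation with a deterministic update order and tie-breaking, one of several standard choices alongside methods such as Louvain.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical weights; they sum to 1 so the combined score stays in [0, 1].
WEIGHTS = {"charges": 0.4, "location": 0.2, "timing": 0.2, "tactics": 0.2}


@dataclass(frozen=True)
class Case:
    case_id: str
    articles: frozenset
    region: str
    arrest_date: date
    category: str  # stands in for "tactics"


def edge_score(a: Case, b: Case, horizon_days: int = 90) -> float:
    union = a.articles | b.articles
    charges = len(a.articles & b.articles) / len(union) if union else 0.0
    location = 1.0 if a.region == b.region else 0.0
    gap = abs((a.arrest_date - b.arrest_date).days)
    timing = max(0.0, 1 - gap / horizon_days)  # linear decay to 0 at the horizon
    tactics = 1.0 if a.category == b.category else 0.0
    return (WEIGHTS["charges"] * charges + WEIGHTS["location"] * location
            + WEIGHTS["timing"] * timing + WEIGHTS["tactics"] * tactics)


def build_graph(cases: list, min_score: float = 0.5) -> dict:
    """Adjacency dict {case_id: {neighbor_id: weight}} keeping only strong edges."""
    graph = {c.case_id: {} for c in cases}
    for i, a in enumerate(cases):
        for b in cases[i + 1:]:
            w = edge_score(a, b)
            if w >= min_score:
                graph[a.case_id][b.case_id] = w
                graph[b.case_id][a.case_id] = w
    return graph


def label_propagation(graph: dict, max_iter: int = 50) -> list:
    """Each node adopts the label with the largest total edge weight among neighbors."""
    labels = {n: n for n in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed order keeps results reproducible
            if not graph[node]:
                continue
            weight_by_label = {}
            for nbr, w in graph[node].items():
                weight_by_label[labels[nbr]] = weight_by_label.get(labels[nbr], 0.0) + w
            best = max(sorted(weight_by_label), key=weight_by_label.get)  # ties: smallest label
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    communities = {}
    for node, lab in labels.items():
        communities.setdefault(lab, []).append(node)
    return sorted(sorted(c) for c in communities.values())
```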
Stage 10

Legal Evidence & Asylum Tools

AI-Powered Legal Support

Our most impactful outputs: statistical persecution evidence, personal risk scores for asylum seekers, comparative case finding, and generative document drafting — all designed to support real legal proceedings.

  • Persecution Evidence Score: computes statistical proof of pattern-based persecution (by category, location, or article) with confidence intervals and comparison to baselines
  • Personal Risk Score: 0–100% individualized risk assessment for asylum applicants based on their specific profile
  • Comparative Case Finder: identifies the most similar documented cases for precedent-building, ranked by multi-dimensional similarity
  • Affidavit Generator: AI-powered synthesis of prisoner data and country condition reports into legal support documents for asylum cases
  • All tools designed to produce court-admissible statistical evidence
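A pattern-based persecution comparison with confidence intervals can be sketched using Wilson score intervals for proportions. Here a "hit" is any outcome of interest, such as a harsh sentence or a torture report. A group's hit rate (one article or region, say) is compared with a baseline. The source does not specify the statistic behind the Persecution Evidence Score. The choice of Wilson intervals, the rate ratio, and the non-overlap rule are all assumptions. Non-overlapping intervals are a deliberately conservative test: overlapping intervals do not by themselves rule out a real difference.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion hits / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def persecution_evidence(group_hits, group_n, base_hits, base_n, z=1.96):
    """Compare an outcome rate in one group (e.g. one article) with a baseline."""
    g_rate, b_rate = group_hits / group_n, base_hits / base_n
    g_ci = wilson_interval(group_hits, group_n, z)
    b_ci = wilson_interval(base_hits, base_n, z)
    return {
        "group_rate": g_rate,
        "group_ci": g_ci,
        "baseline_rate": b_rate,
        "baseline_ci": b_ci,
        "rate_ratio": g_rate / b_rate if b_rate > 0 else math.inf,
        # Conservative check: the intervals do not overlap at all.
        "elevated": g_ci[0] > b_ci[1],
    }
```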
Architecture

System Architecture

  • Data Sources: OVD-Info API, Memorial HRC, Court Records, Direct Reports
  • Backend (Node.js): PostgreSQL DB, REST API, Data Sync Jobs, Auth / Admin
  • ML Microservice (Python): XGBoost Models, Prophet Forecasts, NER & BERTopic, Risk & Outcome
  • Frontend (Next.js): Interactive Map, Analytics Dashboard, Legal Tools, Risk Radar

A Note on Predictive Models

Our risk scores and forecasts are probabilistic tools derived from historical data. They are designed to aid researchers and legal professionals in prioritization, not to replace human judgment. A “High Risk” score indicates a statistical resemblance to past cases involving torture or harsh sentencing, but specific outcomes may vary. All models are retrained as new data becomes available.
