OSINT Investigation Workflow for Journalists

Open-source intelligence investigations have moved from the periphery of journalism to its center. Bellingcat's identification of the Salisbury poisoning suspects, the New York Times Visual Investigations team's reconstruction of the Beirut port explosion, and many local investigative projects have shown that publicly available data (satellite imagery, flight records, corporate filings, social media archives) can produce journalism as rigorous and impactful as traditional source-based reporting. But OSINT investigations need a structured methodology to produce evidence that withstands scrutiny. This guide presents a six-stage workflow grounded in the Berkeley Protocol on Digital Open Source Investigations, published in 2020 by the Human Rights Center at UC Berkeley and the UN Office of the High Commissioner for Human Rights (OHCHR).

Stage 1: Hypothesis Formation

Every investigation starts with a question, not a dataset. Before touching any tool, articulate what you are trying to determine and what evidence would confirm or refute it. This is not academic formality — it prevents the most common failure mode of OSINT investigations: pattern-matching noise into a narrative.

Write your hypothesis in falsifiable terms: "Company X discharged pollutants above permit levels into River Y between March and June 2025" is testable. "Company X is polluting" is not specific enough to investigate rigorously. A good hypothesis identifies the actor, the action, the timeframe, and the geographic scope.
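One way to enforce this discipline is to record the hypothesis as a structured object rather than a free-text sentence. The sketch below is a hypothetical Python dataclass (the `Hypothesis` class and its `problems` method are illustrative names, not part of any tool) that treats a hypothesis as testable only when the actor, action, timeframe, and geographic scope are all filled in.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical structure for a falsifiable hypothesis: actor, action, timeframe, scope."""
    actor: str     # who, e.g. "Company X"
    action: str    # what, e.g. "discharged pollutants above permit levels"
    start: date    # first day of the timeframe
    end: date      # last day of the timeframe
    scope: str     # where, e.g. "River Y"

    def problems(self) -> list[str]:
        """List the reasons this hypothesis is not yet testable (an empty list means testable)."""
        issues = [f"missing {name}" for name in ("actor", "action", "scope")
                  if not getattr(self, name).strip()]
        if self.end < self.start:
            issues.append("timeframe ends before it starts")
        return issues


good = Hypothesis("Company X", "discharged pollutants above permit levels",
                  date(2025, 3, 1), date(2025, 6, 30), "River Y")
print(good.problems())  # []
```

A vague claim such as "Company X is polluting" cannot even be constructed without inventing a timeframe and a place, which is exactly the gap the preliminary survey should expose.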

At this stage, conduct a preliminary survey: spend 30-60 minutes searching public records, news archives, and social media to determine whether the hypothesis is plausible and whether sufficient open-source data exists to test it. Many investigations die here — and that is fine. It is better to abandon an uninvestigable hypothesis early than to spend weeks collecting data that cannot answer the question.

Tools for this stage: Google Advanced Search (date-restricted), LexisNexis/ProQuest for news archives, SEC EDGAR for corporate filings, PACER for court records, Google Scholar for prior research on the topic.
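EDGAR can also be queried programmatically during the preliminary survey. The sketch below assumes SEC's `data.sec.gov` submissions endpoint and the JSON layout we understand it to return (parallel arrays under `filings.recent`, including `filingDate` and `form`); verify both against SEC's current API documentation before relying on them. Only `submissions_url` runs offline; `recent_filings` makes a live request.

```python
import json
import urllib.request

# Assumption: SEC's public EDGAR JSON API serves a company's filing history at this
# path, keyed by the Central Index Key (CIK) zero-padded to ten digits.
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def submissions_url(cik: int) -> str:
    """Build the filing-history URL for a company's CIK."""
    return SUBMISSIONS_URL.format(cik=cik)


def recent_filings(cik: int, user_agent: str) -> list[tuple[str, str]]:
    """Fetch (filing date, form type) pairs for a company's most recent filings.

    SEC asks automated clients to identify themselves in the User-Agent header,
    for example "Newsroom Name research@example.org". The field names read below
    reflect the response format as we understand it; check a live response first.
    """
    request = urllib.request.Request(submissions_url(cik), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    recent = data["filings"]["recent"]  # parallel arrays, one entry per filing
    return list(zip(recent["filingDate"], recent["form"]))
```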

Stage 2: Collection

Once you have a viable hypothesis, systematic data collection begins. The Berkeley Protocol emphasizes two principles: collect broadly (capture more than you think you need, because sources disappear) and preserve everything (maintain original copies with metadata intact).

For each piece of evidence, record the following metadata at the time of collection:

Collection Record:
  Source URL:     https://example.com/document.pdf
  Access Date:    2026-02-07T14:23:00Z
  SHA-256 Hash:   a7f9e3b2c4d8...
  Archive URL:    https://web.archive.org/web/20260207/...
  Collector:      Jane Reporter
  Method:         Direct download from public portal
  Notes:          Found via FOIA request #2025-1234

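Hash every file immediately upon download using SHA-256. The hash is a cryptographic fingerprint: changing even one byte of the file produces a completely different value. Recorded at collection time, it lets you demonstrate that the file has not been altered since collection, and if the evidence is later challenged, it provides a verifiable chain of integrity. That demonstration is only as strong as the record of the hash itself, so keep hashes in a dated, append-only log that cannot be quietly rewritten.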
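Hashing and record-keeping can be scripted so they happen at download time rather than being reconstructed from memory later. This is a minimal Python sketch using only the standard library; the function names and dictionary keys are our own, chosen to mirror the Collection Record template, not a format required by any tool.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collection_record(path: Path, source_url: str, collector: str, method: str,
                      archive_url: str = "", notes: str = "") -> dict:
    """Assemble the Collection Record fields at the moment of collection."""
    return {
        "source_url": source_url,
        # UTC with a trailing Z, matching the Access Date format in the template
        "access_date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "sha256": sha256_file(path),
        "archive_url": archive_url,
        "collector": collector,
        "method": method,
        "notes": notes,
    }
```

The digest should match what `sha256sum` prints for the same file, so anyone can re-check it without this script. Appending each record to a log file as soon as it is created keeps the access timestamp honest.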

For web content that may be deleted, use the Wayback Machine's "Save Page Now" feature or archive.today to create independent copies. For social media posts, use specialized archiving tools that capture the post, metadata (timestamp, account details, engagement metrics), and surrounding context.
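Both Wayback Machine steps can be driven by URL. The sketch below assumes the Internet Archive's public Save Page Now prefix (`web.archive.org/save/`) and its availability API (`archive.org/wayback/available`, which we understand to return the nearest capture under `archived_snapshots.closest`); the Archive also offers an authenticated capture API with more options. The helper names are hypothetical, and only the URL building and JSON parsing run offline.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the Internet Archive's public Save Page Now prefix and its
# availability API, as publicly documented; confirm before automating.
SAVE_PREFIX = "https://web.archive.org/save/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def save_page_now_url(target: str) -> str:
    """URL that asks the Wayback Machine to capture `target` when opened."""
    return SAVE_PREFIX + target


def availability_url(target: str, timestamp: str = "") -> str:
    """URL asking for the capture closest to `timestamp` (YYYYMMDDhhmmss, may be truncated)."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Pull the snapshot URL out of a parsed availability-API JSON response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

The snapshot URL returned belongs in the record's Archive URL field. Capturing the same page with archive.today as well keeps the evidence from depending on a single archive.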

Tools for this stage: Hunchly (browser-based evidence capture), Wayback Machine, archive.today, youtube-dl/yt-dlp for video preservation, ExifTool for metadata extraction, shasum/sha256sum for hashing. Deep Seer provides built-in evidence capture with automatic SHA-256 hashing and timestamping for geospatial data.

Stage 3: Verification

Verification is what separates journalism from rumor aggregation. Every piece of evidence must be tested for authenticity before it enters your analysis. The four dimensions of verification are:

Provenance: Where did this data come from? Can you trace it to a primary source? Government databases (FEC, EPA, SEC) are high-provenance sources. A screenshot shared on social media with no source attribution is low-provenance until verified independently.

Integrity: Has the data been altered? For images, reverse image search (Google, TinEye, Yandex) can surface earlier copies and help identify the earliest known version. EXIF data provides camera model, timestamp, and sometimes GPS coordinates, but EXIF is trivially editable, so treat it as a lead rather than proof. For documents, metadata analysis can reveal editing history. For satellite imagery, compare scenes from multiple sources (for example, Copernicus Sentinel-2, Planet, and Maxar) to confirm consistency.

Temporal accuracy: When was this data created? Timestamps in digital files can be forged, so corroborate with independent time indicators. For outdoor photos, shadow analysis can estimate the time of day (see our guide on shadow-based geolocation). For video, background audio, weather conditions visible in the frame, and the positions of celestial objects can provide temporal anchors.
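Shadow analysis reduces to trigonometry: on level ground, a vertical object of height h casting a shadow of length L implies a sun elevation of atan(h / L). Comparing that angle with the sun's predicted elevation for the claimed place and date narrows down when the image could have been taken. The sketch below is illustrative only; its solar model is a textbook approximation (Cooper's declination formula, with no equation-of-time or refraction correction) chosen to keep the example self-contained, and a published analysis should rely on a proper solar-position calculator such as NOAA's.

```python
import math
from datetime import date, datetime, timedelta, timezone


def elevation_from_shadow(object_height: float, shadow_length: float) -> float:
    """Sun elevation in degrees implied by a vertical object and its shadow on level ground."""
    return math.degrees(math.atan2(object_height, shadow_length))


def approx_sun_elevation(lat: float, lon: float, when: datetime) -> float:
    """Rough solar elevation in degrees for a timezone-aware datetime.

    Uses Cooper's declination formula and mean solar time from longitude alone,
    ignoring the equation of time and atmospheric refraction, so expect errors
    of a few degrees. Longitude is positive east of Greenwich.
    """
    utc = when.astimezone(timezone.utc)
    day_of_year = utc.timetuple().tm_yday
    decl = math.radians(23.44 * math.sin(math.radians(360 / 365 * (284 + day_of_year))))
    solar_hours = utc.hour + utc.minute / 60 + lon / 15
    hour_angle = math.radians(15 * (solar_hours - 12))
    phi = math.radians(lat)
    sin_elev = (math.sin(phi) * math.sin(decl)
                + math.cos(phi) * math.cos(decl) * math.cos(hour_angle))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))  # clamp float overshoot


def candidate_times(lat: float, lon: float, day: date, measured: float,
                    tolerance: float = 2.0) -> list[datetime]:
    """UTC times on `day`, in 10-minute steps, whose predicted elevation matches `measured`."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    steps = (start + timedelta(minutes=10 * i) for i in range(144))
    return [t for t in steps
            if abs(approx_sun_elevation(lat, lon, t) - measured) <= tolerance]
```

The sun usually passes through a given elevation twice a day, so the search typically returns a morning and an afternoon window; the direction of the shadow is what tells them apart.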

Geographic accuracy: What location does this data represent? For geolocated content, verify that the claimed location matches visual features in the image or video. Use Google Earth, Mapillary, or other street-level imagery to confirm landmarks, terrain, vegetation, and infrastructure.

Stage 4: Analysis

With verified data in hand, analysis involves finding patterns, connections, and anomalies that address your hypothesis. This is the stage where OSINT tools provide their greatest leverage.

Geospatial analysis: Plotting data points on a map reveals spatial patterns invisible in tabular data. Facility locations relative to residential areas, flight paths relative to borders, vessel tracks relative to sanctioned ports — geography is often the dimension that connects otherwise unrelated data points.
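For proximity questions such as which vessel positions came within 10 km of a sanctioned port, a great-circle distance is usually enough to build a first shortlist. The sketch below uses the haversine formula on a spherical Earth (errors on the order of 0.5% relative to the WGS84 ellipsoid); the point and place lists are hypothetical inputs you would load from AIS exports or facility registers.

```python
import math
from typing import Iterable, Iterator

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def flag_near(points: Iterable[tuple[str, float, float]],
              places: list[tuple[str, float, float]],
              radius_km: float) -> Iterator[tuple[str, str, float]]:
    """Yield (point id, place name, distance km) for each point within radius_km of a place."""
    for point_id, plat, plon in points:
        for name, qlat, qlon in places:
            distance = haversine_km(plat, plon, qlat, qlon)
            if distance <= radius_km:
                yield point_id, name, round(distance, 2)
```

This compares every point with every place, which is fine for thousands of records; for millions of AIS pings, a spatial index or a QGIS spatial join scales better.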

Temporal analysis: Timeline construction shows sequences of events. When did the permit violation occur relative to the lobbying expenditure? Did the flight pattern change before or after the diplomatic meeting? Time-series analysis of ADS-B data, AIS tracks, or satellite imagery revisits can reveal behavioral changes that correlate with known events.
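A timeline is easiest to reason about once every source is normalized to timezone-aware timestamps and merged into one sorted list. The minimal sketch below uses hypothetical `Event`, `build_timeline`, and `hours_relative_to` names; the same merge is often done in pandas at scale. Expressing each event as an offset from an anchor event answers the before-or-after question directly.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    when: datetime     # must be timezone-aware so sources in different zones sort correctly
    source: str        # e.g. "ADS-B", "lobbying disclosure", "satellite revisit"
    description: str


def build_timeline(*sources: list[Event]) -> list[Event]:
    """Merge per-source event lists into one chronological timeline."""
    return sorted((event for events in sources for event in events), key=lambda e: e.when)


def hours_relative_to(timeline: list[Event], anchor: datetime) -> list[tuple[float, Event]]:
    """Pair each event with its offset in hours from an anchor (negative means before)."""
    return [((event.when - anchor).total_seconds() / 3600, event) for event in timeline]
```

Mixing naive and aware datetimes makes the sort raise a `TypeError`, which is a useful failure: it forces every source's timezone to be stated explicitly.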

Network analysis: Mapping relationships between entities — corporate ownership structures, donor networks, co-travel patterns — reveals connections that individual records do not. A company with a clean public image may share directors with a firm under sanctions. A politician's top donors may all be executives at subsidiaries of the same parent corporation.
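Shared-director links can be found by inverting a company-to-directors table into a person-to-companies index. The sketch below assumes board lists have already been extracted from filings; company and person names are hypothetical, and the naive name normalization (lowercased, trimmed) will both merge different people who share a name and miss spelling variants, so every link it returns is a lead to verify, not a finding.

```python
from collections import defaultdict
from itertools import combinations


def shared_directors(boards: dict[str, list[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of companies to the directors they share.

    `boards` maps a company name to its director names. Names are only lowercased
    and trimmed, so treat every match as unconfirmed until checked against
    identifiers such as dates of birth or registry IDs.
    """
    companies_by_person: dict[str, set[str]] = defaultdict(set)
    for company, directors in boards.items():
        for person in directors:
            companies_by_person[person.strip().lower()].add(company)

    links: dict[tuple[str, str], set[str]] = defaultdict(set)
    for person, companies in companies_by_person.items():
        for pair in combinations(sorted(companies), 2):
            links[pair].add(person)
    return dict(links)
```

Each company pair is an edge that can be written out as an edge list (for example, a CSV with source and target columns) for visualization in Gephi.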

Tools for this stage: Deep Seer for geospatial/temporal overlay of multiple data types, Maltego for network graphing, i2 Analyst's Notebook for timeline analysis, QGIS for custom geospatial analysis, Python/pandas for large-scale data processing, Gephi for network visualization.

Stage 5: Corroboration

The Berkeley Protocol requires that no conclusion rest on a single piece of evidence. Corroboration means finding independent evidence that confirms or refutes what your analysis suggests. This is the discipline that distinguishes rigorous investigation from speculation.

Independent means truly independent — not two social media posts from the same event (which may share a source), but a satellite image that confirms what a ground-level photo shows, or a financial record that confirms what a human source described, or a shipping manifest that matches an AIS track.

Apply the two-source minimum rule for any factual claim in your publication. For extraordinary claims, require three or more independent sources. Document the corroboration chain explicitly in your working notes:

Claim: Facility X discharged into River Y on May 12
Evidence 1: Sentinel-2 satellite imagery showing discolored
            water plume at discharge point (captured May 12, 14:20 UTC)
Evidence 2: EPA ECHO inspection report dated May 15 noting
            "unauthorized discharge" (NPDES permit #TX0012345)
Evidence 3: Downstream USGS water quality station showing
            ammonia spike at 18:00 on May 12
Corroboration: Strong — three independent sources confirm
               the event via different collection methods
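The two-source rule is easy to apply mechanically once each item of evidence records what it ultimately derives from. In the hypothetical sketch below, independence is approximated by a hand-assigned `origin` label: two reposts of the same video share an origin and count once. Deciding what shares an origin remains an editorial judgment the code cannot make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    description: str
    origin: str    # underlying source; items tracing back to the same origin are NOT independent
    method: str    # how it was collected, e.g. "satellite", "regulator record", "sensor"


def corroboration(evidence: list[Evidence], extraordinary: bool = False) -> dict:
    """Apply the two-source rule (three for extraordinary claims) to independent origins."""
    origins = {item.origin for item in evidence}
    methods = {item.method for item in evidence}
    required = 3 if extraordinary else 2
    return {
        "independent_origins": len(origins),
        "collection_methods": len(methods),
        "required": required,
        "meets_rule": len(origins) >= required,
    }
```

Reporting the number of distinct collection methods alongside the origin count makes it visible when a claim is technically two-sourced but rests on a single kind of evidence.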

When evidence conflicts, do not discard the outlier without investigation. Conflicting evidence often indicates that your hypothesis needs refinement, not that one source is wrong.

Stage 6: Publication and Source Protection

Before publication, conduct a harm assessment. OSINT investigations can inadvertently endanger sources, subjects, or bystanders. Satellite imagery of a refugee camp at high resolution could help hostile actors target it. Publishing the home address of a whistleblower you identified through corporate records could put them at risk. The fact that information is publicly available does not mean republishing it is always ethical.

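For source protection, strip metadata from any files you publish. EXIF data in photos can contain GPS coordinates, camera serial numbers, and software version strings that identify the device and location. Use exiftool -all= photo.jpg to remove all writable metadata; by default ExifTool keeps the unstripped original as photo.jpg_original (unless you add -overwrite_original), so make sure that backup never ends up among the published files. Alternatively, use a purpose-built tool like MAT2 (Metadata Anonymisation Toolkit).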

Structure your publication to show your work. Readers — and subjects of the investigation — should be able to follow your reasoning and verify your evidence independently. Link to primary sources whenever possible. Describe your methodology explicitly. This transparency is both an ethical obligation and a practical defense against legal challenges.

Finally, preserve your evidence archive indefinitely. Investigations sometimes become relevant again years later, when new events provide additional context or when legal proceedings require the underlying data. A well-organized, hashed, and timestamped evidence archive is a professional asset that appreciates over time.
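A preserved archive is most valuable when you can later show it is unchanged. One approach, sketched below with the Python standard library, is to write a manifest in the same "HASH  path" line format that GNU `sha256sum` emits, so a colleague can check it by running `sha256sum -c` from inside the archive directory without any of your code. The function names are hypothetical, and filenames containing newlines or backslashes (which `sha256sum` escapes) are not handled.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(archive: Path, manifest: Path) -> None:
    """Write one 'HASH  relative/path' line per file in the archive (sha256sum text format)."""
    files = sorted(p for p in archive.rglob("*") if p.is_file() and p != manifest)
    lines = [f"{sha256_file(p)}  {p.relative_to(archive).as_posix()}" for p in files]
    manifest.write_text("".join(line + "\n" for line in lines), encoding="utf-8")


def verify_manifest(archive: Path, manifest: Path) -> list[str]:
    """Return the relative paths that are missing or whose hash no longer matches."""
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines in hand-edited manifests
        expected, rel = line.split("  ", 1)
        path = archive / rel
        if not path.is_file() or sha256_file(path) != expected:
            failures.append(rel)
    return failures
```

Verification flags altered or missing files but not files added after the manifest was written, so when the archive legitimately grows, generate and date a new manifest rather than editing the old one.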

Try it in Deep Seer

Deep Seer provides a complete investigation workflow with built-in evidence capture, SHA-256 hashing, chain-of-custody tracking, multi-source geospatial analysis, and legal-grade export formats.
