Mining Microhistories: How to Extract Data from Historical Print Sources [In-person]

Ever wondered how historians uncover the lived experiences behind census statistics or reconstruct individual stories from fragmentary historical records? The census gives us bare bones - age, occupation, birthplace - but how do we recover the human experiences behind those numbers and turn them into workable datasets? Learn advanced research techniques that transform scattered mentions in old newspapers, government surveys, and business records into rich, structured data about ordinary people's lives.

Join our history librarian for a deep dive into extracting personal data from historical print sources. The census tells us John Smith was a 35-year-old laborer, but what about his working conditions, family struggles, or daily wages? In this hands-on workshop, you'll discover systematic approaches to mining Pittsburgh-area newspapers, labor surveys, and business directories to illuminate stories often invisible in traditional archives. We'll explore sources like the Dillingham Commission's investigation of immigrants in industry, the Middletown studies' detailed community surveys, local newspaper employment notices, and municipal records to demonstrate how granular data can move far beyond census limitations to reconstruct lived experiences.

You'll master techniques for handling both digitized and analog materials - from optimizing OCR workflows to designing efficient manual transcription systems. We'll work through real methodological challenges: linking individuals across different sources despite variant spellings, standardizing occupational categories that changed over decades, and building databases that preserve historical uncertainty while still enabling meaningful analysis.

What - This hands-on workshop covers:

  • Navigating newspaper databases and government document collections using targeted search strategies
  • Assessing OCR quality and choosing appropriate text extraction methods
  • Designing data structures that capture historical ambiguity and source provenance
  • Creating systematic workflows for manual transcription and data entry
  • Linking individuals across multiple source types despite inconsistent information
  • Applying ethical frameworks when working with personal historical data

Why - Participants will:

  • Develop skills for extracting structured data from unstructured historical sources
  • Build methodological approaches that scale from individual research projects to collaborative databases
  • Master techniques applicable across different historical periods and geographic regions

Who - Graduate students and faculty working in history, digital humanities, and social sciences, particularly those researching labor history, migration patterns, urban development, or any topic requiring reconstruction of individual experiences from historical print sources.

How - One-hour hybrid workshop combining methodology overview with hands-on practice using local-area historical sources. Both in-person and remote participants work with sample materials in guided exercises.

Activity - Create a data extraction workflow and sample database structure for a local historical source, with documentation for replication and expansion

When - Tuesday November 4th, 3:30pm-4:30pm
Where - Hunt Library 308

Date:
Wednesday, November 5, 2025
Time:
3:30pm - 4:30pm
Time Zone:
Eastern Time - US & Canada
Location:
Hunt Library - Room 308
Campus:
Hunt Library
Audience:
  Faculty     Staff     Students  
Categories:
  Digital Humanities     Information & Data Literacy CC     Working With Data and Code  
Online:
This is a virtual event. A URL to participate will be sent via a reminder email 24 hours before the event.

Registration is required. There are 29 in-person seats available. There are 2 online seats available.

Directions for getting to Hunt Library, Room 308:

Take the stairs or elevators to the 3rd floor and turn left. You'll find Room 308 around the corner from the water fountain across from the Women's restroom.

Automated notetaking tools and recordings not initiated by the instructor are generally not permitted in workshops. However, participants may request permission to use approved tools, such as Zoom AI Companion, for accommodation purposes. To ensure appropriate arrangements, please submit these requests to instructors at least 24 hours prior to the workshop.

Workshops and events for Carnegie Mellon University Libraries are open to all, regardless of race, color, national origin, sex, disability, age, sexual orientation, gender identity, religion, creed, ancestry, belief, veteran status or genetic information. All participants are required to follow the Code of Conduct.

If you require accessibility accommodations, please contact the event organizer.

Organizer(s)

Charlotte Kiger Price

About & Contact Info

Humanities & Social Sciences Librarian

charlotteprice@cmu.edu