Methodology
Data Engineering Framework for Medieval Institutional Analysis
1. Research Objective
This study aims to quantify the institutional, economic, and geopolitical impact of medieval monastic communities across Europe using modern data engineering and ESG (environmental, social, and governance) analysis frameworks.
The research addresses three questions:
- How did different religious orders (as institutional protocols) influence economic output and territorial development?
- What is the correlation between governance structure and institutional resilience across geopolitical disruptions?
- Can we quantify the environmental and cultural legacy of monastic land management using ESG metrics?
2. Data Acquisition
2.1 Sources
| Source | Type | Content |
|---|---|---|
| Ecclesiastical registers | Archival | Foundation dates, order affiliation, patronage |
| Medieval cartularies | Archival | Land grants, tithes, economic transactions |
| Archaeological surveys | Geospatial | Site coordinates, architectural remains |
| Trade route databases | Geospatial | Medieval commercial corridors |
| Academic publications | Secondary | Institutional history, regional economics |
2.2 Collection Method
Data was collected through a combination of:
- Automated web scraping (Python / BeautifulSoup) targeting digitized archival databases and open-access historical repositories
- LLM-assisted Named Entity Recognition (NER) to extract structured entities (founders, orders, dates, land grants) from unstructured historical text in Latin and Old French
- Manual verification against published academic sources to validate AI-extracted entities
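The manual verification step can be supported by a simple field-by-field comparison between the AI-extracted records and a hand-curated reference. The sketch below is illustrative only: the function name `verify_entities`, the record layout, and the sample values are assumptions, not the contents of `src/ner_extraction.py`.

```python
def verify_entities(extracted, reference, fields=("founder", "order", "founded")):
    """Compare extracted records with a curated reference (hypothetical layout).

    Returns (mismatches, unverified):
      mismatches: community id -> list of fields whose values disagree
      unverified: ids that have no entry in the reference at all
    """
    mismatches = {}
    unverified = []
    for cid, record in extracted.items():
        ref = reference.get(cid)
        if ref is None:
            unverified.append(cid)
            continue
        bad = [f for f in fields if record.get(f) != ref.get(f)]
        if bad:
            mismatches[cid] = bad
    return mismatches, unverified


# Placeholder records, invented for illustration only.
extracted = {
    "house_a": {"founder": "Founder A", "order": "Order X", "founded": "1100"},
    "house_b": {"founder": "Founder B", "order": "Order Y", "founded": "1150"},
    "house_c": {"founder": "Founder C", "order": "Order X", "founded": "1200"},
}
reference = {
    "house_a": {"founder": "Founder A", "order": "Order X", "founded": "1100"},
    "house_b": {"founder": "Founder B", "order": "Order Y", "founded": "1151"},
}

mismatches, unverified = verify_entities(extracted, reference)
print(mismatches)   # fields needing manual review
print(unverified)   # records with no published source yet
```

Records listed in `unverified` are not necessarily wrong; they simply lack a published source to check against and would need separate review.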
3. Data Normalization
3.1 Standardization
Raw data from heterogeneous sources required normalization across several dimensions:
| Dimension | Method | Tool |
|---|---|---|
| Dates | Conversion to ISO 8601 from Latin calendar references | Python (dateutil) |
| Locations | Geocoding historical place names to modern coordinates | GeoPandas / Nominatim |
| Economic values | Min-Max scaling of tithes and land grants | Pandas / Scikit-learn |
| Order names | Mapping variant spellings to canonical order names | Custom lookup table |
| Text language | Translation of Latin/Old French excerpts | LLM-assisted |
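Two of the rows above are simple enough to sketch directly. The example below shows Min-Max scaling and a canonical order-name lookup in plain Python to stay dependency-free; the table indicates that the pipeline itself relies on Pandas / Scikit-learn. The lookup entries and function names are assumptions made for illustration, not the project's actual lookup table.

```python
# Hypothetical excerpt of a variant-spelling lookup table.
CANONICAL_ORDERS = {
    "cistercian": "Cistercian",
    "ordo cisterciensis": "Cistercian",
    "benedictine": "Benedictine",
    "ordo sancti benedicti": "Benedictine",
}


def canonical_order(name):
    """Map a variant spelling to its canonical order name, or None if unknown."""
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    return CANONICAL_ORDERS.get(key)


def min_max_scale(values):
    """Rescale values linearly to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(canonical_order("  Ordo   Cisterciensis "))
print(min_max_scale([10, 20, 30]))
```

Min-Max scaling is sensitive to outliers: a single very large land grant compresses every other value toward 0, which is worth checking before comparing scaled values across regions.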
3.2 Dataset Structure
After normalization, five structured datasets were produced and stored in data/processed/.
4. Analytical Framework
4.1 ESG Scoring
Each community receives a composite score (0–100) across three dimensions:
| Dimension | Weight | Indicators |
|---|---|---|
| Environmental | 35% | Land reclamation, agricultural diversification, water management |
| Social | 35% | Education, hospitality, community employment, cultural production |
| Governance | 30% | Rule adherence, institutional continuity, succession protocols |
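One plausible reading of the weighting table is a weighted mean of three dimension scores, each already expressed on a 0–100 scale. The sketch below assumes exactly that; how the individual indicators roll up into a dimension score is not specified here, and the function name `composite_esg` is hypothetical rather than taken from `src/esg_scoring.py`.

```python
# Weights from the table, kept as integer percentages to avoid float drift.
WEIGHTS_PCT = {"environmental": 35, "social": 35, "governance": 30}


def composite_esg(scores):
    """Weighted mean of dimension scores (each assumed to lie in 0-100)."""
    for dim in WEIGHTS_PCT:
        if not 0 <= scores[dim] <= 100:
            raise ValueError(f"{dim} score out of range: {scores[dim]}")
    return sum(WEIGHTS_PCT[d] * scores[d] for d in WEIGHTS_PCT) / 100


# Invented dimension scores for illustration.
print(composite_esg({"environmental": 80, "social": 60, "governance": 50}))
```

Because the weights sum to 100%, the composite stays within 0–100 whenever each dimension score does.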
4.2 Geopolitical Analysis
Communities are analyzed against:
- Proximity to trade routes (km to nearest documented corridor)
- Border zone classification (frontier vs. interior positioning)
- Political stability index of the surrounding region per century
- Order-level territorial strategy (documented expansion protocols)
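The first criterion, distance to the nearest documented corridor, can be approximated with the haversine great-circle formula. The sketch below assumes each corridor is stored as a list of (latitude, longitude) waypoints and measures distance to the nearest waypoint rather than to the line between waypoints, so sparsely sampled routes will overestimate the distance. The names and the data layout are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_corridor(site, corridors):
    """Return (corridor_name, distance_km) for the closest corridor waypoint."""
    distance, name = min(
        (haversine_km(site[0], site[1], lat, lon), name)
        for name, waypoints in corridors.items()
        for lat, lon in waypoints
    )
    return name, distance


# Synthetic coordinates, not real sites or routes.
corridors = {"route_a": [(0.0, 1.0), (0.0, 2.0)], "route_b": [(0.0, 0.5)]}
print(nearest_corridor((0.0, 0.0), corridors))
```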
4.3 Network Analysis
Sister communities are linked using:
- Same order affiliation within a defined radius (150 km)
- Documented dependency relationships (mother/daughter houses)
- Shared patronage networks (same founding family or bishop)
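The three linking rules can be expressed as a pairwise check that records why two houses are connected. This is a minimal sketch under assumed field names (`id`, `order`, `coords`, `mother`, `patrons`); it is not the implementation in `src/network.py`, and the pairwise loop is quadratic in the number of houses.

```python
import math
from itertools import combinations

RADIUS_KM = 150  # radius stated in the methodology


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi, dlmb = phi2 - phi1, math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def sister_links(houses):
    """Return (id_x, id_y, reasons) for every pair linked by at least one rule."""
    links = []
    for x, y in combinations(houses, 2):
        reasons = []
        if x["order"] == y["order"] and haversine_km(x["coords"], y["coords"]) <= RADIUS_KM:
            reasons.append("same_order_within_radius")
        if x.get("mother") == y["id"] or y.get("mother") == x["id"]:
            reasons.append("dependency")
        if set(x["patrons"]) & set(y["patrons"]):
            reasons.append("shared_patronage")
        if reasons:
            links.append((x["id"], y["id"], reasons))
    return links


# Synthetic houses for illustration only.
houses = [
    {"id": "A", "order": "X", "coords": (0.0, 0.0), "mother": None, "patrons": ["p1"]},
    {"id": "B", "order": "X", "coords": (0.0, 1.0), "mother": "A", "patrons": ["p2"]},
    {"id": "C", "order": "Y", "coords": (0.0, 5.0), "mother": None, "patrons": ["p1"]},
]
for link in sister_links(houses):
    print(link)
```

Keeping the reasons on each edge lets later analysis weight or filter links by type, for example treating documented dependencies as stronger evidence than geographic proximity.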
4.4 Temporal Analysis
Economic and institutional data is segmented into century-level periods to identify:
- Growth phases correlated with political stability
- Decline phases correlated with geopolitical disruption
- Resilience patterns linked to governance protocols
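Century-level segmentation reduces to mapping each dated record to a century and aggregating within it. The sketch below assumes positive CE years and the convention that a century runs from year 01 to year 00 (so 1100 belongs to the 11th century); the function names and the (year, value) record shape are hypothetical.

```python
from collections import defaultdict


def century_of(year):
    """Return the century of a CE year, e.g. 1098 -> 11, 1100 -> 11, 1101 -> 12."""
    return (year - 1) // 100 + 1


def totals_by_century(records):
    """Sum values per century from an iterable of (year, value) pairs."""
    totals = defaultdict(float)
    for year, value in records:
        totals[century_of(year)] += value
    return dict(sorted(totals.items()))


# Invented (year, scaled value) pairs for illustration.
print(totals_by_century([(1098, 1.0), (1100, 2.0), (1101, 0.5)]))
```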
5. Reproducibility
All code is available in the project repository:
- src/scraping.py: Data acquisition pipeline
- src/ner_extraction.py: AI entity extraction
- src/normalization.py: Cleaning and scaling
- src/esg_scoring.py: ESG dimension calculations
- src/network.py: Sister community proximity
- src/viz.py: Visualization functions
Raw data is excluded from the repository (.gitignore) to respect archival licensing. Processed datasets are included in data/processed/ so that the analysis can be reproduced without access to the raw sources.