CALLHOME Relationship Labels

Interpersonal relationship labels for the CALLHOME English corpus

Project Details

Project Type : Research Dataset
Domain : Conversational Analysis
Technologies : Corpus Annotation, CSV
GitHub : https://github.com/dkaterenchuk/callhome_labels
Stars : 2
License : GPL-3.0

A curated dataset of manually annotated interpersonal relationship labels for the CALLHOME English corpus, enabling research in conversational analysis and sociolinguistics.

Overview

The CALLHOME English corpus is a widely used dataset of telephone conversations between native English speakers. This project annotates the relationship between the participants in each conversation, providing metadata for linguistic research, social network analysis, and studies of conversational dynamics.

Dataset Contents

The repository includes four files in CSV format:

PRIMARY labels - Binary classification marking relationships as “FRIEND” or “RELATIVE”
SECONDARY labels - Additional contextual information supplementing primary designations
NOTES - Annotator observations and metadata from the labeling process
FULL_LIST - Consolidated dataset combining the PRIMARY labels, SECONDARY labels, and NOTES
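
As a rough illustration of how the PRIMARY labels might be read, here is a minimal Python sketch using only the standard library. The column names ("call_id", "label") and the sample rows are assumptions made for this example, not the repository's documented schema, so check the header of the actual CSV file before reusing it.

```python
import csv
import io
from collections import Counter

# Stand-in for the PRIMARY labels file. The column names and rows below are
# invented for illustration; the real file's header may differ.
SAMPLE_PRIMARY_CSV = """call_id,label
0001,FRIEND
0002,RELATIVE
0003,RELATIVE
"""


def load_primary_labels(handle, id_column="call_id", label_column="label"):
    """Return a dict mapping each conversation ID to its primary label."""
    labels = {}
    for row in csv.DictReader(handle):
        label = row[label_column].strip().upper()
        # The primary label is binary, so anything else signals a bad row.
        if label not in {"FRIEND", "RELATIVE"}:
            raise ValueError(f"unexpected primary label: {label!r}")
        labels[row[id_column].strip()] = label
    return labels


labels = load_primary_labels(io.StringIO(SAMPLE_PRIMARY_CSV))
counts = Counter(labels.values())
print(counts["FRIEND"], counts["RELATIVE"])
```

To read a real file instead of the inline sample, pass a handle opened with `open(path, newline="")` and adjust `id_column` and `label_column` to match its header.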

Research Applications

This labeled dataset enables research in several areas:

Conversational analysis - Study how relationship types affect communication patterns
Sociolinguistics - Analyze language variation based on interpersonal relationships
Discourse analysis - Examine turn-taking, topic selection, and conversational strategies
Social network analysis - Map relationship structures in conversation data
Computational linguistics - Train models for relationship detection and classification
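
One simple way to start on questions like these is to join the relationship labels to a per-conversation measurement and compare groups. The sketch below assumes labels keyed by a conversation ID and a hypothetical turn-count feature; the IDs and numbers are made up for illustration and are not taken from the corpus.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inputs: the conversation IDs, labels, and turn counts are
# invented for this example and do not come from the corpus.
primary_labels = {"0001": "FRIEND", "0002": "RELATIVE", "0003": "RELATIVE"}
turn_counts = {"0001": 120, "0002": 80, "0003": 100, "0004": 95}


def mean_by_label(labels, feature):
    """Average a per-conversation feature within each relationship label."""
    grouped = defaultdict(list)
    for call_id, value in feature.items():
        label = labels.get(call_id)
        if label is not None:  # skip conversations that have no label
            grouped[label].append(value)
    return {label: mean(values) for label, values in grouped.items()}


averages = mean_by_label(primary_labels, turn_counts)
print(averages)
```

Conversations without a label (here "0004") are dropped instead of being assigned a default, so missing annotations do not skew either group.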

Resources

GitHub Repository: callhome_labels (https://github.com/dkaterenchuk/callhome_labels)
Research Paper: Included in repository (1081.pdf) detailing annotation methodology and findings
License: GPL-3.0

About the CALLHOME Corpus

The CALLHOME English corpus consists of unscripted telephone conversations between friends and family members. These relationship labels add an important dimension to the corpus, allowing researchers to study how interpersonal dynamics shape conversational behavior.

This dataset contributes to the advancement of research in conversational analysis, sociolinguistics, and computational approaches to understanding human communication.