CALLHOME Relationship Labels

Interpersonal relationship labels for the CALLHOME English corpus

Project Details

CALLHOME Relationship Labels

A curated dataset of manually annotated interpersonal relationship labels for the CALLHOME English corpus, enabling research in conversational analysis and sociolinguistics.

Overview

The CALLHOME English corpus is a widely-used dataset of telephone conversations between native English speakers. This project provides detailed relationship annotations between conversation participants, offering valuable metadata for linguistic research, social network analysis, and conversational dynamics studies.

Dataset Contents

The repository includes four primary datasets in CSV format:

  • PRIMARY labels - Binary classification marking relationships as “FRIEND” or “RELATIVE”
  • SECONDARY labels - Additional contextual information supplementing primary designations
  • NOTES - Annotator observations and metadata from the labeling process
  • FULL_LIST - Consolidated dataset combining all three categories above

Research Applications

This labeled dataset enables research in several areas:

  • Conversational analysis - Study how relationship types affect communication patterns
  • Sociolinguistics - Analyze language variation based on interpersonal relationships
  • Discourse analysis - Examine turn-taking, topic selection, and conversational strategies
  • Social network analysis - Map relationship structures in conversation data
  • Computational linguistics - Train models for relationship detection and classification

Resources

  • GitHub Repository: callhome_labels
  • Research Paper: Included in repository (1081.pdf) detailing annotation methodology and findings
  • License: GPL-3.0

About the CALLHOME Corpus

The CALLHOME English corpus consists of unscripted telephone conversations between friends and family members. These relationship labels add an important dimension to the corpus, allowing researchers to study how interpersonal dynamics shape conversational behavior.

This dataset contributes to the advancement of research in conversational analysis, sociolinguistics, and computational approaches to understanding human communication.