src.lemma
Module Contents
Classes
- Lemma: Class representing one lemma in a DWUG-like dataset
Attributes
- src.lemma.Group
- src.lemma.Sample
- class src.lemma.Lemma
Bases: pydantic.BaseModel
Class representing one lemma in a DWUG-like dataset (i.e., one of the words represented as folders in the data/ directory)
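To illustrate the folder layout this class assumes, the sketch below lists the lemma folders of a dataset. The helper `find_lemma_dirs` is hypothetical and not part of `src.lemma`; it assumes that every subdirectory of `data/` holds exactly one lemma and that stray files there should be ignored.

```python
from pathlib import Path


def find_lemma_dirs(dataset_root: Path) -> list[Path]:
    """Return one directory per lemma under <dataset_root>/data/, sorted by name.

    Hypothetical helper: assumes the DWUG-like layout data/<lemma>/ and
    skips any plain files found directly inside data/.
    """
    data_dir = dataset_root / "data"
    return sorted(p for p in data_dir.iterdir() if p.is_dir())
```

Each returned directory is the kind of value one would pass as the `path` field of a `Lemma`.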
- property name: str
The name of the lemma, derived from the instance’s path
- property uses_df: pandas.DataFrame
Cached property that collects the corresponding uses.csv files and preprocesses each use according to the provided configuration.
- Returns:
The preprocessed DataFrame of uses for the corresponding lemma
- Return type:
DataFrame
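The caching and per-use preprocessing described above can be pictured with the stdlib-only sketch below. It is not the actual implementation: it returns a list of row dictionaries instead of a pandas DataFrame, `UsesLoader` and its `preprocess` callable are hypothetical stand-ins for `Lemma` and its `preprocessing` strategy, and the tab-separated format of uses.csv is an assumption based on the published DWUG files.

```python
import csv
from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import Callable


@dataclass
class UsesLoader:
    """Hypothetical stand-in for the caching behaviour of Lemma.uses_df."""

    path: Path  # the lemma directory, e.g. data/<lemma>/
    preprocess: Callable[[dict[str, str]], dict[str, str]]  # assumed per-use hook
    reads: int = 0  # counts file reads, only to make the caching observable

    @cached_property
    def uses(self) -> list[dict[str, str]]:
        # Runs once per instance; later accesses return the cached list.
        self.reads += 1
        # Assumption: uses.csv is tab-separated with a header row, as in DWUG.
        with open(self.path / "uses.csv", newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
        # Apply the preprocessing hook to every use.
        return [self.preprocess(row) for row in rows]
```

Caching matters here because the uses table is read by several other members (for example the grouping lookups), so re-reading the file on every access would be wasteful.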
- property uses_schema: pandera.DataFrameSchema
- property annotated_pairs_df: pandas.DataFrame
Property that collects the annotated pairs of the corresponding lemma from its judgments.csv file and validates them against annotated_pairs_schema.
- Returns:
A DataFrame containing two columns (identifier1, identifier2)
- Return type:
DataFrame
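A rough, stdlib-only picture of collecting the annotated pairs follows. `read_annotated_pairs` is a hypothetical helper, not the module's code; it assumes judgments.csv is tab-separated with a header row (as in DWUG), and it keeps each pair only once even when several annotators judged it, which is an assumption about the real property's behaviour.

```python
import csv
from pathlib import Path


def read_annotated_pairs(lemma_dir: Path) -> list[tuple[str, str]]:
    """Collect the (identifier1, identifier2) pairs from judgments.csv.

    Hypothetical helper. Assumes a tab-separated file with a header row.
    Pairs judged by several annotators are kept once (an assumption),
    in order of first appearance.
    """
    pairs: dict[tuple[str, str], None] = {}  # dict preserves insertion order
    with open(lemma_dir / "judgments.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            # Only the two identifier columns are kept; judgments are dropped.
            pairs[(row["identifier1"], row["identifier2"])] = None
    return list(pairs)
```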
- property augmented_annotated_pairs_df: pandas.DataFrame
A version of annotated_pairs_df that incorporates grouping information: each row of the base annotated_pairs_df is expanded with the groupings of both of its identifiers.
- Returns:
The expanded DataFrame
- Return type:
DataFrame
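The expansion described above amounts to looking up the grouping of each identifier in a pair. The sketch below shows this on plain Python data; `augment_pairs` and the output keys `grouping1` and `grouping2` are hypothetical names, not the actual columns produced by the property.

```python
def augment_pairs(
    pairs: list[tuple[str, str]],
    useid_to_grouping: dict[str, str],
) -> list[dict[str, str]]:
    """Attach the grouping of each identifier to every annotated pair.

    Hypothetical helper; the keys grouping1/grouping2 are illustrative names.
    Raises KeyError if an identifier has no known grouping.
    """
    return [
        {
            "identifier1": id1,
            "identifier2": id2,
            "grouping1": useid_to_grouping[id1],
            "grouping2": useid_to_grouping[id2],
        }
        for id1, id2 in pairs
    ]
```

With pandas, the same expansion would typically be expressed as two `DataFrame.merge` calls of the pairs table against the uses table, one per identifier column.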
- property annotated_pairs_schema: pandera.DataFrameSchema
Schema for validating that a judgments.csv file contains two columns (identifier1, identifier2)
- Returns:
The schema
- Return type:
DataFrameSchema
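The module itself uses a pandera schema for this check. As a dependency-free illustration of the same idea, the sketch below reports which of the two required columns are missing from a judgments.csv header. `missing_pair_columns` is hypothetical, and treating extra columns (annotator, judgment, and so on) as acceptable is an assumption about how strict the real schema is.

```python
import csv
import io

REQUIRED_COLUMNS = ("identifier1", "identifier2")


def missing_pair_columns(header_line: str, delimiter: str = "\t") -> list[str]:
    """Return the required pair columns absent from a judgments.csv header.

    Stdlib stand-in for the pandera check; an empty list means the header
    is valid. Extra columns are allowed (an assumption).
    """
    header = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    return [col for col in REQUIRED_COLUMNS if col not in header]
```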
- property predefined_use_pairs_df: pandas.DataFrame
- property augmented_predefined_use_pairs_df: pandas.DataFrame
- groupings: tuple[str, str]
Each DWUG dataset consists of word usages from multiple groups. Most datasets have exactly two groups, which represent time periods; datasets with more than two groups use them to represent regional variation instead.
- path: pydantic.DirectoryPath
The path to the directory containing the corresponding lemma within its dataset. Must be a valid existing directory.
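pydantic's `DirectoryPath` type rejects values that do not point to an existing directory. The stdlib sketch below mimics that check in a dataclass and derives the lemma name from the final path component. `LemmaPath` is a hypothetical stand-in, and deriving `name` this way is an assumption based on the `name` property's description.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LemmaPath:
    """Hypothetical stand-in for the `path` field's DirectoryPath validation."""

    path: Path

    def __post_init__(self) -> None:
        # Like pydantic's DirectoryPath: the path must exist and be a directory.
        if not self.path.is_dir():
            raise ValueError(f"not an existing directory: {self.path}")

    @property
    def name(self) -> str:
        # Assumed: the lemma name is the final component of its directory path.
        return self.path.name
```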
- preprocessing: src.preprocessing.ContextPreprocessor
A context preprocessing strategy
- _uses_df: pandas.DataFrame
- _annotated_pairs_df: pandas.DataFrame
- _augmented_annotated_pairs: pandas.DataFrame
- _predefined_use_pairs_df: pandas.DataFrame
- _augmented_predefined_use_pairs_df: pandas.DataFrame
- _clusters_df: pandas.DataFrame
- useid_to_grouping() Dict[src.use.UseID, str]
Generates a dictionary mapping each use identifier to its grouping
- Returns:
A dictionary from use identifiers to use groupings
- Return type:
Dict[UseID, str]
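A minimal sketch of this mapping, assuming each use record carries `identifier` and `grouping` fields as DWUG's uses.csv does. The free function below and its use of plain `str` instead of `UseID` are simplifications for illustration, not the method's actual code.

```python
def useid_to_grouping(uses: list[dict[str, str]]) -> dict[str, str]:
    """Map each use identifier to its grouping.

    Assumes 'identifier' and 'grouping' fields per use; if an identifier
    appears twice, the later row wins.
    """
    return {use["identifier"]: use["grouping"] for use in uses}
```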
- grouping_to_useid() dict[str, list[src.use.UseID]]
Generates a dictionary mapping each use grouping to the list of use identifiers in that grouping
- Returns:
A dictionary from groupings to lists of use identifiers
- Return type:
dict[str, list[UseID]]
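The grouping-to-identifiers dictionary is the inverse of the previous mapping. The sketch below builds it from an identifier-to-grouping dictionary; the function signature is hypothetical and uses plain strings in place of `UseID`.

```python
from collections import defaultdict


def grouping_to_useid(useid_to_grouping: dict[str, str]) -> dict[str, list[str]]:
    """Invert a use-id -> grouping mapping into grouping -> list of use ids.

    Within each grouping, use ids keep the iteration order of the input.
    """
    result: defaultdict[str, list[str]] = defaultdict(list)
    for use_id, grouping in useid_to_grouping.items():
        result[grouping].append(use_id)
    return dict(result)  # plain dict, so missing groupings raise KeyError
```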
- _split_compare_uses() tuple[list[src.use.UseID], list[src.use.UseID]]
- _split_earlier_uses() tuple[list[src.use.UseID], list[src.use.UseID]]
- _split_later_uses() tuple[list[src.use.UseID], list[src.use.UseID]]
- split_uses(group: Group) tuple[list[src.use.UseID], list[src.use.UseID]]
Splits the uses of a lemma into two separate lists of use identifiers, according to the given pairing strategy
- Parameters:
group (Group) – A pairing strategy
- Returns:
Two lists of use identifiers, one for each side of the pairing
- Return type:
tuple[list[UseID], list[UseID]]
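The private helpers `_split_compare_uses`, `_split_earlier_uses` and `_split_later_uses` suggest three pairing strategies. The sketch below assumes that "compare" pairs uses from the earlier grouping with uses from the later one, while "earlier" and "later" pair a grouping with itself. The literal values in `GroupSketch` are a hypothetical spelling; the real `Group` alias may use different values.

```python
from typing import Literal

# Hypothetical spelling of the pairing strategies; the real Group alias may differ.
GroupSketch = Literal["compare", "earlier", "later"]


def split_uses(
    grouping_to_useid: dict[str, list[str]],
    groupings: tuple[str, str],
    group: GroupSketch,
) -> tuple[list[str], list[str]]:
    """Return the two lists of use ids that a pairing strategy compares.

    Assumed semantics: 'compare' pairs the earlier grouping with the later
    one, while 'earlier' and 'later' pair a grouping with itself.
    """
    earlier = grouping_to_useid.get(groupings[0], [])
    later = grouping_to_useid.get(groupings[1], [])
    if group == "compare":
        return earlier, later
    if group == "earlier":
        return earlier, earlier
    if group == "later":
        return later, later
    raise ValueError(f"unknown pairing strategy: {group!r}")
```

Feeding the two lists to `itertools.product` would yield candidate use pairs, although how `use_pairs` actually forms or samples its pairs is not documented here.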
- get_uses() list[src.use.Use]
- use_pairs(group: Group, sample: Sample) list[tuple[src.use.Use, src.use.Use]]
- _split_augmented_uses(group: Group, augmented_uses: pandas.DataFrame) tuple[list[src.use.UseID], list[src.use.UseID]]