fontr.pipelines.bcf_preprocessing package

Complete data processing pipeline for the Adobe dataset.

Submodules

fontr.pipelines.bcf_preprocessing.nodes module

read_bcf_metadata(bcf_file)

Reads the metadata of a .bcf file using the file pointer passed as the argument.

The .bcf format looks as follows:
  • 8 bytes: n, the number of .png files in the .bcf file.

  • 8n bytes: the size of each .png file, 8 bytes per file.

  • The n .png files, stored as raw bytes.

Parameters:

bcf_file (fsspec.core.OpenFile) – File descriptor to the .bcf file.

Returns:

The file descriptor to the .bcf file, and the sizes of the .png files read from its header.

Return type:

tuple[fsspec.core.OpenFile, np.ndarray]
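The header layout described above can be parsed in a few lines. The following is a minimal sketch rather than the package's implementation: it assumes the counts and sizes are little-endian unsigned 64-bit integers (the documentation states only their width), it uses the standard-library struct module instead of NumPy to stay dependency-free, and read_bcf_header is a hypothetical name.

```python
import io
import struct


def read_bcf_header(stream):
    """Return the list of .png sizes stored in the header of a .bcf stream."""
    # First 8 bytes: n, the number of .png files.
    # ASSUMPTION: little-endian unsigned 64-bit integer.
    (n,) = struct.unpack("<Q", stream.read(8))
    # Next 8 * n bytes: one 8-byte size per .png file (same assumption).
    sizes = list(struct.unpack(f"<{n}Q", stream.read(8 * n)))
    return sizes


# Build a tiny in-memory blob with the same layout, holding two fake "files".
payloads = [b"abc", b"hello"]
blob = struct.pack("<Q", len(payloads))
blob += struct.pack(f"<{len(payloads)}Q", *(len(p) for p in payloads))
blob += b"".join(payloads)

stream = io.BytesIO(blob)
sizes = read_bcf_header(stream)
print(sizes)
```

After the header is read, the stream is positioned at the first byte of the first .png file, which is why the node can return the same file descriptor together with the sizes.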

read_labels(label_file)

Reads the labels saved under label_file and converts them into a DataFrame that can be stored as a csv file.

Parameters:

label_file (fsspec.core.OpenFile) – File descriptor to the .label file

Returns:

The labels read from label_file, as a DataFrame.

Return type:

pd.DataFrame

upload_bcf_as_png(bcf_file, file_sizes, output_path)

Stores the .png files contained in a .bcf file under output_path.

Parameters:
  • bcf_file (fsspec.core.OpenFile) – File descriptor to the .bcf file.

  • file_sizes (np.ndarray) – File sizes read in the read_bcf_metadata node.

  • output_path (str) – Path where the .png files are stored

Return type:

None
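Extracting the images amounts to reading consecutive byte ranges whose lengths are given by file_sizes. The sketch below illustrates that idea under several assumptions: the stream is already positioned just past the header, files are written to the local filesystem rather than through fsspec, and both the helper name write_bcf_payloads and the <index>.png naming scheme are hypothetical.

```python
import io
import os
import tempfile


def write_bcf_payloads(stream, file_sizes, output_path):
    """Write each stored file to output_path as <index>.png; return the paths."""
    os.makedirs(output_path, exist_ok=True)
    paths = []
    for index, size in enumerate(file_sizes):
        data = stream.read(size)
        # A short read means the header and the payload disagree.
        if len(data) != size:
            raise ValueError(
                f"file {index}: expected {size} bytes, got {len(data)}"
            )
        # ASSUMPTION: files are named by their position in the .bcf file.
        path = os.path.join(output_path, f"{index}.png")
        with open(path, "wb") as out:
            out.write(data)
        paths.append(path)
    return paths


# Demo with fake payloads standing in for real .png bytes.
payloads = [b"first-image", b"second"]
stream = io.BytesIO(b"".join(payloads))
output_dir = tempfile.mkdtemp()
paths = write_bcf_payloads(stream, [len(p) for p in payloads], output_dir)

# Read the files back to confirm each one holds exactly its own bytes.
written = []
for path in paths:
    with open(path, "rb") as handle:
        written.append(handle.read())
print([os.path.basename(p) for p in paths])
```

Checking the length of every read keeps a truncated or corrupted .bcf file from silently producing partial images.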

upload_labels_as_csv(df_labels, output_path)

Stores the passed df_labels under output_path.

Parameters:
  • df_labels (pd.DataFrame) – labels

  • output_path (str) – Path where the labels.csv file is stored
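As a rough illustration of the output, the sketch below writes a short list of labels to CSV text. It is not the package's code: it uses the standard-library csv module in place of a DataFrame (with a real pd.DataFrame, DataFrame.to_csv would be the natural call), and the helper name labels_to_csv and the column names index and label are assumptions.

```python
import csv
import io


def labels_to_csv(labels, out):
    """Write one row per label, preceded by a header row."""
    writer = csv.writer(out, lineterminator="\n")
    # ASSUMPTION: hypothetical column names, not taken from the package.
    writer.writerow(["index", "label"])
    for index, label in enumerate(labels):
        writer.writerow([index, label])


# Write to an in-memory buffer; a real node would open output_path instead.
buffer = io.StringIO()
labels_to_csv([7, 7, 42], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```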

fontr.pipelines.bcf_preprocessing.pipeline module

create_pipeline(**kwargs)

Return type:

Pipeline