pacsifier.cli

The pacsifier.cli subpackage contains multiple modules that define the command line interface (CLI) tools of the pacsifier package.

pacsifier.cli.pacsifier

Script to query, retrieve, and upload DICOM images from / to a PACS server.

pacsifier.cli.pacsifier.check_query_table_allowed_filters(table: DataFrame, allowed_filters: List[str] = ['StudyDate', 'StudyTime', 'SeriesDescription', 'PatientID', 'ProtocolName', 'StudyInstanceUID', 'SeriesInstanceUID', 'PatientName', 'PatientBirthDate', 'DeviceSerialNumber', 'AcquisitionDate', 'Modality', 'ImageType', 'SeriesNumber', 'StudyDescription', 'AccessionNumber', 'SequenceName', 'new_ids']) -> None

Check if the csv table passed as input has only attributes that are allowed.

Parameters:
  • table – table containing all filters

  • allowed_filters – list of allowed attribute names
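The check described above can be sketched without pandas by working on the column names only. This is a minimal illustration, not the package's implementation: the helper name check_columns_allowed, the shortened ALLOWED_FILTERS list, and the choice to raise ValueError are all assumptions, since the documentation does not say how a disallowed attribute is reported.

```python
from typing import Iterable, List

# Hypothetical, shortened subset of the allowed filters listed in the signature.
ALLOWED_FILTERS: List[str] = ["StudyDate", "PatientID", "Modality", "new_ids"]


def check_columns_allowed(columns: Iterable[str], allowed_filters: List[str]) -> None:
    """Raise ValueError if any column name is not an allowed filter."""
    unknown = [name for name in columns if name not in allowed_filters]
    if unknown:
        # Assumption: the real function may report the problem differently.
        raise ValueError(f"Attributes not allowed in query table: {unknown}")


# Passes silently because both columns are allowed.
check_columns_allowed(["PatientID", "Modality"], ALLOWED_FILTERS)
```

With a DataFrame, the same idea would apply to its column names.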

pacsifier.cli.pacsifier.get_parser() -> ArgumentParser

Return the parser object for this script.

pacsifier.cli.pacsifier.main()

Main function of the script that calls retrieve_dicoms_using_table().

pacsifier.cli.pacsifier.parse_findscu_dump_file(filename: str) -> List[Dict[str, str]]

Extract all useful information from the text file generated by dumping the output of the findscu command.

Parameters:

filename – path to textfile to be read

Returns:

list of dictionaries, each containing the attributes of a series

Return type:

list

pacsifier.cli.pacsifier.parse_query_table(table: DataFrame, allowed_filters: List[str] = ['StudyDate', 'StudyTime', 'SeriesDescription', 'PatientID', 'ProtocolName', 'StudyInstanceUID', 'SeriesInstanceUID', 'PatientName', 'PatientBirthDate', 'DeviceSerialNumber', 'AcquisitionDate', 'Modality', 'ImageType', 'SeriesNumber', 'StudyDescription', 'AccessionNumber', 'SequenceName', 'new_ids']) -> List[Dict[str, str]]

Take the query table passed as input to the script and parse it using attributes in the query / retrieve command.

Parameters:
  • table – input csv table

  • allowed_filters – list of allowed attributes to be filtered

Returns:

list of dictionaries, each containing the corresponding value of each attribute in allowed_filters (if an attribute has no corresponding column in the csv table, an empty string is given)

Return type:

list
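The return value described above (one dictionary per row, with an empty string for every allowed filter that has no column) can be illustrated with a standard-library sketch. The helper name parse_rows and the shortened filter list are hypothetical, and the real function operates on a DataFrame rather than CSV text.

```python
import csv
import io
from typing import Dict, List

# Hypothetical, shortened filter list for the example.
ALLOWED_FILTERS = ["PatientID", "StudyDate", "Modality"]


def parse_rows(csv_text: str, allowed_filters: List[str]) -> List[Dict[str, str]]:
    """Return one dict per CSV row with a key for every allowed filter."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Missing columns and empty cells both map to an empty string.
    return [
        {name: (row.get(name) or "") for name in allowed_filters}
        for row in reader
    ]


queries = parse_rows("PatientID,Modality\n123,MR\n456,CT\n", ALLOWED_FILTERS)
print(queries[0])  # {'PatientID': '123', 'StudyDate': '', 'Modality': 'MR'}
```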

pacsifier.cli.pacsifier.process_person_names(name: str) -> str

Modify patient name for the query input.

It converts all characters to uppercase, prepends a * to the last name, and returns the new name.

Parameters:

name – patient’s name

Returns:

patient name in the format it will be used in the query

Return type:

str

pacsifier.cli.pacsifier.readLineByLine(filename: str) -> Iterator[str]

Yield the lines of the text file located at the path filename, one at a time.

Parameters:

filename – path to text file to be read

Yields:

str – successive lines of text from the file
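A generator with this signature might look like the sketch below. The name read_lines is hypothetical, and stripping the trailing newline is an assumption; the documentation does not say whether the real function keeps it.

```python
from typing import Iterator


def read_lines(filename: str) -> Iterator[str]:
    """Yield the lines of a text file one at a time."""
    with open(filename, "r") as handle:
        for line in handle:
            # Assumption: trailing newlines are removed before yielding.
            yield line.rstrip("\n")
```

Because the lines are produced lazily, a large findscu dump can be processed without loading the whole file into memory.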

pacsifier.cli.pacsifier.retrieve_dicoms_using_table(table: DataFrame, parameters: Dict[str, str], output_dir: str, save: bool, info: bool, move: bool, resume: bool = False, verbose: bool = False) -> None

Query and retrieve DICOM images and/or their info dumps using the input query table.

Parameters:
  • table – query table

  • parameters – query/retrieve parameters

  • output_dir – path to the output directory

  • save – option to save the images

  • info – option to save info dumps

  • move – option to move images to remote destination

  • resume – option to skip already downloaded series

pacsifier.cli.pacsifier.upload_dicoms(dicom_dir: str, parameters: Dict[str, str]) -> None

Upload dicoms to a PACS server.

Parameters:
  • dicom_dir

    path to the directory containing the dicoms. The directory should follow the structure of the PACSIFIER output directory produced with the --save option: one subdirectory per patient, each containing one subdirectory per study, each in turn containing one subdirectory per series, such as:

    dicom_dir
    ├── sub-01
    │   ├── ses-01
    │   │   ├── 00001-First_series
    │   │   │   ├── image1.dcm
    │   │   │   ├── image2.dcm
    │   │   │   ├── …
    │   │   ├── 00002-Second_series
    │   │   │   ├── image1.dcm
    │   │   │   ├── image2.dcm
    │   │   │   ├── …
    ├── sub-02
    │   ├── ses-01
    │   │   ├── 00001-First_series
    │   │   │   ├── image1.dcm

  • parameters – parameters from PACSIFIER configuration file
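The patient / study / series layout expected for dicom_dir can be traversed with a three-level glob. The following is only a sketch of how such a directory might be enumerated before upload; the helper name list_series_dirs is hypothetical and nothing here reflects how upload_dicoms() itself walks the tree.

```python
from pathlib import Path
from typing import List


def list_series_dirs(dicom_dir: str) -> List[Path]:
    """Return every <patient>/<study>/<series> directory under dicom_dir."""
    root = Path(dicom_dir)
    # Three nested levels: patient (e.g. sub-01), study (e.g. ses-01), series.
    return sorted(p for p in root.glob("*/*/*") if p.is_dir())
```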

pacsifier.cli.anonymize_dicoms

Script to anonymize DICOM files for subsequent upload to Kheops.

pacsifier.cli.anonymize_dicoms.anonymize_all_dicoms_within_root_folder(output_folder: str = '.', datapath: str = './data', pattern_dicom_files: str = 'ses-*/*/*', new_ids: str | None = None, rename_patient_directories: bool = True, delete_identifiable_files: bool = True, remove_private_tags: bool = False, fuzz_acq_dates: bool = False) -> Dict[str, str]

Anonymizes all dicom images located at the datapath in the structure specified by pattern_dicom_files parameter.

Parameters:
  • output_folder – path where anonymized images will be located

  • datapath – path to the dicom images

  • pattern_dicom_files – (generic) path to the dicom images starting from the patient folder (in a PACSIFIER dump, this would reflect e.g. ses-20170115/0002-MPRAGE/*.dcm)

  • new_ids – anonymous ids to be set after anonymizing the original ids

  • rename_patient_directories – rename patient directories using the anonymized ids if True

  • delete_identifiable_files – delete DICOM Series which have identifiable information in the image data itself if True (in the case of screen savings coming from the GE Revolution CT machine, which have the patient name embedded for example)

  • remove_private_tags – remove all private tags if True

  • fuzz_acq_dates – shift the acquisition-related dates randomly by up to 30 days in either direction if True

Returns:

dictionary keeping track of the new patientIDs and old patientIDs mappings

Return type:

dict

pacsifier.cli.anonymize_dicoms.anonymize_dicom_file(filename: str, output_filename: str, PatientID: str, new_StudyInstanceUID: str, new_SeriesInstanceUID: str, new_SOPInstanceUID: str, fuzz_birthdate: bool = True, fuzz_acqdates: bool = False, fuzz_days_shift: int = 0, delete_identifiable_files: bool = False, remove_private_tags: bool = False) -> None

Anonymize the dicom image located at filename by modifying the patient ID, patient name, and dates.

If identifiable data is present, deletes the file.

Parameters:
  • filename – path to dicom image

  • output_filename – output path of anonymized image

  • PatientID – the new patientID after anonymization

  • new_StudyInstanceUID – study instance UID to be used for depersonalisation. This should be a DICOM VR UI

  • new_SeriesInstanceUID – series instance UID to be used for depersonalisation. This should be a DICOM VR UI

  • new_SOPInstanceUID – SOP instance UID to be used for depersonalisation. This should be a DICOM VR UI

  • fuzz_birthdate – if True, fuzz the birthdate

  • fuzz_acqdates – if True, fuzz acquisition-related dates including study date, InstanceCreationDate, SeriesDate, AcquisitionDate, ContentDate, PerformedProcedureStepStartDate, and (07a3,101b) ST (e.g. 201703251500), (07a3,1020) DA

  • fuzz_days_shift – number of days to shift dates (birth and various acquisition dates) by (can be positive or negative)

  • delete_identifiable_files – if True, delete DICOM Series which have identifiable information in the image data itself (in the case of SCREEN SAVE image type for dose reports coming from the GE Revolution CT machine, which have the patient name embedded, and from Toshiba/Canon Aquilion Prime, although these don’t have SCREEN SAVE label in ImageType tag)

  • remove_private_tags – if True remove all private tags

pacsifier.cli.anonymize_dicoms.fuzz_date(date: str, fuzz_parameter: int = 30) -> Tuple[str, int]

Fuzz a date within a range from fuzz_parameter days before to fuzz_parameter days after it.

Parameters:
  • date – date in YYYYMMDD format

  • fuzz_parameter – the number of days by which the date will be fuzzed

Returns:

tuple (str_date, fuzz), where str_date is the new fuzzed date and fuzz is the number of days used in the offset (can be positive or negative)

Return type:

tuple
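One way to read the description above is as a uniformly random shift within the symmetric range. The sketch below follows that reading; the name fuzz_date_sketch is hypothetical, and details such as whether a zero offset is allowed or how the random number is drawn are assumptions not confirmed by this documentation.

```python
import random
from datetime import datetime, timedelta
from typing import Tuple


def fuzz_date_sketch(date: str, fuzz_parameter: int = 30) -> Tuple[str, int]:
    """Shift a YYYYMMDD date by a random offset in [-fuzz_parameter, +fuzz_parameter]."""
    # Assumption: the offset is drawn uniformly and may be zero.
    fuzz = random.randint(-fuzz_parameter, fuzz_parameter)
    shifted = datetime.strptime(date, "%Y%m%d") + timedelta(days=fuzz)
    return shifted.strftime("%Y%m%d"), fuzz
```

Returning the offset alongside the date makes it possible to apply the same shift to the other dates of the same patient.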

pacsifier.cli.anonymize_dicoms.get_parser() -> ArgumentParser

Get parser object for command line arguments of the script.

pacsifier.cli.anonymize_dicoms.main()

Main function of the script that calls anonymize_all_dicoms_within_root_folder().

pacsifier.cli.anonymize_dicoms.parse_date(date: str) -> Tuple[int, int, int]

Extract year, month, day from a date.

Parameters:

date – date in YYYYMMDD format

Returns:

year, month, day
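For a fixed-width YYYYMMDD string, the three components can be taken by slicing. This is a minimal sketch under that assumption; the name parse_date_sketch is hypothetical and the real function may validate its input differently.

```python
from typing import Tuple


def parse_date_sketch(date: str) -> Tuple[int, int, int]:
    """Split a YYYYMMDD string into (year, month, day) integers."""
    return int(date[:4]), int(date[4:6]), int(date[6:8])


print(parse_date_sketch("20170115"))  # (2017, 1, 15)
```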

pacsifier.cli.anonymize_dicoms.shift_date_by_some_days(date_str: str, shift: int) -> str

Add or subtract days from a date.

Parameters:
  • date_str – date in YYYYMMDD format

  • shift – the number of days by which the date will be shifted (can be positive or negative)

Returns:

shifted date

Return type:

str
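Shifting a YYYYMMDD date by a signed number of days can be sketched with the standard datetime module, which handles month boundaries and leap years. The name shift_date_sketch is hypothetical; this illustrates the behaviour described above rather than the package's own code.

```python
from datetime import datetime, timedelta


def shift_date_sketch(date_str: str, shift: int) -> str:
    """Return date_str (YYYYMMDD) moved by shift days, in the same format."""
    shifted = datetime.strptime(date_str, "%Y%m%d") + timedelta(days=shift)
    return shifted.strftime("%Y%m%d")


print(shift_date_sketch("20170115", 30))  # 20170214
print(shift_date_sketch("20200301", -1))  # 20200229
```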

pacsifier.cli.create_dicomdir

Script to create a DICOMDIR of all dicoms within a folder.

pacsifier.cli.create_dicomdir.add_or_retrieve_name(current_folder: str, old_2_new: Dict[str, str]) -> Tuple[str, Dict[str, str]]

Check whether a new name has already been generated for the current folder. If so, return it; otherwise, generate a new name.

Parameters:
  • current_folder – current folder to be considered

  • old_2_new – dictionary keeping track of mapping between old and new folder / file names

Returns:

tuple containing the new name and the updated mapping between old and new name.

Return type:

tuple

pacsifier.cli.create_dicomdir.create_dicomdir(out_path: str) -> None

Create a DICOMDIR of all dicoms within the path passed as parameter.

Parameters:

out_path – path of dicoms

pacsifier.cli.create_dicomdir.generate_new_folder_name(names: List[str] = []) -> str

Generate a folder/file name having between 4 and 8 characters of capital letters and digits.

Parameters:

names – new names already generated for other folders.

Returns:

Generated folder/file name.

Return type:

str
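The name generation described here, together with the lookup performed by add_or_retrieve_name(), can be sketched as follows. The helper names new_name and add_or_retrieve are hypothetical, and retrying until the candidate is not among the existing names is an assumption about how collisions are avoided.

```python
import random
import string
from typing import Dict, List, Optional, Tuple

ALPHABET = string.ascii_uppercase + string.digits


def new_name(names: Optional[List[str]] = None) -> str:
    """Draw a random 4-8 character name that is not already in names."""
    taken = set(names or [])
    while True:
        length = random.randint(4, 8)
        candidate = "".join(random.choices(ALPHABET, k=length))
        if candidate not in taken:  # assumption: collisions are retried
            return candidate


def add_or_retrieve(folder: str, old_2_new: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Reuse the name already mapped to folder, or generate and record a new one."""
    if folder not in old_2_new:
        old_2_new[folder] = new_name(list(old_2_new.values()))
    return old_2_new[folder], old_2_new
```

Calling add_or_retrieve() twice with the same folder returns the same generated name, which keeps the renamed hierarchy consistent across files.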

pacsifier.cli.create_dicomdir.get_parser() -> ArgumentParser

Get parser object for command line arguments of the script.

pacsifier.cli.create_dicomdir.main()

Main function of the script that calls move_and_rename_files() and create_dicomdir().

pacsifier.cli.create_dicomdir.move_and_rename_files(dicom_path: str, output_path: str) -> None

Copy all the files within the dicom hierarchy into new hierarchy with appropriate names for DICOMDIR creation.

Parameters:
  • dicom_path – current folder to be considered

  • output_path – path where the new dicom hierarchy will be stored

pacsifier.cli.get_pseudonyms

Script to get the new pseudonyms and day shifts in JSON format.

The script can be used in two modes:

  • de-id: use the de-ID API to get new pseudonyms and day shifts

  • custom: use a custom mapping file in CSV format that specifies the mapping of old / new pseudonyms

In case of the de-id mode, the script requires a PACSIFIER query file and a configuration file for the de-ID API. In case of the custom mode, the script requires a custom mapping file in CSV format.

The script saves the new pseudonyms and day shifts as JSON files in the specified output directory.

Example usage:

python get_pseudonyms.py --mode de-id --config config.json \
    --queryfile query.csv --project_name PACSIFIERCohort \
    --out_directory /path/to/output
python get_pseudonyms.py --mode custom --mappingfile mapping.csv \
    --shift-days --project_name PACSIFIERCohort \
    --out_directory /path/to/output

pacsifier.cli.get_pseudonyms.check_config_file_deid(config_file: Dict[str, str]) -> None

Check that the config file passed as a parameter is valid.

Parameters:

config_file – dictionary loaded from the config json file
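A plausible form of this validation is a check that the two keys documented for deid_parameters further down this page ("deid_URL" and "deid_token") are present. The sketch below assumes exactly that; the helper name check_deid_config and the use of ValueError are hypothetical, and the real function may check more or report errors differently.

```python
from typing import Dict

# Keys taken from the deid_parameters format documented on this page.
REQUIRED_KEYS = ("deid_URL", "deid_token")


def check_deid_config(config: Dict[str, str]) -> None:
    """Raise ValueError if a required de-ID key is missing from the config."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        # Assumption: the real check may signal failure in another way.
        raise ValueError(f"Missing keys in de-ID config: {missing}")


check_deid_config({"deid_URL": "https://dummy.url.example", "deid_token": "1234567890"})
```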

pacsifier.cli.get_pseudonyms.check_queryfile_content(queryfile: str) -> None

Check that the PACSIFIER query file is valid.

Parameters:

queryfile – the path of the PACSIFIER query file

pacsifier.cli.get_pseudonyms.convert_csv_to_deid_json(queryfile: str, project_name: str) -> Dict[str, Any]

Convert a PACSIFIER query file to the JSON format expected by the de-ID API.

Parameters:
  • queryfile – the filename of the PACSIFIER query file

  • project_name – the name of the project in GPCR (may or may not correspond to Kheops album)

Returns:

JSON object suitable for the API

pacsifier.cli.get_pseudonyms.generate_csv_with_pseudonyms_and_day_shifts(queryfile: str, pseudonyms: Dict[str, str], day_shifts: Dict[str, int], output_dir: str) -> None

Create a CSV file with the original query file columns, new pseudonyms, and day shifts.

Parameters:
  • queryfile – path to the original PACSIFIER query file

  • pseudonyms – dictionary mapping old Patient IDs to new pseudonyms

  • day_shifts – dictionary mapping old Patient IDs to day shifts

  • output_dir – path to save the resulting CSV file

pacsifier.cli.get_pseudonyms.get_deid_day_shifts(deid_parameters: Dict[str, str], query_json: Dict[str, Any]) -> str

Run the de-ID request for day shifts and return the response as a json.

Parameters:
  • deid_parameters

    dictionary containing the de-ID URL and token in the following format:

    {
        "deid_URL": "https://dummy.url.example",
        "deid_token": "1234567890"
    }
    

  • query_json – the PACSIFIER query formatted as a dictionary

Returns:

JSON object containing the day shifts for each patient

pacsifier.cli.get_pseudonyms.get_deid_pseudonyms(deid_parameters: Dict[str, str], query_json: Dict[str, Any]) -> str

Run the de-ID request and return the response as a json.

Parameters:
  • deid_parameters

    dictionary containing the de-ID URL and token in the following format:

    {
        "deid_URL": "https://dummy.url.example",
        "deid_token": "1234567890"
    }
    

  • query_json – the PACSIFIER query formatted as a dictionary

Returns:

JSON object containing the new pseudonyms for each patient

pacsifier.cli.get_pseudonyms.get_parser() -> ArgumentParser

Get parser for command line arguments.

pacsifier.cli.get_pseudonyms.main()

Main function of the script.

pacsifier.cli.get_pseudonyms.split_deid_query_json_in_batch(deid_query_json: Dict[str, Any], batch_size: int = 500) -> List[Dict[str, Any]]

Split the patients passed as a parameter into several batches of at most batch_size patients each.

Parameters:
  • deid_query_json – dictionary loaded from the deid json file

  • batch_size – The size of one batch

Returns:

List of dictionaries, each containing a batch of patients with the project information
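The batching described above can be sketched as slicing the patient list while copying the project-level fields into every batch. The layout of the de-ID JSON is not documented here, so the keys "patients" and "project" below are purely hypothetical placeholders, as is the helper name split_in_batches.

```python
from typing import Any, Dict, List


def split_in_batches(
    query: Dict[str, Any],
    batch_size: int = 500,
    patients_key: str = "patients",  # hypothetical key name
) -> List[Dict[str, Any]]:
    """Return copies of query whose patient list holds at most batch_size items."""
    patients = query[patients_key]
    batches = []
    for start in range(0, len(patients), batch_size):
        batch = dict(query)  # shallow copy keeps the project-level fields
        batch[patients_key] = patients[start:start + batch_size]
        batches.append(batch)
    return batches


example = {"project": "PACSIFIERCohort", "patients": ["p1", "p2", "p3", "p4", "p5"]}
batches = split_in_batches(example, batch_size=2)
print([len(b["patients"]) for b in batches])  # [2, 2, 1]
```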

pacsifier.cli.move_dumps

Script to move all csv files retrieved by pacsifier --info ... into a new folder.

pacsifier.cli.move_dumps.get_parser() -> ArgumentParser

Get parser for command line arguments.

pacsifier.cli.move_dumps.main()

Main function of the script that calls move().

pacsifier.cli.move_dumps.move(dicom_path: str, output_path: str) -> None

Move all csv info files within a dicom directory into a new directory.

Parameters:
  • dicom_path – path to the folder containing dicoms.

  • output_path – path where the csv files within the dicom path will be moved.
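Moving every csv file out of a DICOM tree can be sketched with pathlib and shutil. The helper name move_csv_files is hypothetical, and preserving the relative sub-XX/ses-XX layout under output_path is an assumption; the documentation does not state how the moved files are arranged.

```python
import shutil
from pathlib import Path


def move_csv_files(dicom_path: str, output_path: str) -> None:
    """Move all *.csv files under dicom_path into output_path."""
    source = Path(dicom_path)
    target = Path(output_path)
    # Materialise the list first so that moving files does not disturb the walk.
    for csv_file in list(source.rglob("*.csv")):
        # Assumption: the relative directory layout is kept in the destination.
        destination = target / csv_file.relative_to(source)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(csv_file), str(destination))
```

The sketch assumes output_path lies outside dicom_path; otherwise already moved files would be found again on a later run.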

pacsifier.cli.add_karnak_tags

Add private DICOM tags to several studies so that Karnak can de-identify them using provided patient codes and route them to the appropriate Kheops album.

pacsifier.cli.add_karnak_tags.get_parser() -> ArgumentParser

Get parser object for command line arguments of the script.

pacsifier.cli.add_karnak_tags.main()

Main function of the script that calls tag_all_dicoms_within_root_folder().

pacsifier.cli.add_karnak_tags.tag_all_dicoms_within_root_folder(data_path: str, new_ids: Dict[str, str], day_shift: Dict[str, str], album_name: str) -> None

Tag all dicom images located at the datapath for Karnak, adding an album name and patientCode private tags.

Parameters:
  • data_path – path to the dicom images

  • new_ids – real:code mapping to be used after de-identifying the original ids

  • day_shift – day shift per patient

  • album_name – name of the Kheops album

pacsifier.cli.add_karnak_tags.tag_dicom_file(filename: str, patient_code: str, patient_shift: str, album_name: str) -> None

Tag the dicom image located at filename by adding patient code and Kheops album name to private tags for subsequent de-identification.

Parameters:
  • filename – path to dicom image

  • patient_code – pseudonymous patient code

  • patient_shift – day shift for this patient

  • album_name – Kheops album name

pacsifier.cli.extract_carestream_report

Script to extract plain text from Carestream radiology reports stored as DICOM Structured Reports (SR).

pacsifier.cli.extract_carestream_report.extract_txt_report(data_folder: str) -> None

This function loops over a BIDS-like (Brain Imaging Data Structure) dataset.

If some SRc files are found, it converts them to txt files and saves them in the same directory.

Note

The function assumes that each subject is stored as ~/.../sub-XXXXXX/ses-YYYYYYYYYYYYY/00001-CarestreamPACSReports/SRc.x.x.x.

Parameters:

data_folder (str) – path to BIDS-like dataset
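The directory convention stated in the note can be turned into a glob pattern to locate the SRc files. This sketch covers only the file discovery step; the helper name find_src_files is hypothetical, and the conversion of each file to text is not shown because its details are not documented here.

```python
from pathlib import Path
from typing import List


def find_src_files(data_folder: str) -> List[Path]:
    """Return all SRc.* files stored under the assumed BIDS-like layout."""
    pattern = "sub-*/ses-*/00001-CarestreamPACSReports/SRc.*"
    return sorted(Path(data_folder).glob(pattern))
```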

pacsifier.cli.extract_carestream_report.get_parser() -> ArgumentParser

Get parser object for command line arguments of the script.

Note

It is assumed that each subject is stored as ~/.../sub-XXXXXX/ses-YYYYYYYYYYYYY/00001-CarestreamPACSReports.

pacsifier.cli.extract_carestream_report.main()

Main function of the script that calls extract_txt_report().

pacsifier.cli.extract_carestream_report.replace_special_char_combinations(input_report, print_clean_report=False) -> str

This function corrects encoding errors that occur in the reports.

Parameters:
  • input_report (str) – report that needs to be cleaned

  • print_clean_report (bool) – whether we want to print the cleaned report

Returns:

clean report without encoding errors

Return type:

str