pacsifier.cli
The pacsifier.cli subpackage contains multiple modules that define the command line interface (CLI) tools of the pacsifier package.
pacsifier.cli.pacsifier
Script to query, retrieve, and upload DICOM images from / to a PACS server.
- pacsifier.cli.pacsifier.check_query_table_allowed_filters(table: DataFrame, allowed_filters: List[str] = ['StudyDate', 'StudyTime', 'SeriesDescription', 'PatientID', 'ProtocolName', 'StudyInstanceUID', 'SeriesInstanceUID', 'PatientName', 'PatientBirthDate', 'DeviceSerialNumber', 'AcquisitionDate', 'Modality', 'ImageType', 'SeriesNumber', 'StudyDescription', 'AccessionNumber', 'SequenceName', 'new_ids']) None
Check that the CSV table passed as input contains only allowed attributes.
- Parameters:
table – table containing all filters
allowed_filters – list of allowed attribute names
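The check can be sketched with the standard library alone. This is a minimal sketch, not PACSIFIER's implementation. check_columns_allowed is a hypothetical helper that receives plain column names, so a DataFrame's table.columns could be passed in. Raising ValueError is an assumption: the documented function only states that it returns None.

```python
from typing import Iterable, List

# Default allowed attributes, copied from the documented signature above.
DEFAULT_ALLOWED_FILTERS: List[str] = [
    "StudyDate", "StudyTime", "SeriesDescription", "PatientID", "ProtocolName",
    "StudyInstanceUID", "SeriesInstanceUID", "PatientName", "PatientBirthDate",
    "DeviceSerialNumber", "AcquisitionDate", "Modality", "ImageType", "SeriesNumber",
    "StudyDescription", "AccessionNumber", "SequenceName", "new_ids",
]


def check_columns_allowed(
    columns: Iterable[str], allowed_filters: List[str] = DEFAULT_ALLOWED_FILTERS
) -> None:
    """Raise ValueError if any column is not an allowed attribute (assumed error handling)."""
    allowed = set(allowed_filters)
    disallowed = [column for column in columns if column not in allowed]
    if disallowed:
        raise ValueError("Attributes not allowed in query table: " + ", ".join(disallowed))
```

Returning silently on success mirrors the None return type. Collecting every offending column before raising gives the user a single complete error message.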
- pacsifier.cli.pacsifier.get_parser() ArgumentParser
Return the parser object for this script.
- pacsifier.cli.pacsifier.main()
Main function of the script that calls
retrieve_dicoms_using_table().
- pacsifier.cli.pacsifier.parse_findscu_dump_file(filename: str) List[Dict[str, str]]
Extract all useful information from the text file generated by dumping the output of the findscu command.
- Parameters:
filename – path to the text file to be read
- Returns:
list of dictionaries, each containing the attributes of a series
- Return type:
list
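A minimal sketch of this kind of parsing, assuming a DCMTK-style dump. In that format each dataset starts with a "# Dicom-Data-Set" header, and each element appears as "(gggg,eeee) VR [value] # length, VM Keyword". The exact layout PACSIFIER relies on is not documented here, so the regular expression and the helper name parse_findscu_lines are assumptions. The sketch also does not try to tell request identifiers apart from response datasets.

```python
import re
from typing import Dict, Iterable, List

# Assumed element lines, e.g.:
#   I: (0008,0020) DA [20170115]             #   8, 1 StudyDate
#   I: (0010,0010) PN (no value available)   #   0, 0 PatientName
ELEMENT_RE = re.compile(
    r"\([0-9a-fA-F]{4},[0-9a-fA-F]{4}\)\s+[A-Z]{2}\s+"
    r"(?:\[(.*?)\]|\(no value available\))\s+#.*\s(\w+)\s*$"
)


def parse_findscu_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Collect one {keyword: value} dict per dataset found in a findscu dump."""
    results: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in lines:
        if "# Dicom-Data-Set" in line:
            # A new dataset starts: keep the previous one if it holds any element.
            if current:
                results.append(current)
            current = {}
            continue
        match = ELEMENT_RE.search(line)
        if match:
            value, keyword = match.group(1), match.group(2)
            # Elements printed as "(no value available)" become empty strings.
            current[keyword] = value if value is not None else ""
    if current:
        results.append(current)
    return results
```

The function accepts any iterable of lines, so it can be fed directly from a line generator such as readLineByLine().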
- pacsifier.cli.pacsifier.parse_query_table(table: DataFrame, allowed_filters: List[str] = ['StudyDate', 'StudyTime', 'SeriesDescription', 'PatientID', 'ProtocolName', 'StudyInstanceUID', 'SeriesInstanceUID', 'PatientName', 'PatientBirthDate', 'DeviceSerialNumber', 'AcquisitionDate', 'Modality', 'ImageType', 'SeriesNumber', 'StudyDescription', 'AccessionNumber', 'SequenceName', 'new_ids']) List[Dict[str, str]]
Parse the query table passed as input to the script, extracting the attributes used in the query / retrieve command.
- Parameters:
table – input csv table
allowed_filters – list of allowed attributes to be filtered
- Returns:
list of dictionaries, each containing the value of every attribute in allowed_filters (if an attribute has no corresponding column in the CSV table, an empty string is used)
- Return type:
list
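The documented return value (one dict per row, with an empty string for every allowed attribute that has no column) can be sketched with the csv module. This is an illustration under stated assumptions, not the library's code. The real function takes a DataFrame, whereas the hypothetical parse_query_rows reads CSV text directly.

```python
import csv
import io
from typing import Dict, List


def parse_query_rows(csv_text: str, allowed_filters: List[str]) -> List[Dict[str, str]]:
    """Return one dict per CSV row with a value for every allowed attribute."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows: List[Dict[str, str]] = []
    for row in reader:
        # Missing columns (and short rows, which DictReader fills with None) become "".
        rows.append({attr: row.get(attr) or "" for attr in allowed_filters})
    return rows
```

For example, a table with only PatientID and StudyDate columns still yields a Modality key with an empty value.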
- pacsifier.cli.pacsifier.process_person_names(name: str) str
Modify patient name for the query input.
It converts all characters to uppercase, prepends a * to the last name, and returns the new name.
- Parameters:
name – patient’s name
- Returns:
patient name in the format it will be used in the query
- Return type:
string
- pacsifier.cli.pacsifier.readLineByLine(filename: str) Iterator[str]
Yield the lines of the text file located at the path filename, one at a time.
- Parameters:
filename – path to text file to be read
- Yields:
Iterator – lines of the text file
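A generator is a natural fit for the Iterator[str] return type, because a large findscu dump never has to be held in memory at once. In this minimal sketch, read_line_by_line is a hypothetical name. Stripping the trailing newline and assuming UTF-8 encoding are choices of the sketch, not documented behaviour.

```python
from typing import Iterator


def read_line_by_line(filename: str) -> Iterator[str]:
    """Yield the lines of a text file one at a time, without trailing newlines."""
    with open(filename, "r", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```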
- pacsifier.cli.pacsifier.retrieve_dicoms_using_table(table: DataFrame, parameters: Dict[str, str], output_dir: str, save: bool, info: bool, move: bool, resume: bool = False, verbose: bool = False) None
Query and retrieve DICOM images and/or their info dumps using the input query table.
- Parameters:
table – query table
parameters – query/retrieve parameters
output_dir – path to the output directory
save – option to save the images
info – option to save info dumps
move – option to move images to remote destination
resume – option to skip already downloaded series
- pacsifier.cli.pacsifier.upload_dicoms(dicom_dir: str, parameters: Dict[str, str]) None
Upload dicoms to a PACS server.
- Parameters:
dicom_dir – path to the directory containing the dicoms. The directory should follow the structure of the PACSIFIER output directory produced with the --save option: it should contain a subdirectory for each patient, which in turn contains a subdirectory for each study, which in turn contains a subdirectory for each series, for example:
dicom_dir
├── sub-01
│   ├── ses-01
│   │   ├── 00001-First_series
│   │   │   ├── image1.dcm
│   │   │   ├── image2.dcm
│   │   │   ├── …
│   │   ├── 00002-Second_series
│   │   │   ├── image1.dcm
│   │   │   ├── image2.dcm
│   │   │   ├── …
├── sub-02
│   ├── ses-01
│   │   ├── 00001-First_series
│   │   │   ├── image1.dcm
parameters – parameters from PACSIFIER configuration file
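The three-level layout (patient / study / series / files) can be traversed with a single glob pattern. This sketch only shows how files map onto that layout; it does not upload anything. collect_series is a hypothetical helper, and it assumes the images carry the .dcm extension shown in the tree above.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def collect_series(dicom_dir: str) -> Dict[Tuple[str, str, str], List[Path]]:
    """Map (patient, study, series) folder names to the .dcm files they contain."""
    series: Dict[Tuple[str, str, str], List[Path]] = {}
    # "*/*/*/*.dcm" = patient folder / study folder / series folder / image file.
    for dcm in sorted(Path(dicom_dir).glob("*/*/*/*.dcm")):
        series_dir = dcm.parent
        key = (series_dir.parent.parent.name, series_dir.parent.name, series_dir.name)
        series.setdefault(key, []).append(dcm)
    return series
```

Files placed at the wrong depth are simply not matched, which makes this a quick way to check a directory before uploading it.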
pacsifier.cli.anonymize_dicoms
Script to anonymize DICOM files for subsequent upload to Kheops.
- pacsifier.cli.anonymize_dicoms.anonymize_all_dicoms_within_root_folder(output_folder: str = '.', datapath: str = './data', pattern_dicom_files: str = 'ses-*/*/*', new_ids: str | None = None, rename_patient_directories: bool = True, delete_identifiable_files: bool = True, remove_private_tags: bool = False, fuzz_acq_dates: bool = False) Dict[str, str]
Anonymize all DICOM images located at datapath, following the folder structure specified by the pattern_dicom_files parameter.
- Parameters:
output_folder – path where anonymized images will be located
datapath – path to the dicom images
pattern_dicom_files – (generic) path to the DICOM images, starting from the patient folder (in a PACSIFIER dump, this would be e.g. ses-20170115/0002-MPRAGE/*.dcm)
new_ids – anonymous IDs to be assigned in place of the original IDs
rename_patient_directories – rename patient directories using the anonymized ids if True
delete_identifiable_files – delete DICOM Series which have identifiable information in the image data itself if True (for example, screen saves coming from the GE Revolution CT machine, which have the patient name embedded)
remove_private_tags – remove all private tags if True
fuzz_acq_dates – shift the acquisition-related dates randomly by up to ±30 days if True
- Returns:
dictionary keeping track of the mapping between old and new patient IDs
- Return type:
dict
- pacsifier.cli.anonymize_dicoms.anonymize_dicom_file(filename: str, output_filename: str, PatientID: str, new_StudyInstanceUID: str, new_SeriesInstanceUID: str, new_SOPInstanceUID: str, fuzz_birthdate: bool = True, fuzz_acqdates: bool = False, fuzz_days_shift: int = 0, delete_identifiable_files: bool = False, remove_private_tags: bool = False) None
Anonymize the DICOM image located at filename by replacing the patient ID, patient name, and dates.
If delete_identifiable_files is True and the image data itself contains identifiable information, the file is deleted.
- Parameters:
filename – path to dicom image
output_filename – output path of anonymized image
PatientID – the new patientID after anonymization
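new_StudyInstanceUID – study instance UID to be used for depersonalisation. This should be a valid value of the DICOM UI (Unique Identifier) value representation
new_SeriesInstanceUID – series instance UID to be used for depersonalisation. This should be a valid value of the DICOM UI (Unique Identifier) value representation
new_SOPInstanceUID – SOP instance UID to be used for depersonalisation. This should be a valid value of the DICOM UI (Unique Identifier) value representation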
fuzz_birthdate – if True, fuzz the birth date
fuzz_acqdates – if True, fuzz acquisition-related dates including study date, InstanceCreationDate, SeriesDate, AcquisitionDate, ContentDate, PerformedProcedureStepStartDate, and (07a3,101b) ST (e.g. 201703251500), (07a3,1020) DA
fuzz_days_shift – number of days to shift dates (birth and various acquisition dates) by (can be positive or negative)
delete_identifiable_files – if True, delete DICOM Series which have identifiable information in the image data itself (in the case of SCREEN SAVE image type for dose reports coming from the GE Revolution CT machine, which have the patient name embedded, and from Toshiba/Canon Aquilion Prime, although these don’t have SCREEN SAVE label in ImageType tag)
remove_private_tags – if True remove all private tags
- pacsifier.cli.anonymize_dicoms.fuzz_date(date: str, fuzz_parameter: int = 30) Tuple[str, int]
Fuzz a date by shifting it by a random number of days between -fuzz_parameter and +fuzz_parameter.
- Parameters:
date – date in YYYYMMDD format
fuzz_parameter – the number of days by which the date will be fuzzed
- Returns:
new fuzzed date, and fuzz, the number of days used as offset (can be positive or negative)
- Return type:
tuple
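Fuzzing amounts to drawing a random offset and applying it with date arithmetic. This sketch is built on the datetime module and labels its choices explicitly. fuzz_date_sketch is a hypothetical name. Drawing from the inclusive range [-fuzz_parameter, fuzz_parameter] with random.randint is an assumption about the original's distribution, and so is returning the date in YYYYMMDD format. The optional rng argument only makes the sketch reproducible.

```python
import random
from datetime import datetime, timedelta
from typing import Optional, Tuple


def fuzz_date_sketch(
    date: str, fuzz_parameter: int = 30, rng: Optional[random.Random] = None
) -> Tuple[str, int]:
    """Return (fuzzed YYYYMMDD date, offset in days) for a YYYYMMDD input date."""
    rng = rng or random.Random()
    offset = rng.randint(-fuzz_parameter, fuzz_parameter)  # inclusive on both ends
    shifted = datetime.strptime(date, "%Y%m%d") + timedelta(days=offset)
    return shifted.strftime("%Y%m%d"), offset
```

Returning the offset alongside the new date lets the same shift be reused for every date of a patient. That keeps the intervals between acquisitions intact, which is what the fuzz_days_shift parameter of anonymize_dicom_file() allows.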
- pacsifier.cli.anonymize_dicoms.get_parser() ArgumentParser
Get parser object for command line arguments of the script.
- pacsifier.cli.anonymize_dicoms.main()
Main function of the script that calls anonymize_all_dicoms_within_root_folder().
- pacsifier.cli.anonymize_dicoms.parse_date(date: str) Tuple[int, int, int]
Extract year, month, day from a date.
- Parameters:
date – date in YYYYMMDD format
- Returns:
year, month, day
- pacsifier.cli.anonymize_dicoms.shift_date_by_some_days(date_str: str, shift: int) str
Add or subtract days from a date.
- Parameters:
date_str – date in YYYYMMDD format
shift – the number of days by which the date will be shifted (can be positive or negative)
- Returns:
shifted date
- Return type:
str
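Splitting a YYYYMMDD string and shifting it by a number of days can be sketched with the standard datetime module. Adding a timedelta takes care of month lengths, year boundaries, and leap years. The helper names below are hypothetical, and returning the shifted date in YYYYMMDD format is an assumption.

```python
from datetime import date, timedelta
from typing import Tuple


def parse_yyyymmdd(date_str: str) -> Tuple[int, int, int]:
    """Split a YYYYMMDD string into (year, month, day) integers."""
    return int(date_str[0:4]), int(date_str[4:6]), int(date_str[6:8])


def shift_yyyymmdd(date_str: str, shift: int) -> str:
    """Shift a YYYYMMDD date by `shift` days (negative values go back in time)."""
    year, month, day = parse_yyyymmdd(date_str)
    # date + timedelta handles month lengths, year boundaries and leap years.
    shifted = date(year, month, day) + timedelta(days=shift)
    return shifted.strftime("%Y%m%d")
```

For example, shift_yyyymmdd("20170115", -20) crosses the year boundary and returns "20161226".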
pacsifier.cli.create_dicomdir
Script to create a DICOMDIR of all dicoms within a folder.
- pacsifier.cli.create_dicomdir.add_or_retrieve_name(current_folder: str, old_2_new: Dict[str, str]) Tuple[str, Dict[str, str]]
Check whether a new name has already been generated for the current folder. If so, return it; otherwise, generate a new name.
- Parameters:
current_folder – current folder to be considered
old_2_new – dictionary keeping track of mapping between old and new folder / file names
- Returns:
tuple containing the new name and the updated mapping between old and new name.
- Return type:
tuple
- pacsifier.cli.create_dicomdir.create_dicomdir(out_path: str) None
Create a DICOMDIR for all DICOM files within the path passed as parameter.
- Parameters:
out_path – path to the DICOM files
- pacsifier.cli.create_dicomdir.generate_new_folder_name(names: List[str] = []) str
Generate a folder/file name of 4 to 8 characters consisting of capital letters and digits.
- Parameters:
names – new names already generated for other folders.
- Returns:
Generated folder/file name.
- Return type:
str
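The renaming scheme follows from the constraint on DICOMDIR references: DICOM media file ID components are limited to 8 characters drawn from uppercase letters, digits, and underscore. Generating a unique short name and remembering it per source folder can be sketched as follows. generate_name and add_or_retrieve are hypothetical stand-ins for generate_new_folder_name() and add_or_retrieve_name(), not their actual code. The sketch uses an immutable default for names, which avoids the shared-mutable-default pitfall of a [] default.

```python
import random
import string
from typing import Dict, Iterable, Optional, Tuple

ALPHABET = string.ascii_uppercase + string.digits


def generate_name(names: Iterable[str] = (), rng: Optional[random.Random] = None) -> str:
    """Return a 4-8 character name of capitals and digits not present in `names`."""
    rng = rng or random.Random()
    taken = set(names)
    while True:
        candidate = "".join(rng.choices(ALPHABET, k=rng.randint(4, 8)))
        if candidate not in taken:
            return candidate


def add_or_retrieve(current_folder: str, old_2_new: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Reuse the name already assigned to `current_folder`, or assign a fresh one."""
    if current_folder not in old_2_new:
        old_2_new[current_folder] = generate_name(old_2_new.values())
    return old_2_new[current_folder], old_2_new
```

Passing the already-assigned names to the generator ensures two source folders never collide on the same short name.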
- pacsifier.cli.create_dicomdir.get_parser() ArgumentParser
Get parser object for command line arguments of the script.
- pacsifier.cli.create_dicomdir.main()
Main function of the script that calls move_and_rename_files() and create_dicomdir().
- pacsifier.cli.create_dicomdir.move_and_rename_files(dicom_path: str, output_path: str) None
Copy all the files within the DICOM hierarchy into a new hierarchy with names suitable for DICOMDIR creation.
- Parameters:
dicom_path – path to the folder containing the DICOM hierarchy to be copied
output_path – path where the new dicom hierarchy will be stored
pacsifier.cli.get_pseudonyms
Script to get the new pseudonyms and day shifts in JSON format.
The script can be used in two modes:
de-id: use the de-ID API to get new pseudonyms and day shifts
custom: use a custom mapping file in CSV format that specifies the mapping of old / new pseudonyms
In de-id mode, the script requires a PACSIFIER query file and a configuration file for the de-ID API. In custom mode, it requires a custom mapping file in CSV format.
The script saves the new pseudonyms and day shifts as JSON files in the specified output directory.
Example usage:
python get_pseudonyms.py --mode de-id --config config.json \
--queryfile query.csv --project_name PACSIFIERCohort \
--out_directory /path/to/output
python get_pseudonyms.py --mode custom --mappingfile mapping.csv \
--shift-days --project_name PACSIFIERCohort \
--out_directory /path/to/output
- pacsifier.cli.get_pseudonyms.check_config_file_deid(config_file: Dict[str, str]) None
Check that the config file passed as a parameter is valid.
- Parameters:
config_file – dictionary loaded from the config json file
- pacsifier.cli.get_pseudonyms.check_queryfile_content(queryfile: str) None
Check that the PACSIFIER query file is valid.
- Parameters:
queryfile – the path of the PACSIFIER query file
- pacsifier.cli.get_pseudonyms.convert_csv_to_deid_json(queryfile: str, project_name: str) Dict[str, Any]
Convert a PACSIFIER query file to the JSON format expected by the de-ID API.
- Parameters:
queryfile – the filename of the PACSIFIER query file
project_name – the name of the project in GPCR (may or may not correspond to Kheops album)
- Returns:
JSON object suitable for the API
- pacsifier.cli.get_pseudonyms.generate_csv_with_pseudonyms_and_day_shifts(queryfile: str, pseudonyms: Dict[str, str], day_shifts: Dict[str, int], output_dir: str) None
Create a CSV file with the original query file columns, new pseudonyms, and day shifts.
- Parameters:
queryfile – path to the original PACSIFIER query file
pseudonyms – dictionary mapping old Patient IDs to new pseudonyms
day_shifts – dictionary mapping old Patient IDs to day shifts
output_dir – path to save the resulting CSV file
- pacsifier.cli.get_pseudonyms.get_deid_day_shifts(deid_parameters: Dict[str, str], query_json: Dict[str, Any]) str
Run the de-ID request for day shifts and return the response as JSON.
- Parameters:
deid_parameters –
dictionary containing the de-ID URL and token in the following format:
{ "deid_URL": "https://dummy.url.example", "deid_token": "1234567890" }
query_json – the PACSIFIER query formatted as a dictionary
- Returns:
JSON object containing the day shifts for each patient
- pacsifier.cli.get_pseudonyms.get_deid_pseudonyms(deid_parameters: Dict[str, str], query_json: Dict[str, Any]) str
Run the de-ID request for pseudonyms and return the response as JSON.
- Parameters:
deid_parameters –
dictionary containing the de-ID URL and token in the following format:
{ "deid_URL": "https://dummy.url.example", "deid_token": "1234567890" }
query_json – the PACSIFIER query formatted as a dictionary
- Returns:
JSON object containing the new pseudonyms for each patient
- pacsifier.cli.get_pseudonyms.get_parser() ArgumentParser
Get parser for command line arguments.
- pacsifier.cli.get_pseudonyms.main()
Main function of the script.
- pacsifier.cli.get_pseudonyms.split_deid_query_json_in_batch(deid_query_json: Dict[str, Any], batch_size: int = 500) List[Dict[str, Any]]
Split the patients in deid_query_json into batches of at most batch_size patients.
- Parameters:
deid_query_json – dictionary loaded from the deid json file
batch_size – The size of one batch
- Returns:
List of dictionaries, each containing a batch of patients with the project information
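Batching amounts to slicing the patient list in steps of batch_size and copying the project information into every batch. The real de-ID JSON schema is not documented here. The key name "patients" and the idea that all other top-level keys form the project information are assumptions, and split_in_batches is a hypothetical helper.

```python
from typing import Any, Dict, List


def split_in_batches(
    query_json: Dict[str, Any], batch_size: int = 500, key: str = "patients"
) -> List[Dict[str, Any]]:
    """Split query_json[key] into chunks of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    patients = query_json[key]
    # Everything except the patient list (assumed to be project information)
    # is copied into each batch.
    base = {k: v for k, v in query_json.items() if k != key}
    return [
        {**base, key: patients[start:start + batch_size]}
        for start in range(0, len(patients), batch_size)
    ]
```

With 1201 patients and the default batch size, this produces batches of 500, 500, and 201 patients, each carrying the same project fields.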
pacsifier.cli.move_dumps
Script to move all csv files retrieved by pacsifier --info ... into a new folder.
- pacsifier.cli.move_dumps.get_parser() ArgumentParser
Get parser for command line arguments.
- pacsifier.cli.move_dumps.move(dicom_path: str, output_path: str) None
Move all csv info files within a dicom directory into a new directory.
- Parameters:
dicom_path – path to the folder containing dicoms.
output_path – path where the csv files within the dicom path will be moved.
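Moving the info files out of a DICOM tree can be sketched with pathlib and shutil. move_csv_files is a hypothetical helper. Whether the original flattens the files or keeps the patient / study / series folders is not documented. This sketch recreates the relative folders under output_path, so that identically named CSV files from different series do not overwrite each other.

```python
import shutil
from pathlib import Path


def move_csv_files(dicom_path: str, output_path: str) -> None:
    """Move every .csv file under dicom_path to output_path, keeping relative folders."""
    source = Path(dicom_path)
    target_root = Path(output_path)
    target_root.mkdir(parents=True, exist_ok=True)
    # Materialise the list first so moved files are not revisited during the walk.
    for csv_file in list(source.rglob("*.csv")):
        target = target_root / csv_file.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(csv_file), str(target))
```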
pacsifier.cli.extract_carestream_report
Script to extract plain text from Carestream radiology reports in SR.
- pacsifier.cli.extract_carestream_report.extract_txt_report(data_folder: str) None
This function loops over a BIDS-like (Brain Imaging Data Structure) dataset.
If some SRc files are found, it converts them to txt files and saves them in the same directory.
Note
The function assumes that each subject is stored as
~/.../sub-XXXXXX/ses-YYYYYYYYYYYYY/00001-CarestreamPACSReports/SRc.x.x.x.
- Parameters:
data_folder (str) – path to BIDS-like dataset
- pacsifier.cli.extract_carestream_report.get_parser() ArgumentParser
Get parser object for command line arguments of the script.
Note
It is assumed that each subject is stored as
~/.../sub-XXXXXX/ses-YYYYYYYYYYYYY/00001-CarestreamPACSReports.
- pacsifier.cli.extract_carestream_report.main()
Main function of the script that calls extract_txt_report().
- pacsifier.cli.extract_carestream_report.replace_special_char_combinations(input_report, print_clean_report=False) str
This function corrects encoding errors that occur in the reports.
- Parameters:
input_report (str) – report that needs to be cleaned
print_clean_report (bool) – whether to print the cleaned report
- Returns:
clean report without encoding errors
- Return type:
str
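A common source of such encoding errors is mojibake: UTF-8 bytes decoded as Latin-1, which turns "é" into "Ã©". The combinations the real function replaces are not listed here. In this sketch, the replacement table and clean_report are hypothetical, and the table covers only a few accented characters. Each entry is built programmatically, so it is guaranteed to match that specific decoding error.

```python
from typing import Dict

# Hypothetical table: each accented character as it appears after its UTF-8
# bytes were wrongly decoded as Latin-1 (e.g. "é" -> "Ã©").
MOJIBAKE: Dict[str, str] = {
    ch.encode("utf-8").decode("latin-1"): ch for ch in "éèêàâçôûùî"
}


def clean_report(report: str, print_clean_report: bool = False) -> str:
    """Replace known broken character combinations with the intended characters."""
    for broken, fixed in MOJIBAKE.items():
        report = report.replace(broken, fixed)
    if print_clean_report:
        print(report)
    return report
```

A table-driven replacement is more forgiving than re-decoding the whole string with report.encode("latin-1").decode("utf-8"). That round trip fails as soon as a report mixes correctly and incorrectly encoded text.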