What is CHM to Markdown Converter?
CHM to Markdown Converter is a Python utility designed to convert Compiled HTML Help (CHM) files into Markdown format, making technical documentation more accessible and easier to manage with version control.
How to use CHM to Markdown Converter?
To use the converter, clone the repository and install the required Python packages. Then set the CHM file path and the input and output folders in the script's main() function, and run the script to convert your CHM file to Markdown.
What are the use cases of CHM to Markdown Converter?
- Converting technical documentation from CHM to Markdown for easier editing.
- Preparing documentation for version control systems.
- Extracting content from legacy CHM files for modern documentation practices.
FAQ about CHM to Markdown Converter
- What are the system requirements?
You need Python 3.7+, an installed copy of 7-Zip, and the Python packages beautifulsoup4, html2text, and aiofiles.
- Can I customize the conversion process?
Yes! The script allows customization of which HTML elements to remove and how code snippets are formatted.
- What should I do if I encounter errors?
Check for missing modules, ensure 7-Zip is installed correctly, and run the terminal with administrator privileges if needed.
CHM to Markdown Converter
A Python utility for converting Compiled HTML Help (CHM) files to Markdown format. This tool extracts HTML files from CHM documents and converts them to well-formatted Markdown files, making technical documentation more accessible and version control friendly.
Features
- Extracts CHM files using 7-Zip
- Converts HTML content to clean Markdown format
- Special handling for code snippets with language-specific syntax highlighting
- Preserves and fixes tables
- Updates internal links to maintain document references
- Processes files asynchronously for better performance
- Batch processing with progress reporting
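The link-updating feature can be pictured as a rewrite of link targets after conversion, so that a link to page.htm points at page.md instead. The following is a minimal sketch. It assumes links come out of the HTML-to-Markdown step as ordinary [text](target) links and that each converted page keeps its name and folder. The helper rewrite_internal_links is hypothetical and is not the script's actual function.

```python
import re

# Matches a Markdown inline link target pointing at a local .htm/.html page,
# optionally followed by a #fragment, e.g. "](topics/intro.htm#setup)".
# Targets starting with http://, https:// or mailto: are left alone.
_LOCAL_HTML_LINK = re.compile(
    r"\]\((?!https?://|mailto:)([^)\s#]+)\.html?(#[^)\s]*)?\)", re.IGNORECASE
)


def rewrite_internal_links(markdown: str) -> str:
    """Point local links at the converted .md files, keeping any #fragment."""
    return _LOCAL_HTML_LINK.sub(
        lambda m: f"]({m.group(1)}.md{m.group(2) or ''})", markdown
    )
```

External URLs are skipped on purpose, because only pages extracted from the CHM have Markdown counterparts.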
Requirements
- Python 3.7+
- 7-Zip installed in the default location (C:\Program Files\7-Zip\7z.exe)
- The following Python packages:
  - beautifulsoup4
  - html2text
  - aiofiles
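Two of the Troubleshooting cases below, missing modules and 7-Zip not found, can be detected before a run. Here is a hedged sketch of such a preflight check. The helper check_requirements is hypothetical and not part of chm_to_markdown.py. It assumes the default 7-Zip path listed above. Note that beautifulsoup4 is imported under the module name bs4.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Iterable, List

# Default location from the requirements; pass another path if 7-Zip lives elsewhere.
DEFAULT_7Z = Path(r"C:\Program Files\7-Zip\7z.exe")
# Import names, not pip names: beautifulsoup4 is imported as "bs4".
REQUIRED_PACKAGES = ("bs4", "html2text", "aiofiles")


def check_requirements(seven_zip: Path = DEFAULT_7Z,
                       packages: Iterable[str] = REQUIRED_PACKAGES) -> List[str]:
    """Return human-readable problems; an empty list means everything was found."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7+ is required, found %d.%d" % sys.version_info[:2])
    if not Path(seven_zip).is_file():
        problems.append(f"7-Zip not found at {seven_zip}")
    for name in packages:
        # find_spec returns None when the top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Missing Python package: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```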
Installation
- Clone or download this repository
- Install required Python packages:
pip install -r requirements.txt
Or install them directly:
pip install beautifulsoup4 html2text aiofiles
Usage
- Edit the configuration variables in the main() function of chm_to_markdown.py:
input_folder = r"C:\Path\To\Extracted\Files" # Temporary folder for extracting CHM
output_folder = r"C:\Path\To\Output\Markdown" # Where Markdown files will be saved
chm_file_path = r"C:\Path\To\Your\File.chm" # Your CHM file path
- Run the script:
python chm_to_markdown.py
- The script will:
  - Clear the input and output folders
  - Extract CHM files to the input folder
  - Convert HTML files to Markdown
  - Save the Markdown files to the output folder
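The first two steps above, clearing the folders and extracting the CHM, might look roughly like the following sketch. It assumes 7-Zip's standard command line: x to extract with full paths, -o for the output folder, and -y to auto-confirm prompts. The helper names are hypothetical, and the real script may invoke 7-Zip differently. Because the clearing step deletes folder contents, point both paths only at scratch folders. The HTML-to-Markdown conversion that follows is not shown.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Default install location named in the requirements.
SEVEN_ZIP = Path(r"C:\Program Files\7-Zip\7z.exe")


def reset_folder(folder: Path) -> None:
    """Delete the folder and everything in it, then recreate it empty."""
    folder = Path(folder)
    if folder.exists():
        shutil.rmtree(folder)
    folder.mkdir(parents=True)


def build_extract_command(chm_file: Path, out_dir: Path, seven_zip: Path = SEVEN_ZIP) -> List[str]:
    # 7-Zip CLI: "x" extracts with full paths, -o<dir> sets the output folder
    # (no space after -o), and -y answers any overwrite prompt with yes.
    return [str(seven_zip), "x", str(chm_file), f"-o{out_dir}", "-y"]


def extract_chm(chm_file: Path, input_folder: Path, output_folder: Path) -> None:
    # Step 1: start from empty working folders (this deletes their contents).
    reset_folder(input_folder)
    reset_folder(output_folder)
    # Step 2: unpack the CHM; check=True raises CalledProcessError if 7-Zip fails.
    subprocess.run(build_extract_command(chm_file, input_folder), check=True)
```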
Performance Tuning
You can adjust the following parameters in the process_folder_async() call to optimize performance for your system:
- max_workers: Number of worker threads for CPU-bound operations
- semaphore_limit: Maximum concurrent file I/O operations
- batch_size: Number of files to process in each batch
await process_folder_async(
input_folder, output_folder, max_workers=8, semaphore_limit=20, batch_size=50
)
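The following sketch shows how the three parameters can interact. It is an asyncio pipeline in the same spirit as the script, not its actual process_folder_async:

- Files are handled in groups of batch_size.
- An asyncio.Semaphore sized by semaphore_limit caps concurrent reads and writes.
- A ThreadPoolExecutor with max_workers threads runs the blocking work.

The sketch uses only the standard library in place of aiofiles. Here, convert is a placeholder for the HTML-to-Markdown step, and process_folder_sketch is a hypothetical name.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def _read(path: Path) -> str:
    # CHM pages are not always UTF-8; undecodable bytes are replaced instead of raising.
    return path.read_text(encoding="utf-8", errors="replace")


def _write(path: Path, text: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")


async def process_folder_sketch(input_folder, output_folder, convert: Callable[[str], str],
                                max_workers: int = 8, semaphore_limit: int = 20,
                                batch_size: int = 50) -> int:
    """Hypothetical stand-in for process_folder_async; returns the number of files written."""
    src_root, dst_root = Path(input_folder), Path(output_folder)
    files = sorted(p for p in src_root.rglob("*")
                   if p.is_file() and p.suffix.lower() in (".htm", ".html"))
    loop = asyncio.get_running_loop()
    io_slots = asyncio.Semaphore(semaphore_limit)  # caps concurrent reads and writes

    with ThreadPoolExecutor(max_workers=max_workers) as pool:

        async def handle(path: Path) -> None:
            async with io_slots:
                html = await loop.run_in_executor(pool, _read, path)
            # Conversion also runs in the pool so it never blocks the event loop.
            markdown = await loop.run_in_executor(pool, convert, html)
            target = dst_root / path.relative_to(src_root).with_suffix(".md")
            async with io_slots:
                await loop.run_in_executor(pool, _write, target, markdown)

        done = 0
        for start in range(0, len(files), batch_size):
            batch = files[start:start + batch_size]
            # Finish the whole batch before starting the next one.
            await asyncio.gather(*(handle(p) for p in batch))
            done += len(batch)
            print(f"Processed {done}/{len(files)} files")
    return done
```

Each batch is awaited before the next begins, so in this sketch batch_size bounds how many files are in flight at once. Under CPython's GIL, pure-Python conversion code running in threads is largely serialized, so raising max_workers mainly helps with blocking I/O.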
Customization
The script provides several customization options for content conversion:
Removing Unwanted Elements
You can customize which HTML elements to remove by editing these lists:
tags_to_remove = ["iframe", "object", "script", "br", "img"]
classes_to_remove = ["collapsibleAreaRegion", "collapsibleRegionTitle", ...]
ids_to_remove = ["PageFooter"]
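The script presumably applies these lists with BeautifulSoup, its HTML parsing dependency. The filtering rule drops an element if any of these hold:

- Its tag name is in tags_to_remove.
- Any of its classes is in classes_to_remove.
- Its id is in ids_to_remove.

As a dependency-free illustration of that rule, here is a sketch built on the standard library's html.parser instead. ElementStripper and strip_elements are hypothetical names. The sketch assumes reasonably well-formed HTML and tracks nesting only for the tag name of the element being dropped.

```python
from html.parser import HTMLParser
from typing import Iterable

# Elements that never have content or an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class ElementStripper(HTMLParser):
    """Re-emits HTML, dropping matched elements with their content (comments and doctype are dropped too)."""

    def __init__(self, tags: Iterable[str], classes: Iterable[str], ids: Iterable[str]):
        super().__init__(convert_charrefs=False)  # keep entities such as &lt; as written
        self.tags, self.classes, self.ids = set(tags), set(classes), set(ids)
        self.out = []
        self.skip_tag = None  # tag name of the element currently being dropped
        self.skip_depth = 0   # nesting depth of that tag name inside the dropped element

    def _matches(self, tag, attrs) -> bool:
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        return (tag in self.tags or attr.get("id") in self.ids
                or any(c in self.classes for c in classes))

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth += 1
        elif self._matches(tag, attrs):
            if tag not in VOID_TAGS:  # void elements have no content to skip
                self.skip_tag, self.skip_depth = tag, 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):  # self-closing form, e.g. <br/>
        if self.skip_tag is None and not self._matches(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
        elif tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_tag is None:
            self.out.append(data)

    def handle_entityref(self, name):
        self.handle_data(f"&{name};")

    def handle_charref(self, name):
        self.handle_data(f"&#{name};")


def strip_elements(html: str, tags, classes, ids) -> str:
    parser = ElementStripper(tags, classes, ids)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

A removed element takes all of its content with it, so listing a container class also drops any tables or code nested inside it.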
Code Snippets
The script handles code snippets with language-specific formatting. You can customize the language mapping:
id_to_lang = {
"IDAB_code_Div1": "csharp",
"IDAB_code_Div2": "vb",
"IDAB_code_Div3": "cpp",
"IDAB_code_Div4": "fsharp",
}
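Judging by the id names, each code sample in these CHM pages sits in a div whose id identifies its language. Once the code text has been pulled out of such a div, turning it into a Markdown fence might look like the following sketch. The helper to_fenced_block is hypothetical, and the dictionary simply repeats the mapping above. The fence is made longer than any run of backticks inside the code, which CommonMark requires for the block to close correctly.

```python
import re

ID_TO_LANG = {
    "IDAB_code_Div1": "csharp",
    "IDAB_code_Div2": "vb",
    "IDAB_code_Div3": "cpp",
    "IDAB_code_Div4": "fsharp",
}


def to_fenced_block(div_id: str, code: str, id_to_lang: dict = ID_TO_LANG) -> str:
    """Wrap extracted code text in a Markdown fence tagged with the mapped language."""
    lang = id_to_lang.get(div_id, "")  # unknown ids get an untagged fence
    # The fence must be longer than any run of backticks inside the code.
    longest_run = max((len(run) for run in re.findall(r"`+", code)), default=0)
    fence = "`" * max(3, longest_run + 1)
    body = code.strip("\n")  # trim blank edges but keep leading indentation
    return f"{fence}{lang}\n{body}\n{fence}"
```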
Troubleshooting
- Missing modules error: Make sure you've installed all required packages and your Python environment is correctly configured.
- 7-Zip not found: Check that 7-Zip is installed in the default location or update the path in the script.
- Permission errors: Run your terminal or command prompt with administrator privileges.
- Memory issues with large CHM files: Try reducing batch_size and max_workers so that fewer files are held in memory at once.
License
This project is open source and available under the MIT License.