A Ruby gem for parsing MHTML.
Uses the NodeJS C HTTP Parser under the hood (thanks to @cotag for the gem).
Add this line to your application's Gemfile:
gem 'mhtml'
And then execute:
$ bundle
Or install it yourself as:
$ gem install mhtml
Two interfaces are provided: all at once, or chunked.
For when you have all of the data in memory.
source = File.read('/file/path.mht')
doc = Mhtml::RootDocument.new(source)
doc.headers.each { |h| puts h }
# body is decoded from quoted-printable, and encoded according to the charset header
puts doc.body
doc.sub_docs.each { |s| puts s }
For when source data is being streamed, or when concerned about memory usage.
doc = Mhtml::RootDocument.new
doc.on_header { |h| handle_header(h) } # yields each header
# yields body, possibly in parts
doc.on_body do |b|
  encoding = doc.encoding
  handle_body(b)
end
doc.on_subdoc_begin { handle_subdoc_begin } # yields nil on each subdoc begin
doc.on_subdoc_header { |h| handle_subdoc_header(h) } # yields each subdoc header
doc.on_subdoc_body { |b| handle_subdoc_body(b) } # yields each subdoc's body, possibly in parts
doc.on_subdoc_complete { handle_subdoc_complete } # yields nil on each subdoc complete
# read the file in 128-byte chunks, so newlines and the final partial chunk are kept
File.open('/file/path.mht') do |file|
  while (chunk = file.read(128))
    doc << chunk
  end
end
The header class looks like this (portrayed as a hash):
# Content-Type: multipart/related; charset="windows-1252"; boundary="----=_NextPart_01C74319.B7EA56A0"
{
  key: 'Content-Type',
  values: [
    { key: nil, value: 'multipart/related' },
    { key: 'charset', value: 'windows-1252' },
    { key: 'boundary', value: '----=_NextPart_01C74319.B7EA56A0' }
  ]
}
- Revisit spec fixtures - either use existing solution or break out to separate gem
- Build up body of fixtures using MHTML from various sources
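As a follow-up to the header hash shown above, here is a minimal sketch of pulling individual parameters out of that structure. It works on the plain hash form only, since this README does not document the accessors on the real header class. `header_param` is a hypothetical helper and is not part of the gem.

```ruby
# Hash form of the Content-Type header, copied from the example above.
header = {
  key: 'Content-Type',
  values: [
    { key: nil, value: 'multipart/related' },
    { key: 'charset', value: 'windows-1252' },
    { key: 'boundary', value: '----=_NextPart_01C74319.B7EA56A0' }
  ]
}

# Hypothetical helper (not part of the gem): returns the value stored under
# the given parameter name, or nil when the header has no such parameter.
# Pass nil as the name to fetch the unnamed leading value.
def header_param(header, name)
  entry = header[:values].find { |v| v[:key] == name }
  entry && entry[:value]
end

media_type = header_param(header, nil)
charset    = header_param(header, 'charset')
boundary   = header_param(header, 'boundary')
```

The same lookup could be used inside an on_header callback, provided the yielded header object is first converted to this hash shape.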
 
After checking out the repo, run bin/setup to install dependencies. Then, run
rake spec to run the tests. You can also run bin/console for an interactive
prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install.
To release a new version, update the version number in version.rb, and then
run bundle exec rake release, which will create a git tag for the version,
push git commits and tags, and push the .gem file to
rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/benjineering/mhtml_rb.