Because I telecommute, I'm limited to using my company's webmail interface, Microsoft Outlook Web Access, rather than having direct POP or IMAP access to e-mail. This isn't ideal, for several reasons:
- Outlook Web Access has a horrendous user interface in any browser other than Internet Explorer. (And I'm on Linux, so I can't use Internet Explorer.) It's hard to search, the icons are unintuitive, it encourages top-posting, and it lacks the basic benefits of a desktop e-mail app, such as spell-checking and address auto-completion.
- Using webmail forces me to keep a browser window/tab open to check messages. And Outlook Web Access doesn't auto-refresh, so I have to remember to click "Inbox" every so often to get the latest messages. This is a huge disruption.
- It's just simpler and more efficient to have all my e-mail in one place.
So I figured I'd do a bit of programming to make my life easier. The result: weboutlook, a Python library that screen-scrapes Outlook Web Access. It can:
- Log into a Microsoft Outlook Web Access account on a given server with a given username and password.
- Retrieve all e-mail IDs from the first page of your Inbox.
- Retrieve all e-mail IDs from the first page of any folder in your webmail (such as "Sent Items").
- Retrieve the full, raw source of the e-mail with a given ID.
- Delete an e-mail with a given ID (technically, move it to the "Deleted Items" folder).
Also, I've included a Python implementation of a POP server that provides a POP interface to the scraper. This means I can point my desktop e-mail client at the script, my e-mail client will think it's a normal POP server, and my e-mails will download nicely into my desktop app, with the screen-scraper running silently behind the scenes.
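As a rough illustration of the idea above (not the actual weboutlook code), a POP front end mostly translates POP3 commands into scraper calls. The sketch below assumes a hypothetical scraper object with login, inbox, get_message and delete_message methods. Those names, the FakeScraper stand-in and the PopSession class are all invented for this example, and the socket handling is left out.

```python
class FakeScraper:
    """In-memory stand-in for a webmail scraper (hypothetical interface)."""

    def __init__(self, messages):
        self._messages = dict(messages)  # message ID -> raw message source

    def login(self, username, password):
        return True  # a real scraper would submit the webmail login form

    def inbox(self):
        return list(self._messages)

    def get_message(self, msg_id):
        return self._messages[msg_id]

    def delete_message(self, msg_id):
        del self._messages[msg_id]


class PopSession:
    """Translates POP3 commands (RFC 1939) into scraper calls."""

    def __init__(self, scraper):
        self.scraper = scraper
        self.username = None
        self.ids = []         # message IDs, fixed once the user logs in
        self.deleted = set()  # 1-based message numbers marked by DELE

    def handle(self, line):
        """Take one command line and return the full response text."""
        parts = line.strip().split(" ", 1)
        arg = parts[1] if len(parts) > 1 else ""
        handler = getattr(self, "do_" + parts[0].upper(), None)
        return handler(arg) if handler else "-ERR unknown command\r\n"

    def _live(self):
        return [n for n in range(1, len(self.ids) + 1) if n not in self.deleted]

    def _number(self, arg):
        """Return arg as a live 1-based message number, or None."""
        try:
            number = int(arg)
        except ValueError:
            return None
        return number if number in self._live() else None

    def _size(self, number):
        # Fetches the whole message just to measure it; fine for a sketch.
        return len(self.scraper.get_message(self.ids[number - 1]).encode("utf-8"))

    def do_USER(self, arg):
        self.username = arg
        return "+OK\r\n"

    def do_PASS(self, arg):
        if not self.scraper.login(self.username, arg):
            return "-ERR login failed\r\n"
        self.ids = self.scraper.inbox()
        return "+OK\r\n"

    def do_STAT(self, arg):
        live = self._live()
        return "+OK %d %d\r\n" % (len(live), sum(self._size(n) for n in live))

    def do_LIST(self, arg):
        live = self._live()
        lines = ["+OK %d messages" % len(live)]
        lines += ["%d %d" % (n, self._size(n)) for n in live]
        return "\r\n".join(lines) + "\r\n.\r\n"

    def do_RETR(self, arg):
        number = self._number(arg)
        if number is None:
            return "-ERR no such message\r\n"
        raw = self.scraper.get_message(self.ids[number - 1])
        # Byte-stuff lines starting with "." so they are not read as the terminator.
        body = ["." + l if l.startswith(".") else l for l in raw.splitlines()]
        return "+OK\r\n" + "\r\n".join(body) + "\r\n.\r\n"

    def do_DELE(self, arg):
        number = self._number(arg)
        if number is None:
            return "-ERR no such message\r\n"
        self.deleted.add(number)  # only marked; removed for real at QUIT
        return "+OK\r\n"

    def do_QUIT(self, arg):
        for number in sorted(self.deleted):
            self.scraper.delete_message(self.ids[number - 1])
        return "+OK bye\r\n"
```

A desktop client would normally send USER, PASS, STAT, LIST, a RETR per message, optionally DELE, and then QUIT. Deferring the real deletion until QUIT follows the POP3 specification, so a dropped connection does not lose mail.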
I put this together in my free time, and it's been working nicely for a week, so I'm open-sourcing it for other poor souls who've been sentenced to use Outlook Web Access. I presented this at tonight's Chicago Python Users Group meeting and was surprised to see that, even in a group of only 30 people, 5 or 6 people used Outlook Web Access through their company. I hope somebody finds this useful.
Please send comments and improvements.
(Footnote: In doing research for this, I found MrPostman, which claims to convert various webmails into POP. It didn't fit my needs -- it's a bulky Java app and doesn't actually retrieve the raw source of Outlook Web Access e-mails -- but I mention it here in case it's helpful to somebody.)
Comments
Posted by Frank Wiles on February 10, 2006, at 10:08 p.m.:
Just an FYI Adrian, but you can run IE via WINE without too much hassle under Linux. It might make you feel dirty on the inside, but it works. :)
Posted by David W. on February 10, 2006, at 10:46 p.m.:
Wow! Good work, Adrian. That reminds me of a time I was freelancing on-site at a company with an iron grip on their network. They wouldn't allow POP access to outside servers, so I couldn't check my email through a normal route. I also couldn't use webmail, since it was on a non-standard port, which was also blocked.
For some reason they DID allow telnet access. So I used telnet to access one of my servers, and then used lynx through the telnet interface in order to access my webmail interface to check the mail on my other server (which didn't allow telnet/SSH access, so no Pine for me). It was a huge pain, but it was the only solution I could come up with. (I didn't know of any free web proxies back then.)
Fortunately, I was only there for a few days so I didn't have to put up with that for long. Not that I'm as industrious as you are, and I wasn't as good with scripting then, so a custom solution was implausible for me. I wish I'd had such an elegant solution back then.
Posted by Andrew Dupont on February 11, 2006, at 1:30 a.m.:
Dude, I love you.
Posted by Ben Cartwright on February 11, 2006, at 3:51 a.m.:
Saw your announcement on comp.lang.python. This script is immensely awesome, thanks for sharing! Your code is elegantly written, too (pleasantly surprising for a scraper :-).
I ran into one problem, though: an unfixed bug in Python 2.4 breaks your code (socket.setdefaulttimeout + urlopen + SSL = IOError every time). This affects early versions of 2.3 too; see bug #1153016. It's just something you might want to mention in the source or work around somehow. Is the call to socket.setdefaulttimeout really necessary? I got past this issue by using Python 2.3.5.
Several feature suggestions/minor quibbles:
(1) Maybe customize the User-Agent string to include the name and version of your module. It would help confused Exchange admins determine why there are a bunch of "Python/urllib" user-agents in their IIS logs. (On the flip side, it would allow paranoid users to spoof Mozilla or whatever by changing a single string.)
(2) Support for Basic authentication would be cool. Not all deployments of OWA use forms-based authentication. So instead of posting to owaauth.dll, one would have to use urllib2 with an HTTPBasicAuthHandler on the opener.
(3) The way scraper.py uses it, "domain" is a misleading variable name -- especially when a Windows domain is involved! "host" would be a much clearer name, I think.
(4) popdaemon.py has a hard-coded URL that would be better off as a command-line parameter.
Again, thanks for sharing this great script!
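Suggestions (1), (2) and (4) in the comment above could look roughly like the sketch below. It is written against Python 3's urllib.request (the successor to the urllib2 module mentioned in the comment) and is not taken from weboutlook itself. The User-Agent string, the make_opener, fetch and parse_args helpers, the --port option and its default are all assumptions made for illustration. The fetch helper uses the per-request timeout argument that later Python versions provide, instead of socket.setdefaulttimeout.

```python
import argparse
import urllib.request

USER_AGENT = "weboutlook-sketch/0.1"  # hypothetical name/version string
TIMEOUT = 30  # seconds


def make_opener(base_url, username, password):
    """Build an opener that answers Basic auth challenges and names itself."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # A realm of None means "use these credentials for any realm at base_url".
    password_mgr.add_password(None, base_url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr))
    # Replaces the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener


def fetch(opener, url):
    """Fetch one page with a per-request timeout rather than a global default."""
    with opener.open(url, timeout=TIMEOUT) as response:
        return response.read()


def parse_args(argv=None):
    """Read the webmail URL from the command line instead of hard-coding it."""
    parser = argparse.ArgumentParser(
        description="POP front end for a webmail scraper (sketch)")
    parser.add_argument("url", help="base URL of the webmail server")
    parser.add_argument("--port", type=int, default=8110,
                        help="local port for the POP server (assumed default)")
    return parser.parse_args(argv)
```

With this shape, changing the User-Agent is a one-line edit, and a deployment that uses forms-based authentication could swap in a different opener without touching the rest of the script.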
Posted by S.K. on February 14, 2006, at 4:44 p.m.:
Thanks dude! I hate our web interface. I really think this is great!
Posted by anon on February 17, 2006, at 4:41 p.m.:
You could try the IE tab in Firefox. I use that to access my company's Outlook webmail, and it works.
Posted by Jeremy Dunck on March 4, 2006, at 6:59 a.m.:
Original post sez:
"(And I'm on Linux, so I can't use Internet Explorer.)"
Posted by Phing on March 6, 2006, at 9:43 a.m.:
Is OMA a lower bandwidth version of OWA? If so, anyone have any pointers on where to get documentation so I can try to adapt this to OMA?
Posted by M March on April 3, 2006, at 10:51 p.m.:
Man.. an added IMAP proxy would be the bee's knees.
:)
Posted by XXX on May 17, 2006, at 7:39 a.m.:
Why didn't you just use WebDAV instead of screen-scraping something that could easily change and break your code?
Posted by Adrian on May 17, 2006, at 4:37 p.m.:
XXX: Because I don't have access to the WebDAV interface, and because the webmail site probably *won't* change easily, because that would require an upgrade of the software.
Posted by anonymous on May 17, 2006, at 11:27 p.m.:
Thanks for posting this. The non-IE interface really is awful. I also found another Outlook Web Access downloader at http://personal.inet.fi/atk/fetchexc/. It is written in Java and uses WebDAV over http or https.
Posted by Eric on June 1, 2006, at 9:17 a.m.:
Sweet, I can use this to pass the messages to maildrop or procmail...
Posted by raj@ashutosh.info on June 2, 2006, at 5:57 p.m.:
Is there any way to use it under Windows? I want to get all my mail from OWA into my Outlook. My company doesn't allow POP/IMAP/Exchange access from outside.
Posted by Dave wb0gaz@hotmail.com on July 9, 2006, at 4:30 p.m.:
I would be really grateful if someone could post (maybe even here?) a Python program that calls this library and prints the results: basically just a script that can be run from a shell command. I do not know Python and don't really have time to learn another language (even a simple one!), so a "quick start" script would be very useful.
Posted by sfb on August 9, 2006, at 12:32 a.m.:
"""My company doesn't allow POP/IMAP/Exchange access from outside."""
Considering that Outlook Web Access and Outlook / HTTPS both use HTTP requests over HTTPS to IIS at the backend, and both authenticate with your domain username and password, there is no security reason to allow one but not the other.
Posted by Rob S on August 9, 2006, at 12:36 a.m.:
I am looking for a complete implementation of this code to scrape OWA, extract the mail content from the Inbox, dump the content into Outlook Inbox and then delete the OWA content just extracted. The intent would be to be able to run this on my laptop in some fashion and just dump the email into my Inbox when manually triggered in some way.