YJL: Using Python to get Google Reader unread count and list

I have been using this snippet to authenticate with Google Reader and fetch the unread list, but it had been failing for a while. It turned out that SID (Session ID) authentication is no longer allowed; the change had been announced and took effect recently.

Now you need to get the Auth token, which you obtain exactly the same way as the SID. However, you have to include service=reader in your request data, or you will only get SID and LSID. I think this new Auth token may be issued on a per-service basis.
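To illustrate the response format, here is a minimal sketch of parsing a ClientLogin reply, which is a plain-text body with one key=value pair per line. The sample body and its token values are made-up placeholders, and parse_client_login is a hypothetical helper name, not part of any library.

```python
def parse_client_login(body):
    """Turn a key=value-per-line response body into a dict."""
    result = {}
    for line in body.splitlines():
        if not line:
            continue
        # Split on the first '=' only, in case the value itself contains '='.
        key, _, value = line.partition('=')
        result[key] = value
    return result


# Placeholder body, shaped like a reply to a request sent with service=reader.
sample_body = "SID=aaa\nLSID=bbb\nAuth=ccc=\n"
tokens = parse_client_login(sample_body)

# The Auth value goes into the Authorization header of later requests.
auth_header = {'Authorization': 'GoogleLogin auth=%s' % tokens['Auth']}
```

Without service=reader, the Auth line would be missing and tokens['Auth'] would raise a KeyError.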

Here is my final script:

#!/usr/bin/env python


from xml.dom import minidom
from xml.dom import EMPTY_NAMESPACE
try:
  import json
except ImportError:
  import simplejson as json
import urllib
import urllib2

username = 'user@gmail.com'
password = '***SECRET***'

# Authenticate to obtain Auth
auth_url = 'https://www.google.com/accounts/ClientLogin'
auth_req_data = urllib.urlencode({
    'Email': username,
    'Passwd': password,
    'service': 'reader'
    })
auth_req = urllib2.Request(auth_url, data=auth_req_data)
auth_resp = urllib2.urlopen(auth_req)
auth_resp_content = auth_resp.read()
# Split on the first '=' only, in case a token value itself contains '='
auth_resp_dict = dict(x.split('=', 1) for x in auth_resp_content.split('\n') if x)
AUTH = auth_resp_dict["Auth"]

# Build the Authorization header using the Auth token
header = {'Authorization': 'GoogleLogin auth=%s' % AUTH}

####################################
### PART 1: Getting unread count ###
####################################

reader_base_url = 'http://www.google.com/reader/api/0/unread-count?%s'
reader_req_data = urllib.urlencode({ 'all': 'true', 'output': 'json'})

reader_url = reader_base_url % (reader_req_data)
reader_req = urllib2.Request(reader_url, None, header)
reader_resp = urllib2.urlopen(reader_req)
#reader_resp_content = reader_resp.read()
#j = json.loads(reader_resp_content)
j = json.load(reader_resp)
count = ([c['count'] for c in j['unreadcounts'] if c['id'].endswith('/state/com.google/reading-list')] or [0])[0]
if count:
  print 'Unread: %d' % count
else:
  print 'No unread items.'

###################################
### PART 2: Getting unread list ###
###################################

if count:
  ATOM_NS = 'http://www.w3.org/2005/Atom'

  reader_base_url = r'http://www.google.com/reader/atom/user%2F-%2Fstate%2Fcom.google%2freading-list?n=50'
  reader_url = reader_base_url
  reader_req = urllib2.Request(reader_url, None, header)
  reader_resp = urllib2.urlopen(reader_req)
  doc = minidom.parse(reader_resp)
  doc.normalize()

  for entry in doc.getElementsByTagNameNS(ATOM_NS, u'entry'):
    title = entry.getElementsByTagNameNS(ATOM_NS, u'title')[0].firstChild.data
    if any(cat.getAttributeNS(EMPTY_NAMESPACE, u'term').endswith('/state/com.google/read') for cat in entry.getElementsByTagNameNS(ATOM_NS, u'category')):
      continue
    print title

It does two tasks: 1) it gets the unread count, and 2) it lists the unread items.

For the first task, we request JSON output, because the json library is easier to use than xml.dom.minidom. The returned data contains many types of unread counts; you need to find the one whose ID ends with /state/com.google/reading-list.
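As a sketch of that lookup in isolation, the snippet below picks the reading-list count out of an already decoded response. The sample payload is invented for illustration (the user ID and feed URL are placeholders), and reading_list_count is a hypothetical helper name.

```python
import json

READING_LIST_SUFFIX = '/state/com.google/reading-list'


def reading_list_count(payload):
    """Return the count for the reading-list entry, or 0 if it is absent."""
    for item in payload.get('unreadcounts', []):
        if item['id'].endswith(READING_LIST_SUFFIX):
            return item['count']
    return 0


# Invented sample, shaped like the structure the script above reads.
sample = json.loads('''
{"unreadcounts": [
  {"id": "feed/http://example.com/rss", "count": 3},
  {"id": "user/123/state/com.google/reading-list", "count": 7}
]}
''')

total_unread = reading_list_count(sample)
```

Matching on the suffix avoids having to know the numeric user ID that appears in the middle of the entry's ID.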

For the second task, there is no output=json option, so XML it is. The returned Atom entries also include read items, so you need to filter them out by checking whether an entry has a category whose term ends with /state/com.google/read, which indicates the item has been read or marked as read. Because the entries include read items, if you have more than 50 unread items (there is an n=50 in the request URL), the script may print fewer than 50 items.
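The read-item filter can be tried offline with a small sketch like the one below. The Atom document is a made-up two-entry sample (titles and user ID are placeholders), and unread_titles is a hypothetical helper name; only the category check mirrors the script above.

```python
from xml.dom import minidom

ATOM_NS = 'http://www.w3.org/2005/Atom'
READ_SUFFIX = '/state/com.google/read'


def unread_titles(atom_xml):
    """Return titles of entries that carry no 'read' category."""
    doc = minidom.parseString(atom_xml)
    titles = []
    for entry in doc.getElementsByTagNameNS(ATOM_NS, 'entry'):
        categories = entry.getElementsByTagNameNS(ATOM_NS, 'category')
        # Skip entries that have a category term ending with the read suffix.
        if any(c.getAttribute('term').endswith(READ_SUFFIX) for c in categories):
            continue
        title = entry.getElementsByTagNameNS(ATOM_NS, 'title')[0]
        titles.append(title.firstChild.data)
    return titles


# Made-up feed: one entry marked as read, one still unread.
sample_feed = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Already read</title>
    <category term="user/123/state/com.google/read"/></entry>
  <entry><title>Still unread</title>
    <category term="user/123/state/com.google/reading-list"/></entry>
</feed>'''

titles = unread_titles(sample_feed)
```

Note that the reading-list term does not match the read suffix, because endswith compares the whole tail of the string.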
