March 5, 2010

Python – in the rough

Posted in Uncategorized at 11:58 pm by dgcombs

I’ve noticed over the last few years that the computer scripting language Python is getting more than a little attention from everyone. I first ran into Python while trying to figure out why my Gentoo Linux installs were not updating like they should. I discovered the program doing the updating, emerge, was written in Python. When I asked on the Gentoo forums why Python was chosen for this crucial task, I got flamed into oblivion. As a result, I’ve stayed at arm’s length from Python for many years now. But recently, I’ve seen it crop up in more than one project and I decided to take it for another spin. Let’s let bygones be bygones, eh?

You can’t learn a lot about a language from the typical tutorial designed to teach you to write:

print "Hello, World!"

So I decided to do something that would be useful. Something beneficial. Something with deeper meaning. I decided to write a program called Twecho. It is a Twitter robot that sits there and echoes everything that is sent to it. Of course Twecho is already taken, but apparently not being used. So I created my own echo user for testing. And I went the extra mile and made the program generic enough that you could use it with any publicly accessible Twitter account.

Twitter is accessed through an application programming interface, known as an API among programming types. This API is based on HTTP and allows you to access its workings using carefully crafted URLs. This is a lotta work. Fortunately for me, a fellow named DeWitt built a Python interface for the Twitter API. That made it easier to use the Twitter API and learn a bit about Python in the process.
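To give a feel for what “carefully crafted URLs” means, here is a minimal sketch that builds one by hand with Python’s standard library (written for current Python 3). The host api.example.com, the path replies.json and the parameter names are placeholders made up for illustration; they are not the real Twitter endpoints.

```python
from urllib.parse import urlencode


def build_url(base, path, params):
    """Join a base address, a path and URL-encoded query parameters."""
    query = urlencode(params)
    return "%s/%s?%s" % (base.rstrip("/"), path.lstrip("/"), query)


# Hypothetical endpoint and parameters, for illustration only.
url = build_url("http://api.example.com", "replies.json",
                {"since_id": 12345, "count": 20})
print(url)
```

A wrapper library does this sort of assembly, plus the HTTP request and the parsing of the answer, so the script never has to.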

The first part of the program I stole (er, borrowed) from Mr. DeWitt. It is the standard piece of initiating things and grabbing the command line used to invoke the program. Command line? Oh, that’s a big ugly screen with no user interface but a keyboard. Scripts run best from the command line.
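For the curious, grabbing the command line can look roughly like the sketch below. It uses the standard argparse module rather than whatever the borrowed code used, and the flag names --username and --password are my own assumptions, not necessarily the ones in the original script.

```python
import argparse


def parse_args(argv=None):
    """Read the account name and password from the command line."""
    parser = argparse.ArgumentParser(
        description="Echo @messages back to their senders")
    # Hypothetical flag names, chosen for this example.
    parser.add_argument("--username", required=True, help="account to log in as")
    parser.add_argument("--password", required=True, help="password for that account")
    return parser.parse_args(argv)


# Passing a list stands in for what would normally be typed at the prompt.
args = parse_args(["--username", "echo_test", "--password", "secret"])
print(args.username)
```

Called with no list, parse_args() reads the real command line, and argparse prints a usage message if either flag is missing.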

One thing I discovered right off the bat about Python was that it is indentation aware. It needs no semi-colons to end statements, and it doesn’t wrap sections of code in curly brackets; the indentation itself marks where a block begins and ends. All my years of training in C, Perl and Java went by the wayside. I also found out that the program reads from top to bottom and stores away all the function definitions. Then at the end you call a function, named main() by convention, to start the ball rolling. This is a lot like other scripting languages such as Perl or even VBScript!
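Here is a small sketch of that layout. The function names shout and main are just for this example, and it is written so that it runs on current Python 3.

```python
def shout(text):
    # The indented lines below the colon form the body of the function.
    return text.upper() + "!"


def main():
    for word in ["hello", "world"]:
        # One more level of indentation: this line belongs to the for loop.
        print(shout(word))


# Nothing above has run yet; the defs only stored the functions away.
# This check is the usual way to kick things off when the file is run as a script.
if __name__ == "__main__":
    main()
```

No braces, no semi-colons. Move the print line back one level and it stops being part of the loop.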

Once I had the userid and password, logging into Twitter using a command like this one was a snap:

api = twitter.Api(username=username, password=password)

Now I had to figure out which Twitter message to reply to! Each Twitter message has a unique ID associated with it. All I had to do was figure out the last one the program had seen. So the next step was to pull all the @messages for the logged-on user. I did this using the following command.

status = api.GetReplies()

Ah, but what if this was a brand new user and no one had ever sent an @message to him/her? In that case, the program pulls the last 20 messages from the public Twitter timeline and figures out which is the highest number.

status = api.GetPublicTimeline()
for reply in status:
    if reply.id > since_id:
        since_id = reply.id

Now this is interesting. See that colon at the end of the line beginning with for? Well, apparently that little colon tells Python that the next line should be indented. If you leave off the colon, it gets very confused. And if you don’t indent the next line, it gets very confused. I just know. That’s how.

Once the program has the highest message ID it has seen (in the variable since_id), whether that came from the replies or from the public timeline, it goes into a little loop. I start off the loop with while.

# Assumes "import re", "import time" and "import twitter" at the top of the script,
# and that true was set to 1 earlier.
while true:
    status = api.GetReplies(since_id=since_id)
    time.sleep(10)  # pause 10 seconds to be nice
    if len(status) > 0:
        for reply in status:
            # message on the console so it doesn't look like we're hung
            print "Posting reply to %s for ID#%s" % (reply.user.screen_name, reply.id)
            reply_text = re.sub("@" + username, "@" + reply.user.screen_name, reply.text)
            post = api.PostUpdate(status=reply_text, in_reply_to_status_id=reply.id)
            # keep the highest ID seen, whatever order the replies arrive in
            since_id = max(since_id, reply.id)

The value true is set above to 1. In Python, everything is true except zero, None, False and empty things. They’re false. For those keeping track, this is a Boolean concept named after George Boole, the guy that invented the truth one Saturday afternoon when his father, switch in hand, asked who’d been sneaking candy out of the family store. Apparently it was his brother, Paul.

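If you want to check the true-or-false rule for yourself, a quick sketch like this one will do it. The list of sample values is my own pick; bool() is the built-in that reports how Python judges each one.

```python
# A handful of values to put on trial.
samples = [0, 1, -1, "", "text", [], [0], {}, None]

for value in samples:
    # bool() reports whether Python treats the value as true or false.
    print(repr(value), bool(value))

# Collect only the values that count as false.
falsy = [value for value in samples if not value]
```

Note that [0] counts as true: the list is not empty, even though the thing inside it is zero.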
If the list of replies, status, has a length of zero, we can ignore it and go back to the top. However, if it is more than zero, someone has sent a message! The print statement is there to leave an audit trail on the command line screen (and so it won’t be so ugly and forlorn). Then the since_id is updated with this reply so that the loop will remember it as the last one. We don’t want to answer twice… or more.

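The remember-the-last-one bookkeeping can be tried out without touching Twitter at all. The sketch below is a simplified stand-in, not the real program: FakeApi, get_replies, post_update and poll_once are all names invented for this example, and messages are plain (id, sender, text) tuples.

```python
class FakeApi:
    """Stand-in for the real Twitter client so the loop can run offline."""

    def __init__(self, replies):
        self.replies = replies  # list of (id, sender, text) tuples
        self.posted = []        # everything the robot has "sent"

    def get_replies(self, since_id):
        # Only hand back messages newer than the one we last answered.
        return [r for r in self.replies if r[0] > since_id]

    def post_update(self, text, in_reply_to):
        self.posted.append((in_reply_to, text))


def poll_once(api, since_id):
    """Answer every reply newer than since_id; return the new high-water mark."""
    for reply_id, sender, text in api.get_replies(since_id):
        api.post_update("@%s %s" % (sender, text), in_reply_to=reply_id)
        since_id = max(since_id, reply_id)
    return since_id


api = FakeApi([(5, "alice", "hi"), (7, "bob", "yo"), (6, "carol", "hey")])
since_id = poll_once(api, 5)         # message 5 was already handled
since_id = poll_once(api, since_id)  # nothing new, so nothing is posted twice
print(since_id, api.posted)
```

Taking the maximum, rather than whichever ID came last, means the order in which the messages arrive doesn’t matter.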
The line that defines reply_text is interesting too. You can see the program uses something called “re” (which stands for regular expressions). These are expressions of which even George Boole’s father would approve. They are a way of defining a pattern in text. In this case, I’m using a single routine called sub which stands for ‘substitute’. This takes out the receiver’s Twitter id and puts in the sender’s Twitter id. Why’s that important, you ask? Well, because it cost me about two hours of head scratching and frustrated finger poking at the screen. I even read the source code for the Python interface to the Twitter API. But that didn’t help. So then I dug a little deeper and went to the Twitter API document itself. There, in plain letters, it says:

Usage Note:

Twitter will ignore attempts to perform a duplicate update. With each update attempt, the application compares the update text with the authenticating user’s last successful update, and ignores any attempts that would result in duplication. Therefore, a user cannot submit the same status twice in a row. The status element in the response will return the id from the previously successful update if a duplicate has been silently ignored.

I was a little more than put out when I saw this. If they’d been less silent I might have saved some time. So I added the regular expression substitution and it started working!
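Here is the substitution on its own, as a small sketch you can run. The helper name swap_mention and the account names are made up for the example, and the re.escape call is an extra precaution of mine that the original line does not have.

```python
import re


def swap_mention(text, receiver, sender):
    """Replace the robot's own @name with the sender's @name."""
    # re.escape guards against names containing regex metacharacters.
    return re.sub("@" + re.escape(receiver), "@" + sender, text)


print(swap_mention("@echo_bot hello there", "echo_bot", "alice"))
```

The echoed text now starts with the sender’s name instead of the robot’s, so the reply is addressed to the person who wrote in.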

One more thing about those indentations. While troubleshooting the problem with posting a duplicate message, I tried to comment out lines by placing the hash sign “#” in front of them. Turning them into comments worked just fine. However, since the next line was indented, Python freaked right out and shook its finger at me. So not only do you have to be aware of where your colons are, but where they’re not.
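A sketch of the trap, as best I can reconstruct it: the three snippets below are my own stand-ins rather than lines from the real program, and the built-in compile() is used only to ask Python whether it would accept each one.

```python
# Commenting out the header leaves its body indented under nothing.
header_gone = "ready = True\n#if ready:\n    print('go')\n"

# Commenting out the only line of a body leaves the header with nothing under it.
body_gone = "for n in range(3):\n#    print(n)\nprint('done')\n"

# A placeholder statement keeps the block legal while you experiment.
patched = "for n in range(3):\n    pass  # placeholder body\nprint('done')\n"


def compiles(source):
    """Return True if Python accepts the source text, False otherwise."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(compiles(header_gone), compiles(body_gone), compiles(patched))
```

The moral: when you comment out a line that ends in a colon, pull its body back a level too, and when you comment out a whole body, leave a pass behind.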

As I said at the beginning of this, it’s possible to learn a little about a language by getting Hello, World! to print on a screen. But when it all goes well, it’s not nearly so instructive as when things go right down the tubes and you have to dig your way out. That’s when you learn the most.

