By default, Python cannot deal with non-ASCII characters in source files, such as the following line:

re.sub("ae", u'ä', "haette")
# SyntaxError: Non-ASCII character '\xc3' in file <file>.py on line <line>, but no 
# encoding declared; see http://legacy.python.org/dev/peps/pep-0263/ for details

The file encoding can be signalized by adding a magic comment in either the first or second line of the file:

#!/usr/bin/bash
# -*- coding: utf-8 -*

 References

  • [1] More information in the Python Enhancement Proposals index

The update from Ubuntu 13.04 to 13.10 failed for me with the error message:

Your python install is corrupted. Please fix the /usr/bin/python symlink.

The reason was that I used update-alternatives to manage my two Python versions (2.7 and 3.3). The following solved the issue for me:

sudo update-alternatives --remove-all python
sudo ln -sf /usr/bin/python2.7 /usr/bin/python

(The expected version can be found out via: python /usr/share/python/pyversions.py -i)

References

  • [1] One of many forum entries that describes this solution

Flattening a list refers to the procedure of unpacking nested lists this list. This means, if we have a list [“a”, [“ba”, “bb”], “c”], then the flattened version of this list would look like: [“a”, “ba”, “bb”, “c”]. The flatten operation can be implemented in several different ways.

All of the following code snippets have been tested with Python 3.3.

Our running example is the following list:

list_of_lists = ["a", ["ba", "bb"], ["ca"]]

itertools.chain

The least verbose one is to use chain from the itertools module (since 2.6):

import itertools
# works on a number of lists that are given as separate parameters
l1 = list(itertools.chain(*list_of_lists))
# works on a single "2D" list
l2 = list(itertools.chain.from_iterable(list_of_lists))

Clearly, this should be the preferred way of implementing flatten in Python, because that is what it actually does – in one, short line.

reduce

Compared to this, the following code looks rather hacky. Note that reduce was a built-in function in earlier versions of Python.

import functools
l3 = functools.reduce(lambda x,y: x + y if isinstance(y, list) else x + [y], list_of_lists, [])

sum

If all elements of list_of_list were lists, then the abbreviated form of this is

l4 = sum([["a"], ["ba", "bb"], ["ca"]],[])

Running times

There are some quite fine-grained timing analyses available in the Stackoverflow questions that deal with this question, for instance by cdleary.

 

References

  • [1] (one out of many) stackoverflow question about this problem

Continuous testing makes developing much simpler. If you have a set of unit tests, a continuous testing framework runs these tests in the background whenever you change a source file.

For Python, a variety of tools exists such as sniffer or tdaemon. Note: The following code has been tested for Python 3.3 and Python 2.7. There will be some extra effort if you want to use Python 3.3 with tdaemon of the version I used (as of Aug 22, 2013).

Quick Summary

  1. Run tdaemon to continuously watch for changes:
    tdaemon --custom-args="--with-notify --no-start-message --all-modules"
  2. Implement functionality in the class under test
  3. Implement unit tests with unittest.TestCase
  4. Re-iterate through steps 2.  and 3., going from green to green (refactoring), green to red (new test) or red to green (new functionality)

Class under test

For our setting, I created a small class that represents a triangle. We want to test the method area that returns the area of the triangle using Heron’s formula. The following is the content of the file Triangle.py.

class Triangle:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def area(self):
        s = 0.5 * (self.a + self.b + self.c)
        area = math.sqrt(s * (s - self.a) * (s - self.b) * (s - self.c))
        return area

Unit tests

The unit tests are created with the unittest framework that normally comes shipped with Python. We subclass unittest.TestCase and declare each test as one method that contains the string test or Test in its method name.

from Triangle import *
import unittest

class TriangleTest(unittest.TestCase):
    def test_area1(self): # right triangle with legs 3,4 and hypotenuse 5
        a, b, c = [3, 4, 5]
        expected = 6
        self.assertEqual(Triangle(a,b,c).area(), expected)
    def test_area2(self): # equilateral triangle with side length 3
      a, b, c = [3, 3, 3]
      expected = 3.8971143
      self.assertAlmostEqual(Triangle(a,b,c).area(), expected, places=5) 
if __name__ == '__main__':
     suite = unittest.TestLoader().loadTestsFromTestCase(TriangleTest)
     unittest.TextTestRunner(verbosity=2).run(suite)

The last lines configure the test suite and set the verbosity level to 2. From the command line, we can run the unit tests manually, already: python TriangleTest.py and get feedback. Documentation on the various assert methods can be found here

Continuous Testing

The motivation of this post is that we do not want to run our tests manually, each time we modify the code. Instead we let tdaemon monitor our files and decide when to run the tests.

For this we need two python packages: nose and tdaemon. Both of them can be installed easily through pip. In order to get pip on Ubuntu, you need to execute one of the following:

sudo apt-get install python-pip # this resolves to Python 2.7 on my system
sudo apt-get install python3-pip # this resolves to Python 3.3 on my system

Afterwards, install nose and tdaemon:

sudo pip install nose tdaemon # for 2.7
sudo pip3 install nose tdaemon # for 3.3

Depending on your flavor of Ubuntu, you may also get “system tray” notifications from tdaemon if you install pynotify:

sudo pip install nose-notify
sudo pip3 install nose-notify

Fixes for Python 3

In my version of tdaemon, there were several issues related to changes from Python 2 to Python 3. All of them occurred in /usr/local/lib/python3.3/dist-packages/tdaemon.py.

  • All print statements need parantheses.
  • In line 263, we should leave out msg and write: except Exception as e, adapting the print statement below to print(e).
  • The import of commands is deprecated, we should use subprocess instead. We replace commands.getoutput with subprocess.getoutput
  • Around line 175, there is a call to hashlib.sha224. We have to explicitly encode the string content as UTF-8: hashcode = hashlib.sha224(content.encode(‘utf-8’)).hexdigest()
  • The method raw_input has been renamed to input in Python 3. We need to change the single occurrence of raw_input around line 46 to input in the file.

I came across an encoding problem that occurred rather infrequently:

UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0x… in position ….: invalid start byte

Caused by the opening of  a file around line 175: content = open(full_path).read(). As I could not figure out, what went wrong exactly, I decided to add the option errors=”ignore” in order to ignore this problem.

Running tdaemon

Once everything is set up, you need to go into the folder containing your implementation and test module and type:

tdaemon --custom-args="--with-notify --no-start-message --all-modules"

This tells tdaemon to run nose (default) with system notifications (system tray) enabled and with auto-discovery of all modules it can find. If you find that system notifications do not work for you, leave out the option –with-notify.

Afterwards, tdaemon presents the command it will run to you and you need to confirm with y

WARNING!!!
You are about to run the following command
   $ nosetests . –with-notify –no-start-message –all-modules
Are you sure you still want to proceed [y/N]?

The first change you make to any of the affected files will trigger nose:

2013-08-22 09:36:58.409211
..
———————————————————————-
Ran 2 tests in 0.008s
OK

You can quit tdaemon‘s execution with Ctrl-C.

References

  • [1] blog entry that describes the setup for several platforms (not just Linux)
  • [2] tdaemon project site
  • [3] unittest module reference

Tokenization is the process of splitting a string into smaller (elementary) units that are further processed afterwards. For the task of preprocessing arithmetic expressions such as 1+2*(3-4) there is no need for a sophisticated tokenizer. Instead, we can use re.findall:

import re
expr = "1+2*(3-4) / 7 - 29"
tokens = re.findall(r'[0-9]+|[\(\)\+\-\*/]', expr)
# tokens = ['1', '+', '2', '*', '(', '3', '-', '4', ')', '/', '7', '-', '29']

The function also already tackles whitespaces and ignores them correctly. Note that the expression only identifies integers (no floating numbers) correctly.

References

  • [1] re module reference

For its GPS-enabled camera TZ-31, Panasonic offers to download additional geographic data from the accompanying DVD. Unfortunately, the software for copying the data to the SD card is for Windows only. When I examined the content of the DVD and the SD card after a transfer of European data, I recognized that there is nothing much to implement in order to have a similar platform independent tool.

Note: As pointed out in a comment, my script does not seem to be compatible with the new TZ40, because the directory layout for this camera seems to differ from the TZ31.

How to use it

I host the code of this mini-project on GitHub (download it here) All you need to do is

  1. Download maptool.py and MapList.dat.
  2. Open up a command-line and run the tool. For example:
    python mapdata.py --regions="6;7;8" --mapdata="/media/dvd/MAP_DATA" --sdcard="/media/sdcard"
  3. Important: The tool will delete the subdirectory PRIVATE/MAP_DATA on the SD card, thereby removing any existing maps.

You get more information about the program options when you start the tool with option –help.

In order to adapt the above command for your setting, you should insert the DVD shipped with your TZ31 and plug in the SD card of your camera.

  • –regions: each global region is assigned an integer between 1 and 10. If you install the official MapTool, then you can browse the regions from within the program or by opening the HTML files which are stored in the program files of the MapTool. As this tool here is meant to work independently from Windows, I extracted the region ids for you (see also –help):
    1 – Japan
    2 – South Asia, Southeast Asia
    3 – Oceania
    4 – North America, Central America
    5 – South America
    6 – Northern Europe
    7 – Eastern Europe
    8 – Western Europe
    9 – West Asia, Africa
    10 – Russia, North Asia
  • –mapdata: the path to you map data which is normally stored in the subdirectory MAP_DATA of the DVD
  • –sdcard: the path to the “root” of your SD card

Links

  • [1] Downloadable code on GitHub

I recently needed to programmatically modify some Tomcat context files related to Solr which are encoded as XML. As python is quick to use for scripting while being more comfortable than the Bash for such tasks, I decided to use the module xml.dom.minidom, which is shipped with every python distribution.

In the following, I describe some use cases for working with XML using minidom. I will use the following example file /tmp/test.xml:

<?xml version="1.0" ?>
<Context docBase="changeme">
   <Environment name="solr/home" value="changeme"/>
   <Environment name="other_env_var" value="123"/>
</Context>

If we want to edit an existing XML file, we first need to read this file:

import xml.dom.minidom 
dom = xml.dom.minidom.parse("/tmp/test_template.xml")

Afterwards, I fetch the Context element and check for the attribute docBase. I do not check whether the document contains a Context element, as I always use the same template file to work with:

context_element = dom.getElementsByTagName("Context")[0] # We assume there exists exactly one Context element 
context_element.setAttribute("docBase", "/srv/tomcat/webapps/solr.war") # Update of attribute docBase
print(context_element.getAttribute("docBase"))

In the previous case we assumed that the Context element we (hopefully) got back from getElementsByName was the right one. What if we have mulitple elements and want to select a specific one? In the following example, we want to select a certain Environment element inside Context which has an attribute of name name with a value solr/home. We use the powerful filter function for finding the correct element. The filter function accepts a list of elements and a function which maps from the element type to True/False, thereby deciding whether an element shall be in the output or not.

env_elements = dom.getElementsByTagName("Environment")
home_element = filter(lambda e: e.getAttribute("name") == "solr/home", env_elements)[0] # Again, the check for existence of the home_element is omitted here
home_element.setAttribute("value","/src/solr/")
print(home_element.getAttribute("value")

Finally, we need to write out the file

dom.writexml(open("/tmp/test.xml", "w"))

or we may inspect the generated XML code before writing it to file:

print(dom.toxml())

Links

Directories and Environment

Note: The import statements are repeated for every code sample just to show you what modules you need. In real operation, of course, the modules need only be imported once.

Check whether a file/directory exists:

import os
print("File /tmp/test2 exists? " + str(os.path.exists("/tmp/test2")==False))

Get the path to the current file (with the help of the magic variable __file__)

import os
os.path.abspath(__file__)

Determine directory of a file (similar to dirname Bash command):

import os
directory = os.path.dirname("/tmp/test/testfile.txt")
print(directory) # yields /tmp/test

Copy a directory recursively (contents of “/tmp/test1/”  to “/tmp/test2″):

import shutil
shutil.copytree("/tmp/test1", "/tmp/test2")

Remove a directory recursively:

import shutil
shutil.rmtree("/tmp/test2")
# The above call will produce an error if the directory exists, the following command ignores errors:
shutil.rmtree("/tmp/test2", ignore_errors = True)

Export the following mapping to the environmentCSX_HOME=/home/user/citeseerx:

import os
os.environ["CSX_HOME"]="/home/user/citeseerx"
print(os.environ["CSX_HOME"])

Change the current working directory (CWD) temporarily and then return to the original working directory:

import os
old_directory=os.getcwd()
# Switch to new working directory
os.chdir("/tmp")
# do something, e.g.
print(os.getcwd()) # yields /tmp
# And switch back to the original wd
os.chdir(old_directory)

Create a directory  (cf. mkdir -p):

os.makedirs("/tmp/newdir")
os.makedirs("/tmp/newdir", <tt>0744</tt>) # with unix-like permissions (r:4, w:2, x:1)

Files

Write a raw string to the file /tmp/testfile.txt:

open("/tmp/testfile.txt", 'w').write("Hello World, this is my file contentn")
# verify content e.g. with Bash command: cat /tmp/testfile.txt

Read file to string:

content = open("/tmp/testfile.txt", 'r').read()
print(content) # yields "Hello World, this is my file content" for the previous example

Copy file to directory:

import shutil
shutil.copy("/tmp/testfile1.txt", "/tmp/testdir")

Renaming a file:

import os
os.rename("a.txt", "b.txt")

Process Management

Exit with a certain exit code (here: 1):

import sys
sys.exit(1)

Start a subprocess (e.g. a Bash script):

import subprocess
subprocess.call("/bin/ls") # will list the files in the current directory

Links