January 05, 2009

Enthought

Intro to Scientific Computing in Python, Feb. 16-20

Enthought is offering “Introduction to Scientific Computing in Python” at our offices in Austin, Texas from February 16th to February 20th. This course is intended for scientists and engineers who want to learn to use Python for day-to-day computational tasks.

  • Day 1: Introduction to the Python Language
  • Day 2: Array Calculations with NumPy
  • Day 3: Numeric Algorithms with SciPy
  • Day 4: Interfacing Python with Other Languages
  • Day 5: Interactive 2D Visualization with Chaco

The cost for the course is $2500. Please see the course description on the Enthought website for details.

by swisher at January 05, 2009 10:48 PM

Titus Brown

Dear Lazyweb: Write-Once/Few-Read-Many Options for Python?

The decision of python-dev to deprecate bsddb has left us in a bit of a pickle (hah!) over in the pygr project. We're looking for a replacement for bsddb for default storage of infrequently- (or never-) changed pickled Python objects. Some of the parameters under consideration are:

  • Python version availability: does it work for 2.2 on up? What about py3k?
  • cross-platform availability: is it readily available for Mac OS X and Windows, no compilation required? Byte-order compatible across platforms? Comes with Python by default is a plus...
  • scalability: can it scale to gigabytes or 10s of gb of data? 10s of millions of records?
  • is it fast, whatever that means?
  • is it simple to set up: no sysadminning required?

We're looking at sqlite and python-cdb right now, as well as a home-grown solution. What have we missed?

So far Istvan Albert has benchmarked bsddb hashopen and btopen, sqlite, GNU dbm, and python-cdb; you can see the results here. The whole discussion thread on pygr-dev may be worth reading if you're interested.
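To make the sqlite option concrete, here is a minimal sketch of a write-once/read-many store for pickled objects, using only the standard library. The `PickleStore` class, its table layout, and its method names are hypothetical illustrations, not anything from pygr or from the benchmarks mentioned above.

```python
import pickle
import sqlite3


class PickleStore:
    """Hypothetical write-once/read-many store for pickled objects."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS objects "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )

    def put(self, key, obj):
        # Plain INSERT (not REPLACE): writing the same key twice raises
        # sqlite3.IntegrityError, which suits write-once data.
        blob = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO objects (key, value) VALUES (?, ?)", (key, blob)
            )

    def get(self, key):
        row = self.conn.execute(
            "SELECT value FROM objects WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def close(self):
        self.conn.close()
```

A store opened with `PickleStore("annotations.db")` (the filename is made up) lives in a single file with no server to administer, which is the property the "no sysadminning" requirement asks for. How it compares on speed and scale is exactly what the benchmarks are meant to settle.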

A few additional notes --

  • Couchdb, MySQL, PostgreSQL, etc. violate the "no sysadminning required" rule. We will probably support them, but they will not be the default.
  • python-cdb looks blazingly fast, but we would have to port it to Windows, and make binaries available. What's the scoop on python-cdb, anyway? Is it well maintained, a well-used project, etc?
  • sqlite isn't "built in" prior to Python 2.5.

Anyway, at this point I'm just trying to figure out what we're missing, if anything!

thanks, --titus

January 05, 2009 08:03 AM

December 30, 2008

Gaël Varoquaux

GPS coordinates for the world’s major cities

I know this sounds stupid, but I have failed to find on the web a simply-accessible text file giving the GPS coordinates for the world’s major cities. The web site World-gazetteer is helpful, but doesn’t really give what I want. So I wrote a small Python web-scraping script to get the data I needed. BeautifulSoup is really cool, but somehow I feel I could be spending my time in a better way. And on a side note, I still struggle to get good conversion from unicode to ascii (I needed ascii, as this will be fed to software that doesn’t understand unicode).
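For the unicode-to-ascii step, one common best-effort approach is to decompose accented characters and then drop whatever is not ASCII. The sketch below uses only the standard library; the helper name `to_ascii` is made up here and is not taken from the attached script.

```python
import unicodedata


def to_ascii(text):
    """Best-effort ASCII transliteration: strip accents, drop the rest."""
    # NFKD splits an accented character into a base letter plus
    # separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" then discards everything that is
    # not ASCII, including those combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

This is lossy: letters that have no decomposition, such as "ø", are removed outright rather than replaced, so city names using them would need a small hand-written substitution table.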

I am attaching both the data file produced, and the Python script, in case someone finds them useful.

Update: About 10 minutes after posting, I did a better google search, and found a page that might have been easier to scrape: http://www.bcca.org/misc/qiblih/latlong_oc.html

by gael at December 30, 2008 11:53 PM

December 29, 2008

Titus Brown

Concepts in Database-backed Web Programming - a Post-Mortem

(This blog post is a long, rambling retrospective on my recent undergrad comp-sci course at Michigan State U., newly renamed to "Concepts in Database-backed Web Programming".)

I set out this term to teach a CS class in the way I would have wanted it taught when I was an undergrad. The class was CSE 491: Introduction to Database Backed Web Development, the place was Michigan State University, and the audience for the class was a group of 30-odd undergraduates, with a few graduate students mixed in. The range of prior experience with programming, Python, HTML, JavaScript, SQL, etc. was pretty broad: some students were crackerjack PHP Web programmers; others had never seen Python and had no experience with Web programming whatsoever.

You can read about how I pitched the class on the course Web site, but what I planned to do was throw the students into the deep end by introducing them to a range of new technologies and approaches, all in the context of developing a database-backed Web site. Michael Carter (the Orbited guy) convinced me at PyCon '08 that I should teach the technology more or less from scratch, by taking them through network programming, building their own Web server, and writing a Web app. So this is what I did.

I was in a bit of an odd situation, having been hired for a Computer Science faculty position with no degree in CS and scarcely any CS background. (My BA is in Math and my PhD is in Developmental Biology.) Why was I hired, you might ask?? Well, I've been working as a computational scientist for over 15 years, I'm firmly embedded in the open source world, I luuuuve agility (note, small a), and I do a certain amount of bioinformatics research, so I managed to convince the CS department that (in addition to my amazingly buzzwordy and sexy research interests) I could bring some useful skills to the CS department. It didn't hurt that MSU had already decided to try out Python for its intro programming course for majors, either.

Now, as any programming wonk knows, us actual programmers know how to teach programming far better than those ivory tower academics do. And since I regard programming as necessary, if not sufficient, for a Computer Science undergrad degree, I decided to focus on teaching programming. So I discarded my ivory tower academician status & set out to do a better job of teaching it. (That's only partly sarcastic, BTW; I don't know many CS professors who think we do a stellar job of teaching programming. Of course, I'm equally sure that people like Yegge and Spolsky don't have the answers either!)

My goals were simple, if not easy: teach students about programming a real app, while introducing modern methods and iterative development. If they happened to learn something about Web programming on the way, so much the better.

Now, my plan was to discard as much unnecessary cruft as possible and teach the young 'uns how to actually program, in some modern useful language, while throwing as many new concepts as possible at them and encouraging them to scavenge online. Specifically,

  • no prior Python experience was required;
  • Subversion use was mandatory;
  • homework could be collaboratively done, or not: all I cared was that the students could explain their work upon request;
  • code could be taken from anywhere and anyone, as long as students understood the code they'd used;
  • the class was programming-heavy, with weekly assignments;
  • attendance was left to the students, i.e. no mandatory attendance;
  • all my notes were (and are) open;
  • my notes were intentionally incomplete, with the idea that students should have to use other resources (e.g. the Web) to find answers;
  • my primary concern was that their code run to my specs, and I provided automated tests for that purpose;
  • the central class project was a continuing one, and I would dock points for things that hadn't been fixed from the last homework;

I should mention that this was my first solo course, and I was developing it from scratch, while setting up a new lab and research program, moving into a new house, and helping take care of my 1 yro daughter. So it was a tad overambitious to develop a new course, with a new approach, as my first ever professorial experience. (The next time so many people tell me I'm being insane, I may listen.)

The bottom line is I think I did an OK job, with some notable failures, and (time will tell) hopefully some successes. Since I regard failure as "good", in that mindset, I offer the following retrospective. All of these issues below were surprising to me. Some, in retrospect, should have been obvious & were the results of insufficient planning; others were simply surprising and I don't think I could have anticipated them. And then there's the issues the students themselves had to point out to me in the end-of-class survey...

  1. There was very little duplicate homework, as far as I could tell.

One surprise was that almost everyone handed in homework they wrote themselves (or, their attempts at obfuscating their duplication worked :). Since I didn't require independent work I was expecting more duplication.

Occasionally I got two people who had made exactly the same mistake, because they were working together. But their logic was similar, not their code.

  2. People rarely took advantage of last week's solutions.

Students really wanted to do the work themselves, and even when I handed out solutions to the previous homework they built on their own code almost exclusively.

This also led to some entertaining situations where people programmed themselves into a corner, but that's called "gaining valuable experience", right?

  3. Students don't like to ask for help.

My homework assignments varied widely in their difficulty, but students rarely asked for help, especially towards the end of the course. I like to think this is because they got better at doing the work but I was told that "they figured me out", which seems to mean that they got used to the way I asked questions and what kinds of things I wanted them to do.

  4. Students generally do all their work at the last minute.

The majority of students almost invariably did the homework the night before. This meant that when I handed out a more difficult or time-consuming homework, there were more complaints and less good work than when I handed out easy homework -- even when I told them that it was going to be a lot of work this week. This was intensely frustrating to me, but completely understandable.

  5. Even though I broke down the problems, they were still big.

If you look through the homeworks, you'll see that I broke down the problems into pretty manageable chunks. (One of my friends taking the course pointed out that all the hard work of abstraction was already done in the assignments!) At least at first, students had trouble digesting these chunks.

I trace this difficulty back to two root causes: first, I did not always hand out self-contained homework assignments, so students had to put in a bit of work to figure out what extra components were needed. This is in contrast to what I think is the typical undergrad CS assignment, where the problem domain is completely covered in class and if you're smart (and the professor doesn't make mistakes!) you can do the HW without going beyond the handouts and the book. Second, students are relatively unused to pure programming assignments: most classes ask for something conceptually difficult, and I don't think much of the HW for this class was difficult. In contrast, I assigned a fair bit of programming and problem-solving and I got the impression that students -- especially the less experienced ones -- had trouble with sizeable programming problems.

  6. Writing consistent homework assignments is tough.

My homework assignments were all over the map in difficulty, for reasons that seem clear in hindsight but were not so obvious when I was writing up the syllabus!

  7. Students improved dramatically over the term.

I started out with what I thought were simple assignments, and students said they were too hard (and did poorly on them). I lowered the level a bit, which left them challenging and significant but not as difficult. Towards the end of the term, though, most students started acing every homework; they'd actually learned something during the term, you see... and I didn't adjust!

I'm pretty sure about half the class was bored for most of the second part of the term.

  8. Properly introducing automated testing in a course is hard.

I wanted one of the big novelties in this course to be automated testing. Students aren't formally exposed to any kind of automated testing in the normal CSE curriculum, and since I'm test-infected I thought it'd be fun to introduce them to unit & functional tests.

I think I did an OK job with the functional tests. We used twill to test the basic Web stuff, and Selenium to test some of the JavaScript; I didn't introduce the Selenium IDE until the last lab, for some reason, but other than that I think the students got the idea at least.

Unit tests were much more problematic, though. I couldn't figure out how to require them to write unit tests on their own, and looking back on the term I still don't know how I'd have done it. So what I did was supply unit tests as part of the homework. The problem then was that they regarded the unit tests as a metric to pass, and rather than coding to the intent they coded to the letter of the tests. It got to the point where I had to choose between becoming a complete test fascist and really specifying every jot & tittle in a test, OR giving up and using the unit tests as a guide rather than the guide. I chose the latter, and that's when I started pushing more on functional tests, arguing to myself that we really cared about the functionality...

I'll probably write more about this later, but I've come to the reluctant conclusion that it's hard to teach real unit testing to people who are relatively new to programming. They simply don't have the paranoid mindset that they need to have in order to properly write tests. This may have implications for poorer-quality programmers, too.

  9. Subversion was a disaster (but it was probably my fault).

By the end of the course, there were three or four people who had irretrievably messed up their Subversion working copy, and a dozen more who were still having problems periodically.

This was because I'm an idiot.

I made students hand stuff in through per-student private svn repositories: homework #1 would go in homework1/, homework #2 in homework2/, etc. However, homeworks 4 through 13 were all related -- building a Web server -- and more and more files became reused from HW to HW as the course went on. So, students would drag & drop subfolders from one HW to another HW... contaminating their next HW with .svn directories from the previous homework. This basically wedged svn.

I don't know what I was thinking.

What I should have done was have them all work off of a trunk and then use svn copy to make branches for each handed-in HW. This is what I'll do next year.

However, I have to say that TortoiseSVN shares some of the blame. It does not interoperate well with command-line svn, so students who were primarily using Windows to work with their homework could not switch to using command-line svn. Grr.

  10. Making the homework gradeable is hard.

It turns out it's not enough to assign homework that's targeted at an appropriate problem and at an appropriate level. Someone also has to grade it! And with 30-odd students, at 10 minutes a homework you're going to need 5 hours to grade it... so it behooves you to make homeworks that are easy to test at a functional level!

I started out by writing automated scripts to run through and test various behavior. This was great when everything worked, but sucked for broken homework -- that homework had to be graded individually.

My first grading innovation was to print out each homework assignment and look at it. I could usually scan for common mistakes and mark points of interest within 30 seconds; once I'd finished scanning all the printouts, I could run automated tests to verify that everything worked, and when it didn't, my visual scan had usually given me a pretty good idea of what was wrong. Printing out the homework worked until the homework got too long -- by the end, the average homework was well over 400 lines of code.

My next grading innovation was to write a fairly thorough set of functional tests at the Web server level, using both twill and Selenium. This worked great once the basic networking functionality was complete, and also highlighted a number of places where my unit tests had passed the homework but the HW was broken at a higher level.

Towards the end, though, I simply had to run their Web server and test it in my own Web browser. Since I had introduced Selenium by around week 10, I could require that they hand in Selenium tests, and I could also swap in my own Selenium tests for automated testing purposes. It still took hours and hours and hours, but it was easier.

Next year I'll probably include more specific hooks in each homework -- "the file must be named this, must be importable, and must have a function called x that does y and z". I'll also introduce functional testing at an even earlier stage.
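A hook like that could be checked mechanically before any real grading starts. The sketch below is one hypothetical way to do it with the standard library; `check_submission` and its arguments are invented for illustration and are not the grading scripts used in the course. It assumes the required file is a `.py` file.

```python
import importlib.util
import os


def check_submission(directory, filename, function_name):
    """Return (ok, message) for one handed-in homework directory."""
    # "the file must be named this"
    path = os.path.join(directory, filename)
    if not os.path.isfile(path):
        return False, "missing file: %s" % filename

    # "must be importable": load the file as a module and run its top level.
    module_name = os.path.splitext(filename)[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception as exc:
        return False, "import failed: %r" % (exc,)

    # "must have a function called x"
    if not callable(getattr(module, function_name, None)):
        return False, "no callable named %s" % function_name
    return True, "ok"
```

Whether the function actually "does y and z" still needs real tests. This only weeds out submissions that would make every later automated test fail for uninteresting reasons. Note that importing a submission executes its top-level code, so this belongs in a sandboxed account.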

Overall, though, grading homework was a huge time suck. I'm going to have to think about how to write more gradeable HW assignments next year. Of course, next year my TA will be doing the first pass grading...

11. If you only care that the code works, the students will hand in crappy code that works.

Any student of metrics will tell you that if you measure people's performance in a few specific ways, they will game the metric.

In general, students didn't pay much attention to code clarity and maintainability. A colleague of mine has named this typical CS HW mentality the "dung beetle" model: students pile on more and more code until it works, and then repeat the next week. This had the expected effect of tripping them up at various times, because they were building on their previous HW! But it still got ugly.

My favorite example of bad code was what students handed in when I was trying to get them to call sock.recv in a loop, until they received the specified number of bytes for a POST. Reading just the right amount here is critical, because their servers were blocking servers, and so sock.recv would block once the client was sending no more. To require the correct behavior, I wrote a stub socket object that raised an Exception (rather than blocking), used dependency injection to insert it into their code, and then checked for the right behavior.

The right code was something like this:

while remaining > 0:
   data += sock.recv(expected)
   remaining -= len(data)

Now, many people had a fencepost error,

while remaining >= 0:
   data += sock.recv(expected)
   remaining -= len(data)

and when I added the stub & exception, their code morphed overnight into this:

while remaining >= 0:
   try:
      data += sock.recv(expected)
   except:
      break
   remaining -= len(data)

This code passed my tests by waiting for the exception raised by my dummy sock.recv, and also worked fine for small POST requests -- but it hung indefinitely on real data. They'd seen the exception being raised by sock.recv and rather than trying to understand the root cause, they'd just ... handled it!
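The snippets above are abbreviated, so here is a fuller, self-contained sketch of the same idea: a read loop plus a stub socket that raises instead of blocking. The names `StubSocket` and `read_exactly` are hypothetical, and the stub is only a guess at the kind of test double described, not the one used in class. The loop asks for only the bytes still outstanding and subtracts the length of each chunk as it arrives.

```python
class StubSocket:
    """Hypothetical test double: hands out data in small chunks, then
    raises (rather than blocking) if asked for more than was sent."""

    def __init__(self, payload, chunk_size=4):
        self.payload = payload
        self.chunk_size = chunk_size

    def recv(self, bufsize):
        if not self.payload:
            raise AssertionError("recv() called after all data was read")
        n = min(bufsize, self.chunk_size)
        chunk, self.payload = self.payload[:n], self.payload[n:]
        return chunk


def read_exactly(sock, expected):
    """Read exactly `expected` bytes from a blocking socket."""
    data = b""
    remaining = expected
    while remaining > 0:
        chunk = sock.recv(remaining)  # never ask for more than is still due
        if not chunk:                 # real sockets return b"" on disconnect
            raise ConnectionError("peer closed before sending all data")
        data += chunk
        remaining -= len(chunk)       # subtract this chunk, not the total
    return data
```

With a stub like this, a loop that uses `>= 0` makes one recv call too many and trips the AssertionError, and a bare `except:` would swallow that error and hide the bug. That is the failure mode described above.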

Bleargh.

Still, there were a number of students who seemed to just understand that they should go through their code a few times and clean it up. Reading their code was always a pleasure.

  12. Preparing properly for a course is a huge amount of work.

I truly didn't realize how much work it was creating a lecture outline, writing up lecture notes, creating a lab, writing up lab notes, creating homework, and grading homework. Good god. I must have spent 20-30 hrs a week on the course, and the quality of my work was not always high; next year I might spend as much again, just getting everything up to snuff. After that I'll just have to tweak it, thank goodness.

  13. We don't teach simple troubleshooting logic well at MSU.

Students simply don't know how to troubleshoot code -- it turns out it's not an intuitive skill. Who knew?

The number of times I made students go back and put in print statements to figure out exactly what was going on ... <shakes head>

I will devote a lab or two to basic debugging and troubleshooting skills next year.

  14. The language doesn't much matter (tho Python still rocks).

About half the class started out with no Python background, but by the fourth homework, everyone could use Python equally well. The main differences that emerged were between people who expressed themselves clearly in their code, and those who didn't. Python helped primarily by eliminating a lot of the syntactic cruft that would have confused matters.

  15. Knowing a subject well doesn't mean you can teach it well.

I know basic network programming as well as most anyone, and I certainly know my Python. But explaining my programming and design choices to students turned out to be quite difficult. I knew all sorts of reasons why the statelessness of the basic HTTP protocol was really great for programmers, but I sure couldn't expound on them in front of a class of people.

Being an expert is different from knowing how to teach the material! And standing in front of a chalkboard lowers your IQ by at least 20 points.

---

So I have a lot of food for thought. In addition to the systemic improvements, for next year, I'm probably going to bump up the level of the course. I'd like to make it more of a "project" course; I'm thinking of something like a three-way division:

  • 5 weeks of Python, network programming, HTTP, and building a Web server.

  • 5 weeks of SQL, JavaScript, CSS, AJAX.

  • 5 weeks of bashing on a real site, with a real framework.

--titus

December 29, 2008 10:03 PM

December 24, 2008

Titus Brown

Lazyweb results: code review and git.

Here's a summary of the e-mail responses I got to my lazyweb query re code review and git.

A few people (Paul Nasrat and Jeff Balogh) pointed out that Review Board supports git. So I may try that.

Charles McCreary has had good experience with Rietveld, but I don't think that it has mature git integration -- although git-cl looks interesting. However, from what I understand git-cl works in tandem with svn, letting you associate git patches with particular Rietveld issues and then commit them to svn. Since we're eschewing svn completely, that might not work well.

Jeff Balogh also discussed Gerrit a bit, and suggested I try it; it's an adaptation of Rietveld to support git.

Jeff also pointed me towards some great resources: Alex Martelli's Code Reviews for Fun and Profit and some launchpad (?) discussion on a mailing list about how to make code reviews work better.

Jeff's other words of wisdom for code reviews:

  • ask lots of questions, but be prepared for blowback from people that think you're being a jerk.
  • get as many people involved in code reviews as possible, as soon as possible, or else you might get stuck being the "code review" person.

Finally, Andriy Khavryuchenko pointed me towards his new startuplet, reviework.com -- blog post here. This is a "code discussion" service that seems like it would integrate well with a lightweight git-based code review approach. I may bug him about an opportunity to beta test reviework once I actually have some experience with code review...

Thanks, everyone! I'll be sure to post about how the code reviews go.

--titus

December 24, 2008 06:43 PM

December 23, 2008

Gaël Varoquaux

Tracking objects in scientific code

When I started working in my new field (data analysis of functional brain images), I was surprised to find in our data-analysis scripts what I thought was a very particular code smell: the numerical code is always doing a lot of filename and path manipulation, loading and saving data even in the core routines. I couldn’t picture what seemed wrong with this, but I was uncomfortable with it.

The good

Memory management

In the data-processing work I am currently doing, we deal with large objects, mostly huge numpy arrays, though sometimes some domain-specific data containers creep in. As a result, simple calculations take time (an SVD is 10 minutes), and I am always fighting with memory.

Saving to disk is a handy way of freeing memory. Moreover, with memmapping, reading only the relevant parts of pre-calculated arrays becomes very cheap.

Crash-resistance

When the simplest operation takes ten minutes, you want to save intermediate steps, to be able to resume calculations, or to inspect why the code crashed. And who knows, you might need this intermediate step.

The bad

The immediate apparent problem is that your code becomes riddled with path-management code. We often joke that once we have figured out the algorithm, the longest surviving piece of code is the path-related junk.

But I believe this is only the tip of the iceberg, and that this code smell hints at deeper problems.

The ugly

Loss of scoping

When I started working on these problems, I was startled to encounter basic domain-specific algorithmic functions taking input and output data filenames. It took me a while to realize that the huge problem with this is that I lose scoping, or in other words naming locality. Let us pretend that I have a function ‘foo’ that does basic numerics on large numpy arrays, but to save memory it takes as a signature the name of the file where the input array is stored, and the name of the file where the output array should be stored. So I have some code that looks like this:

    def process_sessions(session_files):
        for session_file in session_files:
            foo(session_file, session_file + '.out')

Saving to files in the loop saves a huge amount of memory.

Now I decide I want to add a parameter to foo, and vary this parameter, with, e.g.:

    for param in params:
        process_sessions(session_files, param)

My code is hard to refactor, because I need to introduce modifications deep in all subroutines to make sure they do not save their outputs in the same file.

Suppose session_files are actually extracted from an upstream dataset, and now I want to apply my algorithm on a set of these upstream datasets, and in parallel. Once again I need to generate a score of new filenames and keep track of them. I can use temporary files, but I need to keep hold of this information too, and I lose most of my crash-resistance.

When you think it over, programming languages solve this problem elegantly through the rules connecting names to objects, and in particular scoping: a name corresponds to an object in a given function. Using files is equivalent to using globals, and we have to cook up our own scoping rules (which results in a lot of path-massaging code).

No history tracking

When I find a file on the disk, I do not really know how it has been generated. As a result, the crash-resistance is compromised. Moreover, when tweaking algorithms, we often try to rerun only the necessary parts of the algorithms, relying on the precomputed parts saved to the disk. We comment out code, or exercise different code paths. As a result we often end up in situations where the whole code does not actually run. And once again refactoring is hard, because we have not expressed the dependency relations between our intermediate results.
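To picture what the scoping and history problems have in common, here is a deliberately naive sketch in which the output location is derived from the function name and its arguments, and a small record of how each file was produced is written next to it. The helper `cached_call`, the file layout, and the hashing scheme are all assumptions made for illustration. It is not an existing tool, and it assumes the arguments are simple JSON-serializable values such as filenames and numbers.

```python
import hashlib
import json
import os
import pickle


def cached_call(cache_dir, func, *args):
    """Run func(*args) once per distinct argument set; reload it afterwards."""
    # The output location is derived from the function name and arguments,
    # so callers never invent or pass around filenames themselves.
    key = json.dumps([func.__name__, args], sort_keys=True)
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    result_path = os.path.join(cache_dir, digest + ".pkl")
    record_path = os.path.join(cache_dir, digest + ".json")

    if os.path.exists(result_path):
        with open(result_path, "rb") as f:
            return pickle.load(f)

    result = func(*args)
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary name first, so that a crash never leaves a
    # half-written result under the final name.
    tmp_path = result_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(result, f)
    os.replace(tmp_path, result_path)
    # Sidecar record of how the file was produced.
    with open(record_path, "w") as f:
        json.dump({"function": func.__name__, "args": args}, f)
    return result
```

With something like this, adding a parameter to `foo` changes the derived filename automatically, so two parameter values never overwrite each other. It does nothing for large arrays passed as arguments, for functions whose code changes between runs, or for memory use, which is where the real difficulty lies.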

Doing better?

Once again, today I was refactoring my algorithm, or my “pipeline” as we call it. And once again, I felt the failure to have the proper tools, the proper abstractions, words, to express the problem in the code. Manipulating files directly seems wrong, for the reasons expressed above. But can we do better?

The problem, I believe, is that we need a lightweight persistence framework adapted to scientific purposes. I remember telling Travis Vaught a few weeks before beginning my new job that scientists had no problem with their persistence. Well, I was so wrong.

By a persistence framework, I do not mean a persistence mechanism, like numpy.save, or hdf5, or a database. I am interested in the objects with which we represent it in the code. How do we solve the scoping problem? And the history problem? Can we implement a “trajectory tracking”, to reuse the words of Alexandre Fayolle, for our data containers?

I am thinking about a small set of well-thought abstractions, a bit like the use of ORMs (object relational mappers) in web applications, that would take care of the mapping from in-memory objects to objects on the disk for us.

I am starting to have some ideas. I am thinking in terms of context objects, with getattr tricks to do the mapping to a database doing the bookkeeping and the trajectory tracking, and doing the impedance matching with objects stored as numpy “.npy” files, hdf5 files, nifti files, or whatever you want. The added value of a database would be that it would give some robust locking, and possible network abstraction, to allow for crash-safety, and parallel or distributed computing.

This may sound overkill, or overcomplicated. I’ve tried simple things. They all failed.

This is a problem that matters a lot to me. I feel I am losing a lot of time on this. However I feel that the effort to do something good is quite important. I am also afraid of polluting my numerical code with unnecessary abstractions. The main problem is that attempting to solve this problem would require a significant investment in time, and I don’t really see where I can find this time.

Have people encountered similar problems? Do you have any suggestions, any trick to share?

I’d be very happy to read any comments that can move forward my thinking, even if it is about pointing out problems and not solutions. I still think I haven’t identified the problems well.

Update: I have just realized that I will be almost without internet access for the next week, starting from pretty much now. Looks like it was a bad moment to start a thrilling discussion. I guess I got carried away by the discontent of a day doing some bad refactoring. I really look forward to catching up when I come back. Please forgive me for the bad timing.

by gael at December 23, 2008 12:26 AM

December 22, 2008

Enthought

EPD Py2.5 v4.1.30101 Release Now Available

Enthought, Inc. is very pleased to announce the newest release of the Enthought Python Distribution (EPD) Py2.5 v4.1.30101:

http://www.enthought.com/epd

EPD Installer for Mac OS X

The size of the installer has been reduced by about half. Also, this is the first release to include a 3.1.0 version of the Enthought Tool Suite (http://code.enthought.com/), featuring Mayavi 3.1.0.

Mayavi 3.1.0

This is also the first release to use Enthought’s enhanced version of setuptools, Enstaller (http://code.enthought.com/projects/enstaller/). Windows installation enhancements, matplotlib and wx issues, and menu consistency across platforms are among notable fixes.

The full release notes for this release can be found here:

https://svn.enthought.com/epd/wiki/Py25/4.1.30101/RelNotes

Many thanks to the EPD team for putting this release together, and to the community of folks who have provided all of the valuable tools bundled here.

Best Regards,

Chris

About EPD

The Enthought Python Distribution (EPD) is a “kitchen-sink-included” distribution of the Python™ Programming Language, including over 80 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization, database adapters, and a lot of other tools right out of the box.

http://www.enthought.com/products/epd.php

It is currently available as an easy, single-click installer for Windows XP (x86), Mac OS X (a universal binary for Intel 10.4 and above) and RedHat EL3, EL4 and EL5 (x86 and amd64).

EPD is free for 30-day trial use and for use in degree-granting academic institutions. An annual Subscription and installation support are available for commercial use (http://www.enthought.com/products/epddownload.php) including an Enterprise Subscription with support for particular deployment environments (http://www.enthought.com/products/enterprise.php).

by ccasey at December 22, 2008 07:10 PM

December 18, 2008

Enthought

EPD Py25 v4.1.30101_beta2 Available For Testing

The Enthought Python Distribution’s (EPD) early access program website
is now hosting the beta 2 build of the upcoming EPD Py25 v4.1.301
release.  We would very much appreciate your assistance in making EPD as
stable and reliable as possible!  Please join us in our efforts by
downloading an installer for Windows, Mac OS X, or RedHat EL versions 3,
4, and 5 from the following website:

http://www.enthought.com/products/epdearlyaccess.php

The release notes for the beta2 build are available here:

https://svn.enthought.com/epd/wiki/Py25/4.1.301/Beta2

Please provide any comments, concerns, or bug reports via the EPD Trac
instance at https://svn.enthought.com/epd or via e-mail to epd-support@enthought.com.

About EPD
———
The Enthought Python Distribution (EPD) is a “kitchen-sink-included”
distribution of the Python™ Programming Language, including over 60
additional tools and libraries. The EPD bundle includes NumPy, SciPy,
IPython, 2D and 3D visualization, database adapters, and a lot of
other tools right out of the box.

http://www.enthought.com/products/epd.php

It is currently available as a single-click installer for Windows XP
(x86), Mac OS X (a universal binary for OS X 10.4 and above), and
RedHat 3, 4, and 5 (x86 and amd64).

EPD is free for academic use.  An annual subscription and installation
support are available for individual commercial use.  Various workgroup,
departmental, and enterprise level subscription options with support and
training are also available.  Contact us for more information!

by dpeterson at December 18, 2008 09:20 AM

Titus Brown

Lazyweb query: code review and git?

The pygr project is gearing up to do some code reviews, and we're not aware of too many (any?) mature (or even adolescent) tools that interact well with git. A Google search finds Gerrit and a blog post about Code Review -- anything else we should know about?

thanks, --titus

p.s. If you e-mail me I will summarize here; or you can comment, but I might lose your comment in all the spam :(

December 18, 2008 05:03 AM

December 16, 2008

OpenOpt (Dmitrey Kroshko)

OpenOpt release 0.21

Hi all,
I'm glad to announce a new OpenOpt release: v 0.21.

Changes since previous release 0.19 (June 15, 2008):

Backward incompatibility:
  • instead of "from scikits.openopt import ..." you should now use "from openopt import ..."
  • LSP has been renamed to NLLSP
  • support for oofun with ordinary variables (x) has been dropped (it was hard to keep maintaining); use oovars instead.
Until the OpenOpt subversion repository is finally moved to the new host, you can download v 0.21 from here

Welcome to http://forum.openopt.org, a new forum about numerical optimization and related free and open source software.

For assistance with the new host and forum,
thanks to Michailo Danilenko and Volodimir M. Lisivka from the linux.org.ua (aka LOU) community.
Special thanks to Wadim V. Mashkov and Michael Shigorin from linux.kiev.ua community.

Regards, Dmitrey.

by Dmitrey (noreply@blogger.com) at December 16, 2008 09:41 AM

Matthieu Brucher

Book review: Google Analytics 2.0

I’m a very curious guy, and I wanted to know who is looking at my blog. My wife is also interested in what is viewed on her decoration site (under construction, as she wants to make a living from decoration advice). With my hosting service, I have access to Awstats, but Google Analytics seems better suited for data analysis. And this is what this book explains.

Content and opinions

The first part is about the basics of analytics. A short chapter presents the issues of analyzing logs to enhance one's site, and the next two chapters are about AWStats. This server program (it runs directly with the webserver) is presented with its advantages and its drawbacks, which naturally leads to client programs (the logs are accumulated by a script running in the client's browser).
The second part is dedicated to Analytics basics. In fact, the goal is not to start analyzing data; it is to extract the data you want and to share it, if you need to. Basic regular expressions, applied to prefiltering data (during acquisition), and filter manipulation allow a first data classification before the actual analysis that is presented in the next part. Every explanation is very simple, with a lot of images, so it is easy to understand. Then you can set up the goals of your site, which leads to Adwords campaigns (that can help you earn money), and thus better graphs, as you will be able to mix Adwords results with visitor statistics. Finally, you can have estimates of your gains by telling Analytics how much you earn if someone hits a specific page (with a link to Adsense, Google Adwords’ twin). So with this chapter, you are just taught how to install everything so that the fun can begin, which is the content of the next parts.

I will spend less time on this part, because I think a lot can be learnt just by looking at how Analytics works. The so-called Dashboard and the time line can be customized, so some tips are useful, but these chapters do not bring much (well, for someone who loves to click everywhere to see how it works, at least).

The purpose of Analytics is to expose some statistics about visitors. It generates a lot of different graphs, each of which is presented in the fourth and the fifth parts. It’s sometimes boring, but useful pieces of information are scattered throughout.

The last parts are each very short (they could have been merged, or at least reorganized). They are mainly higher-level views of the data presented in the last two chapters. How people navigate the site, whether goals are effective or not, where visitors are most present, additional e-commerce reports: these are data you won’t use at first, because I think they are more difficult to understand and to optimize. If you can’t figure out how to use raw data, I don’t think you will be able to benefit from these. For more advanced users, their presence in the book, albeit short, is valuable (if you’re an advanced user, you should be able to understand the graphs, but knowing that Analytics provides them somewhere is the additional value).

Conclusion

Although some pieces of information are quite natural or obvious, there are a lot of small pieces of advice that are very interesting and important. You can use Google Analytics without help, but it is far easier with some insights from people who have studied several solutions thoroughly. The book does not help you optimize your site; it only gives you the information you need to think of the optimizations you can make.

Google Analytics 2.0 (Paperback)
by Jerri L. Ledford, Mary E. Tyler
ISBN: 047017501X


by Matt at December 16, 2008 08:11 AM

OpenOpt (Dmitrey Kroshko)

OpenOpt (and my) Future

My interim position in the icyb.kiev.ua optimization dept is about to end.

Today I spoke with my chiefs about my further career.
They informed me that, because of the financial crisis, the situation is uncertain.

Still, I have some chances. Moreover, there is some chance that I will be permitted to spend some work time on further openopt development.
However, that is provided some conditions are satisfied.

Some days ago I was forced to move my code out of the scikits framework; my chiefs want me to host it inside Ukraine under our control (maybe you know that I was refused a veto right over openopt code changes within the scikits framework; I understand the scipy community, they can't accept that something hosted on their server is out of their control, but I understand my chiefs' position as well).

I have contacted uafoss.org (FOSS Ukraine dept), and they gladly allowed me to host it there.
The full migration (svn repository, doc, wiki etc) will be finished within several days. But the numerical optimization forum already works, you are welcome: http://forum.openopt.org. I noticed that openopt.com and openopt.net are already taken by someone, so it's definitely the right time to move.

Thanks to Michailo Danilenko and Volodimir M. Lisivka from linux.org.ua (LOU) community.
Special thanks to Wadim V. Mashkov and Michael Shigorin from linux.kiev.ua community.

by Dmitrey (noreply@blogger.com) at December 16, 2008 01:41 AM

December 15, 2008

OpenOpt (Dmitrey Kroshko)

Some changes for oofun

Some changes for oofun:
  • oolin can now handle matrices: oolin(C) creates an oofun that returns numpy.dot(C, inp_array); oolin(C, d) yields an oofun that returns numpy.dot(C, x) + d. See the updated oolin example.
  • fixed-variable handling has been added for oovars. Now you can declare v.fixed = True or just v = oovar(..., fixed = True). All oofuns that recursively depend on fixed oovars only will be calculated only once, and their derivatives will be all-zero (hence no calculations will be required for those parts of the code that depend on fixed oofuns only). In future it would be good to have inner fixed coords for oovars, like this: v.fixed = [0, 1, 15] (i.e. positions of fixed coords inside the oovar). See the fixed oovars example.
However, let me remind you that currently oolin(anything) is considered non-linear code, i.e. it goes to non-linear constraints and the objective function only. In future, oolin(some_oovars) -> general linear constraints (Ax <= b, Aeq x = beq) should be implemented. As for lb <= x <= ub, they are handled correctly from oovar fields.
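
The oolin semantics described above can be pictured with a small stand-alone sketch. This is not OpenOpt code: make_affine is a hypothetical helper written in plain Python (lists instead of numpy arrays), and it only mirrors the stated behaviour, numpy.dot(C, x) with an optional + d.

```python
def make_affine(C, d=None):
    """Return f(x) = C.x (+ d), mirroring the oolin(C) / oolin(C, d) description.

    Hypothetical helper for illustration only; it is not part of OpenOpt.
    C is a list of rows, x and d are flat lists of numbers.
    """
    def f(x):
        # Plain-Python equivalent of numpy.dot(C, x) for a 2-D C and a 1-D x.
        y = [sum(c_ij * x_j for c_ij, x_j in zip(row, x)) for row in C]
        if d is not None:
            # Add the constant offset element by element.
            y = [y_i + d_i for y_i, d_i in zip(y, d)]
        return y
    return f


C = [[1.0, 2.0], [0.0, 3.0]]
linear = make_affine(C)                  # plays the role of oolin(C)
affine = make_affine(C, d=[10.0, 20.0])  # plays the role of oolin(C, d)
print(linear([1.0, 1.0]))
print(affine([1.0, 1.0]))
```

Because such a function is affine in x, its derivative is the constant matrix C, which is why treating it as general non-linear code, as described above, is correct but wasteful.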

by Dmitrey (noreply@blogger.com) at December 15, 2008 06:03 AM

December 10, 2008

Gaël Varoquaux

What’s new in Mayavi 3.1.0?

Mayavi 3.1.0 has just been released, and I think it is a fantastic version. We are starting to be able to focus on the details. In addition, we are getting user feedback, which helps identify the pain points.

Automatic scripting

This is a huge deal! You now have a record button on the pipeline view. In record mode, the modifications that you make to the objects' properties are recorded as valid Python lines: Mayavi tells you which lines of code modify those properties or create new objects. I use this a lot: I first build a skeleton of a visualization using mlab, but when it comes to tuning parameters, I do it interactively, and record.

Much more testing

We added a huge amount of testing (many thanks to Suyog, who contributed quite a few tests). From a user’s point of view this has two consequences. First, the code is more robust (for instance the mlab commands are more flexible on the shape of the arguments passed in). Second, the rendering part of the Mayavi engine is well-separated from the algorithms, which means that the VTK algorithms can now be used easily to manipulate numpy arrays through Mayavi.

Two new mlab functions: barchart and triangular_mesh

Mlab has two new functions: one to create nice bar charts, for 2D histograms displayed in 3D, and one to build meshes defined from their triangles.

Control of the pipeline through mlab is easier and more robust

As mlab.pipeline is getting more usage, it is being ironed out. For instance, applying a module to a source object (be it a Mayavi source or a vtk dataset) adds it automatically to the figure if it is not already in it. Also, when adding an additional module on an existing source, a new module manager (the object controlling the colormap) is added automatically if the colormaps or extents differ. Many modules take keyword arguments to make common operations easier.

IPython in Mayavi

If you have a recent version of IPython installed (> 0.9), Mayavi will use an IPython widget, instead of the vanilla pyshell.

mlab.view now has a sensible behavior

mlab.view no longer gives a bad roll angle to the camera. This makes it much easier to do animations during which the camera moves.

Axes and outline extents

mlab.axes and mlab.outline now adjust by default to the extents of the object they are applied to. This removes a bad surprise for people who have tuned the scale of their visualization.

enthought.tvtk.tools.visual in Mayavi

enthought.tvtk.tools.visual can now be used inside Mayavi, to provide a visual-like API in mayavi.

Documentation has received some love

Documentation has been added and completed, with a focus on making it easier for the beginner to discover the features of Mayavi. We try more and more to walk the user through complete use cases of Mayavi, in task-oriented documentation, such as in the introductory examples, or in case studies.

Two new sources

There are two new sources that do not require data. The first creates objects, such as an arrow, a cube, or a view of the earth, to be viewed with a ’surface’ module. The second creates image data, such as a disk, or a 2d gaussian, or (my favorite) the Mandelbrot set. This can be viewed with an ImageActor, or (even better) with a WarpScalar filter and a Surface. These sources have been contributed by Suyog.

A word of thanks

I am sure I am going to forget some people here, but I’d like to warmly thank those who have been helping us with getting Mayavi2 going. First of all, Dave Peterson, who is doing the release management for ETS. This is a lot of work, and we would never have frequent releases without him. I’d also like to thank Suyog Jain, who contributed some code. This is fantastic, and I am sure we are going to have more people contributing improvements. Finally, I’d like to thank Pierre Raybaut, of Python(x,y), and Varun Hiremath, of Debian. Packaging is very important to our users, and it is not a trivial piece of work… Hum, I almost forgot Chris Casey. Chris has been updating the docs on the net and making sure the docs build well. This is also very important, as the web page is a major means of communication with our users.

by gael at December 10, 2008 11:56 PM

December 09, 2008

Matthieu Brucher

Using SCons with Eclipse (Linux)

I chose Eclipse as my new Linux IDE, instead of Konqueror + KWrite. The purpose was to be able to launch a SCons build from the IDE, get the errors in a panel and double-clicking on one of them would direct me to the location of the error.

So Eclipse seemed to fit my needs:

  • Plug-ins to add the support of various languages
  • Support of different construction tools
  • Support from the main C/C++/Fortran compiler developers (GNU, Intel, IBM, …)

So I will now show you two ways of enabling SCons support for Eclipse.

The first method is straightforward. Just create a new Program Builder and use SCons as the command. You can also add different parameters, if needed.

Using an external program to build Eclipse projects

External program configuration panel


This method is usable with any Eclipse installation. The problem is that the errors are not reported.

The second solution is based on the CDT plugin. CDT offers several error parsers, and you can develop your own (for instance for LaTeX). You just need to replace make with scons. You can do this by telling Eclipse not to generate its own makefile, since you will handle everything yourself. In the Behavior panel, you can also change the default arguments.

Using an alternate builder with CDT

Selecting error parsers for SCons


Then, you can pick the error parsers you need, depending on your compilers.

I’ve also tested this with Windows, but it failed: the Microsoft compilers were not recognized, although the compilation from a simple command line worked.

by Matt at December 09, 2008 08:18 AM

Gaël Varoquaux

George Orwell and Kafka… about our Justice and Law Enforcement

We used to park our car in a calm neighborhood a few blocks away from our place, as there were always empty spots.

In January of this year, our car was picked up by the police because we had left it parked for more than a week (actually more than two). I was amazed to find out that in France, you are not allowed to leave your car parked in the same place on city land for more than a week. Like most of our neighbors, we have no parking spot, but unlike them, we use our car very seldom. The police told us that cars were picked up when neighbors called to complain. “Nul n’est censé ignorer la loi” (no one is supposed to be ignorant of the law), but this law had been passed after I got my driving permit. Anyhow, we paid the stupid fine and our car ended up at my parents' place, because moving it every week was too much of a hassle.

Tonight I was visiting my parents, to take care of our car that has been staying at their place for the last 6 months, and I saw in the mail an order from the tribunal (“Avis de poursuite par huissier de justice”) claiming I had not paid money I owed to the state, and that if I did not pay quickly my furniture and my car could be taken from me, and my bank accounts blocked. After a while I realized it must be about the fine. Apparently we had already received a letter, while I was in the States, claiming that we had not paid the fine, and Emmanuelle had already answered it. This is actually a fairly bold claim, as to retrieve the car, we had no choice but to pay the fine.

Now the letter is threatening me as badly as possible. It gives me very little information about what it is about or why, or whom I can contact to discuss this. It is actually sent not by the government, but by someone the government contracts to retrieve the money. It is fairly hard for me to know what my rights are (I did google about this, but google is not always terribly explicit about legal matters). I believe there has been an error, and that I am wrongly accused of not having paid. But I cannot check anything, because no precise claim is being made. Googling does tell me that I am not the only one in this situation, and that most often people simply give up and pay.

Our law enforcement is something to be proud of. They resort to fear mongering techniques better known to be used by the mafia. They make really sure that you cannot know what you are accused of, and add layers between you and themselves to protect themselves. On top of that, they are probably simply incompetent, and make a significant amount of mistakes, one of which I suffer from.

When I was in the States, I criticized this great country for its army of pitiless lawyers. I don’t feel my motherland is doing much better right now. The most worrying thing is not that I have to pay this stupid fine; it is that I am more and more losing faith in our law enforcement and our justice. When the average citizen considers a cop as a despicable and shameful creature, you are not headed the right way.

By the way, I am indeed tagging this post “scientific computing” for the sole purpose of getting it on planet scipy, because I want as many people as possible to know my love for the system :).

by gael at December 09, 2008 12:34 AM

120 pages!

The mayavi manual in SVN now has 120 pages when compiled to pdf. I know that this is a stupid metric, and that the quality is more important than the number of pages, but it does give me a warm and fuzzy feeling.

More seriously, the next release of Mayavi (coming soon, we promise) is going to have a lot of added documentation for casual users. In particular the mlab section has been expanded a lot and is starting to hint at Mayavi’s full power.

Thanks to Chris Casey, who is making sure that the docs land on the net as soon as they are written.

by gael at December 09, 2008 12:00 AM

December 08, 2008

Enthought

Misc complaints about wxPython documentation

I really appreciate the work done on wxWindows and wxPython to provide a cross platform UI API, but there are some things that really irk me about working with wxPython. While hunting down the problem in my previous blog post about the artifacts in the Cairo rendered image background, I found the problem was with the way in which I was blitting. Was I doing it wrong? I don’t know. The API docs sometimes omitted the methods I was trying to use, depending on which version I was looking at. I tried to find the documentation for the specific version I have, but was unable to, because the wxPython site has the habit of hiding or removing docs every time they release a new version. On top of that problem, when I try to get help on a python module or method from within a python interpreter, I get an exception. Even when I am willing to assume the docs on the web site are the same for the version I have, I find the API docs lacking any kind of detail. The wxWindows API docs are detailed and usually helpful, but the wxPython API docs lack any detail. Compare these two API docs for the Bitmap class:

wxPython

wxWindows

If you are going to force me to read the wxPython docs to get the arguments right, and the wxWindows docs to understand what the arguments mean, at least provide me a link…

by bryce hendrix at December 08, 2008 05:40 PM

December 06, 2008

OpenOpt (Dmitrey Kroshko)

some changes

I have committed some changes to the OO Kernel; the most important is a fix that reduces the time needed to connect oovars to a prob instance (the recursive function walking through all oofuncs previously took too much time).

by Dmitrey (noreply@blogger.com) at December 06, 2008 01:50 AM

December 05, 2008

David Cournapeau


From ctypes to cython: a personal experience with audiolab

Since the cython presentation by R. Bradshaw at Scipy08, I wanted to give cython a shot to wrap existing C libraries. Up to now, my method of choice has been ctypes, because it is relatively simple, and can be done in python directly.

The problem with ctypes

I was not entirely satisfied with ctypes, in particular because it is sometimes difficult to control some platform-dependent details, like type sizes and so on; ctypes has of course the notion of platform-independent types with a given size (int32_t, etc…), but some libraries define their own types, with the underlying implementation depending on the platform. Also, making sure the function declarations match the real ones is awkward; ctypes’ author Thomas Heller developed a code generator to generate those declarations from headers, but they are dependent on the header you are using; some libraries unfortunately have platform-dependent headers, so in theory you should generate the declarations at installation, but this is awkward because the code generator uses gccxml, which is not widely available.
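
To make the declaration problem concrete, here is a minimal sketch of a hand-written ctypes declaration. It wraps strlen from the C library rather than anything from audiolab, and it assumes a POSIX system where the C library can be located; nothing checks the argtypes/restype lines against the real header, which is exactly the weakness described above.

```python
import ctypes
import ctypes.util

# Locate and load the C library. On POSIX, find_library may return None on
# unusual setups, in which case CDLL(None) exposes the symbols of the
# running process instead.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Hand-written declaration: ctypes trusts these two lines blindly.
# If they disagree with the real prototype, the error only shows up at
# run time (wrong values or a crash), never at "compile" time.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t  # size_t has a platform-dependent width

print(strlen(b"audiolab"))
```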

Here comes cython

One of the advantages of Cython for low-level C wrapping is that cython declarations need not be exact: in theory, you can’t pass an invalid pointer for example, because even if the cython declaration is wrong, the C compiler will complain on the C file generated by cython. Since the generated C file uses the actual header file, you are also pretty sure to avoid any mismatch between declarations and usage; at worst, the failure will happen at compilation time.

Unfortunately, cython does not have a code generator like ctypes. For a long time, I wanted to add sound output capabilities to audiolab, in particular for Mac OS X and ALSA (linux). Unfortunately, those APIs are fairly low level. For example, here is an extract of AudioHardware (the HAL of CoreAudio) usage:


AudioHardwareGetProperty(kAudioHardwarePropertyDefaultOutputDevice,
                         &count, (void *) &(audio_data.device))

AudioDeviceGetProperty(audio_data.device, 0, false,
                       kAudioDevicePropertyBufferSize,
                       &count, &buffer_size)

The Mac OS X convention is that variables starting with k are enums, defined like:


kAudioDevicePropertyDeviceName = 'name',
kAudioDevicePropertyDeviceNameCFString = kAudioObjectPropertyName,
kAudioDevicePropertyDeviceManufacturer = 'makr',
kAudioDevicePropertyDeviceManufacturerCFString = kAudioObjectPropertyManufacturer,
kAudioDevicePropertyRegisterBufferList = 'rbuf',
kAudioDevicePropertyBufferSize = 'bsiz',
kAudioDevicePropertyBufferSizeRange = 'bsz#',
kAudioDevicePropertyChannelName = 'chnm',
kAudioDevicePropertyChannelNameCFString = kAudioObjectPropertyElementName,
kAudioDevicePropertyChannelCategoryName = 'ccnm',
kAudioDevicePropertyChannelNominalLineLevelNameForID = 'cnlv'
...

These definitions rely on the implicit conversion of char[4] to int, which is not supported by cython AFAIK. With thousands of enums defined this way, any process which is not mostly automatic will be painful.
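
For readers unfamiliar with these four-character constants, the following sketch shows how a code such as 'bsiz' maps to an integer. It assumes the GCC/Clang convention (first character in the most significant byte); the C standard leaves the value of multi-character constants implementation-defined, and fourcc is a hypothetical helper name, not part of CoreAudio or cython.

```python
def fourcc(code):
    """Pack a four-character code such as 'bsiz' into an integer.

    Assumes the GCC/Clang convention for multi-character constants
    (first character in the most significant byte); the C standard
    leaves the actual value implementation-defined.
    """
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("expected exactly four ASCII characters")
    return int.from_bytes(raw, "big")


# The buffer-size property selector from the enum extract above.
print(hex(fourcc("bsiz")))
```

A generator that emits the integer values directly sidesteps the need for cython to understand the C literal.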

During the cython presentation at Scipy08, I asked whether there was any plan toward automatic generation of cython ‘headers’, and Robert fairly answered: please feel free to do so. As announced a couple of days ago, I have taken the idea of the ctypes code generator and ‘ported’ it to cython; I have used it in scikits.audiolab to write a basic ALSA and CoreAudio player, and to convert my old ctypes-based wrapper of sndfile (a C library for audio file IO). This has worked really well: the optional typing in cython makes some parts of the wrapper easier to implement than in ctypes (I don’t need to check whether an int-like argument will overflow, for example). Kudos to the cython developers!

Usage on alsa

For completeness, I added a simple example of how to use the xml2cython codegen with ALSA, as used in scikits.audiolab. Hopefully, it should show how it can be used for other libraries. First, I parse the headers with gccxml; I use the ctypes codegenlib helper:

h2xml /usr/include/alsa/asoundlib.h -o asoundlib.xml

Now, I use the xml2cython script to parse the xml file and generate some .pxd files. By default, the script will pull out almost everything from the xml file, which will generate a big cython file. xml2cython has a couple of basic filters, though, so that I only pull out what I want; in the alsa case, I was mostly interested in a couple of functions, so I used the input file filter:

xml2cython.py -i input -o alsa.pxd alsa/asoundlib.h asoundlib.xml

This generates alsa.pxd with declarations of the functions whose names match the list in input, plus all the typedefs/structures used as arguments (they are recursively pulled out, so if one argument is a function pointer, the types in the function pointer should hopefully be pulled out as well). The exception is enums: every enum defined in the tree parsed from the xml is put in the cython file automatically, because ‘anonymous’ enums are usually not part of function declarations in C (enums are not typed in C, so it is not so useful). This means every enum coming from standard header files will be included as well, and this is ugly, as well as making cython compilation much slower. So I used a location filter as well, which tells xml2cython to pull out only enums which are defined in files matched by the filter:

xml2cython.py -l alsa -i input -o alsa.pxd alsa/asoundlib.h asoundlib.xml

This works since every alsa header on my system is of the form /usr/include/alsa/*.h. I used something very similar on the AudioHardware.h header in CoreAudio. The generated cython can be seen in scikits trunk here. Doing this kind of thing by hand would have been particularly error-prone…
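
The combined effect of the two filters can be sketched in a few lines of Python. This is only an illustration of the idea, not xml2cython's implementation: the Decl record, the select function, the substring match on the location, and the sample file paths are all assumptions made for the example, and the recursive pulling of argument types is left out.

```python
from collections import namedtuple

# Hypothetical record for one parsed declaration; xml2cython's real
# internal representation is not shown in the post.
Decl = namedtuple("Decl", ["name", "kind", "location"])


def select(decls, wanted_names, location_substr):
    """Keep functions listed in wanted_names, and enums whose defining
    file path contains location_substr (assumed substring match)."""
    kept = []
    for decl in decls:
        if decl.kind == "function" and decl.name in wanted_names:
            kept.append(decl)
        elif decl.kind == "enum" and location_substr in decl.location:
            kept.append(decl)
    return kept


# Illustrative input: two ALSA functions, one ALSA enum, one system enum.
decls = [
    Decl("snd_pcm_open", "function", "/usr/include/alsa/pcm.h"),
    Decl("snd_pcm_dump", "function", "/usr/include/alsa/pcm.h"),
    Decl("snd_pcm_format_t", "enum", "/usr/include/alsa/pcm.h"),
    Decl("__socket_type", "enum", "/usr/include/bits/socket.h"),
]
selected = select(decls, {"snd_pcm_open"}, "alsa")
for decl in selected:
    print(decl.name)
```

Here the name filter drops the unwanted function and the location filter drops the enum coming from a standard system header.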

      

by cournape at December 05, 2008 08:50 AM

Enthought

Using Cairo to render SVG in Enable

A few months ago Bryan Cole added the start of a Cairo based backend for Kiva. I’ve been fighting with Agg for several SVG related features and thought Cairo might be a good substitute for rendering SVGs. For most of our rendering needs, Cairo is about 4x slower than Agg, so it's not a good general replacement, but for the SVG editor it might be okay. I took the initial work done, which only rendered to images, hacked it a bit, and got it into an Enable canvas. There is still a lot of work to be done, and this is the first time I’ve looked at Cairo at all, so if anyone wants to help out, please do. Anyway, without further ado, here is the SVG lion image:

Cairo rendered lion

If anyone knows how to get rid of the background rendering artifacts, please let me know.

by bryce hendrix at December 05, 2008 01:03 AM

December 04, 2008

Titus Brown

Perl is Dying?

Some interesting posts over in Planet Perl:

While I'm a raging Python fanatic (or at least a serious Python user :) I don't think this is anything to gloat about: Perl vs Python is a relatively minor struggle compared to Good Language vs PHP, and Dynamic vs Static.

Either way there are good lessons for any language in there. And we should be sure to thank GvR, Barry Warsaw, and all the other Py3k contributors for moving forward with Py3k... Say what you will about Py3K's backwards incompatibility, but I think the lack of a strongly determined future for Perl, i.e. a clear roadmap for Perl 6, is one of the Perl community's big problems.

--titus

December 04, 2008 05:03 PM

December 03, 2008

Titus Brown

What have I been up to?

As a new prof, I've been too busy to blog much. What am I doing?

Apart from all the normal academic crud (meeting with people, answering e-mail, doing paperwork, etc.) and parenting & home ownership stuff, I've been teaching my Intro to Database-Backed Web Programming course. This has been neither a huge failure nor a huge success, but has at least been extremely educational for me (did I mention that jQuery is awesome?). That's consumed most of my time.

However, I do have a few technical things up my sleeve. Over Thanksgiving, I've been --

  • looking over a student's work on seqdb2, a Write-Once-Read-Many database for indexing and retrieving large DNA and protein sequence collections (e.g. Solexa reads). It's somewhat specifically aimed at providing a pygr back-end to help us manage the startlingly large amount of sequence data now available.

  • starting to get into SeqFinder, an n-mer indexing and search library for doing O(n) searching for n-length oligos in gigabase-sized genomes. The basic algorithm is simple and fairly straightforward, but the devil is in the details; a collaborator is implementing it, and I'm hoping to turn it into a fairly general oligo search tool for several nefarious purposes.

  • and, of perhaps the most general interest, I'm trying to grok Pyrex sufficiently to add code coverage recording and analysis. Since pygr, seqdb2, and SeqFinder all make use of Pyrex or its fork, Cython, this is critical for me for testing. I suspect it will be quite useful to the SciPy folk as well, once I get it working nicely.

    I do have something that basically works, but it's not integrated into either Pyrex or figleaf very well yet and I need to hack on it a bit further.

    My ultimate goal is to get coverage data from Python, Pyrex/Cython, and C/C++ all in one format so that I can do a proper job of looking at statement coverage for these multi-language applications. That's still a ways off, and one of the bigger barriers is that gcov doesn't work on shared libraries. Grr.
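
As a rough picture of what n-mer indexing means in the SeqFinder item above, here is a toy dictionary-based index. It is a generic sketch, not SeqFinder's algorithm: build_index and find are hypothetical names, and a structure like this would need far more care with memory to work on gigabase-sized genomes.

```python
from collections import defaultdict


def build_index(sequence, n):
    """Map every n-mer in sequence to the list of positions where it starts."""
    index = defaultdict(list)
    for start in range(len(sequence) - n + 1):
        index[sequence[start:start + n]].append(start)
    return index


def find(index, oligo):
    """Return all start positions of oligo (its length must equal n).

    Uses .get so that a miss does not insert an empty entry.
    """
    return index.get(oligo, [])


genome = "ACGTACGTTACG"
index = build_index(genome, 4)
print(find(index, "TACG"))
```

Building the index takes one pass over the sequence; each lookup is then a single hash of the n-length query, independent of genome size.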

Hopefully I can spend some more time on these things now that the term is almost over!

--titus

December 03, 2008 03:03 AM

The One True Testing Approach: There's No Such Thing

The ongoing debate about doctests (here, and links therein) seems to me to be somewhat silly.

doctests should be assessed by their utility to you and your project, in whatever role you happen to be using them. I personally find them to be very useful in API documentation, where they can help show the API in use while documenting it in a verifiably correct way. I've also found them to be useful in teaching, where both you and your students can rely on your code examples to be correct, or at least to run properly.
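
For anyone who has not seen one, a doctest used as API documentation looks roughly like this. The function is a made-up example (it is not taken from pygr or any project mentioned here); the docstring shows the API in use, and the doctest module checks that the shown output is still what the code produces.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence.

    >>> reverse_complement("ATGC")
    'GCAT'
    >>> reverse_complement("")
    ''
    """
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


if __name__ == "__main__":
    import doctest
    # Runs every example found in this module's docstrings; silent on success.
    doctest.testmod()
```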

Obviously you still have to write good API documentation, and good examples for teaching; simply having a doctest isn't a guarantee of quality docs, just like simply having tests doesn't guarantee that your code is tested. I personally find functional tests to "fit my brain" much better for actual testing, and unit tests are often the second approach I use. But I like to write doctests, too, and I think they have their place.

So why is everyone talking about whether or not doctests are good? I think too many people are after the One True Testing Approach, an approach that they can use, without thought and to the exclusion of all others, for their testing.

For example, Andrew (?) complains that doctests are narrative, and unit tests are less good when they're narrative. OK, so you think people are misusing doctests... but maybe they have a place in narrative tests, like API documentation or functional tests?

Before I mischaracterize Andrew's position too much, I should say that he thinks unit tests should make up the bulk of automated tests. While I don't fundamentally disagree, I do think that's an extreme position that depends on what it is, exactly, that you're writing. Libraries are going to have different needs than database-intensive apps, which in turn will be different from AJAX-heavy Web GUIs, which will be different from scientific apps.

In Andrew's second post, he complains that doctests have a number of drawbacks. Hey, I agree with most of what he says -- but I think he's wrong, again, in phrasing doctests as an alternative to xUnit-style unit tests. I think they're complementary.

Ned Batchelder chimes in with more doctest problems, which basically reiterate Andrew's complaints that doctests don't work as your only (or even your primary) set of tests. Again, I agree completely. So, umm, why not use doctests where they're appropriate, like in API example documentation, and use unit tests for the rest? Why should I choose to use only one tool?

Martijn wrote a nice post pointing out some of the good things about doctests, and my experience with doctests echoes his, frankly. I think they're great for keeping basic API information up-to-date, and I really like having executable documentation. They also force you to think in certain ways about your APIs, which can make the APIs better.

In sum, I'm saying that you should pick the kind of testing approach that gives you the most bang for your buck (where "buck" is measured in time, or money, or whatever). That means that the testing approach of choice is going to be context dependent, and that context includes things like the project itself, the team, the language(s) being used, and your ultimate goals. This may seem like useless advice, but I think it's at the heart of productive testing: using the most effective tools for the job.

A corollary to that view is that there is no One True Testing Approach; there are just a lot of complementary approaches. Figuring out which is good for what purpose is part of the learning process!

I'd rather see the conversation shift to what doctests are good for, and in that vein I encourage people to read all of the comments on Martijn's post.

--titus

p.s. If you comment, please drop me a note at t@idyll.org.

December 03, 2008 03:03 AM

PyCon review process

We're going through the PyCon '09 review process, and participating in the process has been pretty interesting. (I joined the Program Committee in large part because I was told to put up or shut up after I critiqued PyCon '08. Ahh, the open source world... where you're encouraged to go fix things when you complain :). In particular, this is the first review process I've seen where regular communication between the reviewers and authors is expected, and proposals are modified in response to reviewer comments.

There are a couple of drawbacks to this process. One is that there's no clear boundary between reviewer opinion and expectation. It's one thing to say "I don't understand X, Y, or Z in your proposal; could you clarify, please?" and another to say "I don't agree with X, Y, or Z, and I won't push your proposal unless you change your views." The former seems pretty legit, but the latter strikes me as being counter to the conference ethos of encouraging diversity in views. While I don't think anyone has been that explicit, there have been extensive conversations between reviewers and authors that have had much the same effect...

Another drawback of this system is that authors can express their frustration at reviewer comments pretty directly. Sometimes this frustration is legit, but other times it's hysterically off-base; you don't make any friends when you tell your reviewers that they're idiots (just as one purely hypothetical example...)

On the flip side, I think several proposals have been dramatically improved through reviewer feedback. I don't know how well this kind of insta-review process might work for academic journals -- I believe PLoS One is trying it out? -- and I'll be watching some of the early experiments with interest.

Parenthetically, let me add that we have a bunch of great proposals, and Ivan Krstic is doing a fantastic job of running things! So I expect PyCon '09 to be a very good conference.

--titus

December 03, 2008 03:03 AM

December 02, 2008

Enthought

Little known ETS features: Progress Dialog

ETS contains a lot of nice features which don’t get any press, many of which aren’t even well known in the Enthought office. Today I thought I’d show off the ProgressDialog class I wrote more than a year ago.

The ProgressDialog class is intended to have an API similar to QT and Java’s progress dialogs, and it has many of the same features:

  • Optionally show estimated time remaining
  • Supports indeterminate progress where the total number of steps is not known
  • Cancel and skip buttons
  • Automatically closes for determinate instances

This is all the code needed to do a dialog while doing a simple loop (the entire example can be found here):

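import time

# Import path is an assumption for the ETS namespace of this era;
# the full example linked above has the exact one.
from enthought.pyface.api import ProgressDialog

def task_func(t):
    # Determinate dialog (max=t) with a time estimate and a Cancel button.
    progress = ProgressDialog(title="progress", message="counting to %d"%t,
                              max=t, show_time=True, can_cancel=True)
    progress.open()

    for i in range(0,t+1):
        time.sleep(1)
        print(i)
        # update() returns two flags: keep going, and whether Skip was pressed.
        (cont, skip) = progress.update(i)
        if not cont or skip:
            break

    # Jump to the final step so a determinate dialog can close itself.
    progress.update(t)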

Here’s what the progress dialog looks like using the WX backend (yes, there is a QT4 backend too)

ProgressDialog Screenshot

by bryce hendrix at December 02, 2008 01:18 AM

December 01, 2008

David Cournapeau

cython-codegen: cython code generator from gccxml files


I have enjoyed using cython to wrap C libraries recently. Unfortunately, some libraries I was interested in (Alsa, CoreAudio) are quite big. In particular, they have a lot of structures, typedefs and enumerations which are easy to get wrong when wrapped by hand. Since the problem is quite similar to wrapping with ctypes (my former method of choice), I thought it would be interesting to do something like the ctypeslib code generator for cython, hence the cython-codegen “project”, available on github:

http://github.com/cournape/cython-codegen

Basic usage goes like this to generate a .pyx file for the foo.h header:

gccxml -I. foo.h -o foo.xml
xml2cython.py -l 'foo' foo.h foo.xml

I can’t stress enough that this is little more than a throw-away script, and is likely to fail on many header files, or generate invalid cython code. I have used it successfully on non-trivial headers though, like alsa or CoreAudio on Mac OS X. Your mileage may vary.

      

by cournape at December 01, 2008 10:45 AM

November 27, 2008

OpenOpt (Dmitrey Kroshko)

Ironclad v0.7 released (NumPy on IronPython)

The Ironclad developers have announced release v0.7.
I guess it makes it possible to use OO (OpenOpt) and some solvers (like ralg) from IronPython.

by Dmitrey (noreply@blogger.com) at November 27, 2008 10:44 AM

November 25, 2008

Matthieu Brucher

Book review: Fortran 90/95 for Scientists and Engineers

When I started my new job three months ago, I didn’t know how to write a Fortran program. I had to modify an existing Fortran 77 program to enhance and parallelize it. So I went to the library and took this book, which is aimed at people like me.

Content and opinions

There is no clear delimitation between the 16 chapters; this is the only glitch in the table of contents. Also, some chapters cover a subject, then another subject is introduced, and a later chapter deals with enhancements of the first subject. But all in all, the difficulty grows steadily through the book (that is obviously the reason for the lack of clear delimitation between subjects).

If you just need basic Fortran, then the first 8 chapters will be enough. If you need more advanced techniques, you can go directly to the last chapters. Obsolete and bad Fortran practices are grouped in the last chapter, and it’s good to have one small place to refer to if you encounter some obsolete Fortran features.

There are a lot of examples, with explanations, notes, Fortran good practices, … Each chapter ends with a summary of the main ideas (once you’ve read the chapter, they are the only things that matter) and the associated good practices. Note that almost every Fortran practice is explained, the good as well as the bad: the ones you should use and the ones you should not, but may still encounter in old code (or Fortran 77 code). The book is clear about which you must use. Each chapter closes with a set of “cards” covering the new tools (statements, functions, …).

Each time I encountered a problem, I found something in this book. And it is usually in the summaries (because you do not need every explanation if you’re only looking for a specific answer).

Conclusion

The book was really easy to read. It seems to cover almost every part of the Fortran standard, even those which should not be used anymore. I didn’t see a warning about the danger of passing the same array twice to a function that will modify it, but it is the only thing I found missing.

Fortran 90/95 for Scientists and Engineers (Paperback)
by Stephen Chapman
ISBN: 0071232338

by Matt at November 25, 2008 08:08 AM

OpenOpt (Dmitrey Kroshko)

CorePy: Assembly Programming in Python

I've learned of the BSD-licensed v1.0 release of CorePy - "a Python package for developing assembly-level applications on x86, Cell BE and PowerPC processors".

I guess it would be useful for those objective or non-linear functions that need to be evaluated considerably faster than pure Python code allows.

Of course, using C, C++, Fortran code via Cython, f2py, ctypes, SWIG, Pyrex, etc. could yield some speedup as well.

by Dmitrey (noreply@blogger.com) at November 25, 2008 12:17 AM

November 22, 2008

Gaël Varoquaux

Using Mayavi to explore a potential field

As promised, here is the sequel to the tutorial I posted yesterday on using Mayavi with scipy to understand the trajectories of a particle in a potential. (Chances are you are reading this before my previous post. I suggest you first jump to my previous post, and then come back here.)

This tutorial shows you how to use the powerful VTK and Mayavi features to explore the trajectories in the same potential. However, the tools we are using do not give us as much control over the dynamics of the system, so this time we do not add damping or oscillation of the potential. At the end of the day, however, the resulting visualization is much more interactive. Once again, I would like as much feedback as possible, as this is intended for the Mayavi User Guide.

___________

In this example, we create a vector field from the gradient of a scalar field and explore it interactively. This example shows you how to do some operations similar to the previous example, but interactively, using filters and modules. This approach requires a better knowledge of Mayavi and the VTK filters, but the big gain is that the resulting visualization can be explored interactively.

First, let us create the same scalar field as in the previous example. We open Mayavi and enter the following code in the Python shell:

from enthought.mayavi import mlab
import numpy as np

def V(x, y, z):
    """ A 3D sinusoidal lattice with a parabolic confinement. """
    return np.cos(10*x) + np.cos(10*y) + np.cos(10*z) + 2*(x**2 + y**2 + z**2)

X, Y, Z = np.mgrid[-2:2:100j, -2:2:100j, -2:2:100j]
mlab.contour3d(X, Y, Z, V)

As in the previous example, we can change the color map and the values chosen in the isosurfaces.

We want to take the gradient of the scalar field, to create a vector field. To do this we are going to use the CellDerivatives filter, that takes derivatives of the data located in the cells (that is, between the points, see Creating data for Mayavi). For this, we first need to interpolate the data from the points where it is located to the cells, using a PointToCellData filter. We can then apply our CellDerivatives filter, and then a CellToPointData filter to get point data back. (remark: if you are not using the latest Mayavi from SVN - 3.1.0 - you need to enable the ‘pass data’ option in the two CellToPointData and PointToCellData filters).
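
To get a feel for what the filter chain computes, here is a rough NumPy analogue, not what the CellDerivatives filter does internally (it works on cell data, whereas this sketch uses finite differences on the point data directly). It assumes the same potential and grid as above and compares `np.gradient` with the analytic derivative along x; the names `values`, `dx` and `max_err` are mine.

```python
import numpy as np


def V(x, y, z):
    """Same potential as above: sinusoidal lattice plus parabolic confinement."""
    return np.cos(10*x) + np.cos(10*y) + np.cos(10*z) + 2*(x**2 + y**2 + z**2)


X, Y, Z = np.mgrid[-2:2:100j, -2:2:100j, -2:2:100j]
values = V(X, Y, Z)

# Spacing between neighbouring grid points (identical along every axis here).
dx = X[1, 0, 0] - X[0, 0, 0]

# np.gradient returns one array per axis: dV/dx, dV/dy, dV/dz.
dVdx, dVdy, dVdz = np.gradient(values, dx, dx, dx)

# Analytic derivative along x for comparison: -10 sin(10 x) + 4 x.
exact = -10*np.sin(10*X) + 4*X

# Ignore the boundary layer, where np.gradient falls back to one-sided differences.
interior = (slice(1, -1),) * 3
max_err = np.abs(dVdx - exact)[interior].max()
print("largest interior error in dV/dx:", max_err)
```

On this grid the finite-difference estimate should stay within a few percent of the peak slope, which is a useful sanity check before trusting the arrows drawn in the scene.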

To visualize the vector field, we can use a VectorCutPlane module. The resulting vectors are too large, so we go to the Glyph tab (and the Glyph tab within this tab) to reduce the scale factor to 0.2. The vector field is still too dense, therefore we go to the Masking tab to enable masking, mask with an on ratio of 6 (one arrow out of 6 is masked) and turn off the random mode.

To have nice colors, we also changed the color map of the vector field by going to the Colors and legend node just above the VectorCutPlane, and choosing a look up table in the VectorLUT tab, as there can be different color maps for vector data and scalar data.

Unlike the previous example, we can play with all the parameters in the dialog box, like masking, or select color_by_scalar in the Glyph tab, to display the value of the potential. We can also move the cut plane used to display the vectors by dragging it.

Now that we have a 3D vector field, we can also use Mayavi to integrate the trajectory of a particle in it. For this we can use the streamline module. It displays trajectories starting from the vertices of a seed surface. We choose (in the Seed tab) a Point Widget as a seed. We can then move the seed point by dragging it along in the 3D scene. This allows us to explore the trajectories in the potential created by the initial scalar field. In our case, all the trajectories end up in a local potential minimum, and moving the seed point along lets us see which minimum each point will fall into, in other words the basin of attraction of each local minimum.
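
The idea of a basin of attraction can also be checked numerically outside Mayavi. The sketch below is not how the streamline module integrates; it is a plain fixed-step Euler descent along -grad V, using the analytic gradient of the potential above. The helper names (`grad_V`, `descend`), the step size and the seed points are all assumptions of mine.

```python
import numpy as np


def V(p):
    """Potential at the point p = (x, y, z)."""
    p = np.asarray(p, dtype=float)
    return np.sum(np.cos(10*p) + 2*p**2)


def grad_V(p):
    """Analytic gradient of V at p: -10 sin(10 p) + 4 p, component-wise."""
    p = np.asarray(p, dtype=float)
    return -10*np.sin(10*p) + 4*p


def descend(seed, step=0.005, n_steps=2000):
    """Follow -grad V from seed with fixed-size Euler steps."""
    p = np.array(seed, dtype=float)
    for _ in range(n_steps):
        p = p - step*grad_V(p)
    return p


# Two arbitrary seed points; each one slides into the minimum of its own basin.
seeds = [(0.1, 0.2, 0.3), (0.9, -0.4, 0.35)]
for seed in seeds:
    end = descend(seed)
    print(seed, "->", np.round(end, 3), "V =", round(float(V(end)), 3))
```

Seeds that end at the same point belong to the same basin, which is what dragging the seed widget shows interactively.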

by gael at November 22, 2008 02:22 PM

November 21, 2008

Gaël Varoquaux

Using Mayavi with Scipy: a tutorial

Many years ago, I was working with a bright undergrad on the trajectories of atoms in a complex light field created by the intersection of two laser beams. She had developed a code in C, and I was starting to discover Python, so we had bound it to Python. We were using the Python binding, together with ipython and matplotlib, to explore and debug the code. However, our problem was really fundamentally 3D, and I didn’t find the state of the 3D plotting tools in Python satisfying.

That use case was very much on my mind while working on Mayavi, as I have always believed that Mayavi and ipython could make a fantastic steering and debugging tool for 3D Physics code. I think Mayavi is starting to be pretty mature for this set of problems and, as I am improving the docs, I decided to write a tutorial example on this specific problem. I am posting it here as a preview. This is going to go in the docs, so please, if you have any comments that might improve it, fire away.

————–

This tutorial example shows you how you can use Mayavi interactively to visualize numpy arrays while doing numerical work with scipy. It assumes that you are familiar with numerical Python tools, and shows you how to use Mayavi in combination with these tools.

Let us study the trajectories of a particle in a potential. This is a very common problem in physics and engineering, and visualization of the potential and the trajectories is key to developing an understanding of the problem.

The potential we are interested in is a periodic lattice, immersed in a parabolic confinement. We will shake this potential and see how the particle jumps from one hole of the lattice to another. The parabolic confinement is there to limit the excursions of the particle:

import numpy as np

def V(x, y, z):
    """ A 3D sinusoidal lattice with a parabolic confinement. """
    return np.cos(10*x) + np.cos(10*y) + np.cos(10*z) + 2*(x**2 + y**2 + z**2)

Now that we have defined the potential, we would like to see what it looks like in 3D. To do this we can create a 3D grid of points, and sample it on these points:

X, Y, Z = np.mgrid[-2:2:100j, -2:2:100j, -2:2:100j]
V(X, Y, Z)
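
A short aside on the `100j` notation, as a minimal sketch: with `np.mgrid`, a complex step means "this many points, end points included" rather than a step size. The small 5-point grid and the variable names below are mine, for illustration only; looking at the range of the sampled values is also a handy way to pick sensible contour levels later.

```python
import numpy as np


def V(x, y, z):
    """ A 3D sinusoidal lattice with a parabolic confinement. """
    return np.cos(10*x) + np.cos(10*y) + np.cos(10*z) + 2*(x**2 + y**2 + z**2)


# A complex step such as 5j asks for 5 points, with both -2 and 2 included.
x = np.mgrid[-2:2:5j]
print(x)

# The same idea in 3D: three arrays holding the coordinates of every grid point.
X, Y, Z = np.mgrid[-2:2:100j, -2:2:100j, -2:2:100j]
values = V(X, Y, Z)
print(X.shape, values.shape)

# The range of the sampled potential helps when choosing iso-surface values.
print(values.min(), values.max())
```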

We are going to use the mlab module (see Simple Scripting with mlab) to interactively visualize this volumetric data. For this it is best to type the commands in an interactive Python shell, either using the built-in shell of the Mayavi2 application, or in ipython -wthread. Let us visualize the 3D isosurfaces of the potential:

from enthought.mayavi import mlab
mlab.contour3d(X, Y, Z, V)

We can interact with the visualization created by the above command by rotating the view, but to get a good understanding of the structure of the potential, it is useful to vary the iso-surfaces. We can do this by double-clicking on the IsoSurface in the Mayavi pipeline tree (if you are running from ipython, you need to click on the Mayavi icon on the scene to pop up the pipeline). This opens a dialog which lets us select the values of the contours used. A good view of the potential can be achieved by turning off auto contours and choosing -0.5 as a first contour value (e.g. by entering it in the text box on the right, and pressing tab). A second contour can be added by clicking on the blue arrow and selecting “Add after”. Using a value of 15 gives a nice result.

We can now click on the Colors and legends node in the pipeline and change the colors used, by selecting a different LUT (Look Up Table). Let us select ‘Paired’ as it separates the levels well.

To get a better view of the potential, we would like to display more contours, but the problem with this approach is that closed contours hide their interior. One solution is to use a cut plane. Right-click on the IsoSurface node and add a ScalarCutPlane through the “Add module” sub menu. You can move the cut plane by clicking on it and dragging.

To make the link between our numpy arrays and the visualization, we can use the same menu to add an Axes and an Outline. Finally, let us add a colorbar. We can do this by typing:

mlab.colorbar(title='Potential', orientation='vertical')

Or using the options in the LUT dialog visited earlier.