Speed up plot rendering in Python/Matplotlib

July 7, 2011

Lately I’ve been working with a dataset where I need to plot around 50,000 dense points, and the render time has been horrible. We’ve all downloaded electronic documents with plots like this. These files become almost unusable because (a) the file size is huge and (b) they hiccup whenever you scroll past one of the offending graphs. I don’t want my thesis to annoy people who use it!

One (lame) solution is to just save the entire figure as an image (png/jpg), but I don’t like this option because I want axes and annotations to be vector-quality. I’ll choose the huge file size over pixelated text.

Yesterday, though, I discovered a little hack in Python’s excellent Matplotlib graphing library that really hit the spot. StackOverflow to the rescue!

In Matplotlib, you can pass rasterized=True when you create a plot element (or call set_rasterized(True) on it afterwards). When you save to a vector format such as PDF, only that graphical element is converted to raster, while everything else in the plot stays vector. You also have fine control over the resolution of the rasterized components through the dpi argument of savefig.
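As a minimal sketch of the idea (this is not the rasterize.py from this post; the synthetic data and the output filename rasterized_sketch.pdf are assumptions made up for illustration), something like the following should rasterize only the dense points:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for a dense dataset (NOT the xydata.txt from this post).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50_000)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

fig, ax = plt.subplots()

# Only this artist is rasterized when saving to a vector format.
(dense_line,) = ax.plot(x, y, ".", markersize=1, rasterized=True, label="xy-data")

# The sine curve, axes, gridlines, and legend are left as vector elements.
(sine_line,) = ax.plot(x, np.sin(x), "r-", label="sin(x)")
ax.grid(True)
ax.legend()

# dpi sets the resolution of the rasterized parts inside the PDF.
# The filename is a hypothetical choice for this sketch.
fig.savefig("rasterized_sketch.pdf", dpi=300)
plt.close(fig)
```

Raising or lowering dpi in savefig trades sharpness of the rasterized points against file size; the vector elements are unaffected by it.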

Here is an example. I am plotting the previously mentioned 50,000-point dataset as well as a sine function for comparison. I want to rasterize the xy-data but keep everything else (axes, gridlines, legend, the sine function) vector. You can run this example by downloading rasterize.py and xydata.txt below. Here is the output:

[Figure: output plot showing the dense xy-data, the sine curve, the legend, and gridlines]

Closer inspection confirms that the blue dataset is raster, while the legend, red curve, and gridlines are vector:

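[Figure: zoomed-in view of the plot; the blue dataset is raster, while the legend, red curve, and gridlines are vector]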

You can really get a feel for the impact this makes on usability by comparing the PDF with rasterization to the PDF without it. Try zooming in and out in each of these files; the difference is profound!
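If you want to try the comparison on your own machine, here is a rough sketch that saves the same plot twice, once with and once without rasterization, and reports the size of each PDF. The helper name save_scatter, the synthetic data, and the filenames are all hypothetical. The resulting sizes depend on the data, the marker style, and the dpi, so treat the printed numbers as something to inspect rather than a guaranteed result.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def save_scatter(path, rasterized):
    """Save a 50,000-point plot to `path` and return the file size in bytes.

    Hypothetical helper for this sketch; the data is synthetic.
    """
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 50_000)
    y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

    fig, ax = plt.subplots()
    ax.plot(x, y, ".", markersize=1, rasterized=rasterized)
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return os.path.getsize(path)


# Write both files to a temporary directory so nothing is overwritten.
out_dir = tempfile.mkdtemp()
vector_size = save_scatter(os.path.join(out_dir, "vector.pdf"), rasterized=False)
raster_size = save_scatter(os.path.join(out_dir, "raster.pdf"), rasterized=True)

print(f"vector: {vector_size} bytes, rasterized: {raster_size} bytes")
```

Opening the two files side by side in a PDF viewer and zooming or scrolling is the more telling test, since the main complaint is render time rather than size alone.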

There are obvious trade-offs here. You are sacrificing the vector quality of the dense dataset for the sake of improved performance, but that’s a compromise I’m happy to make in this case.

Data scientists in the audience: Is there a better solution to this problem?

5 responses to Speed up plot rendering in Python/Matplotlib

  1. Excellent post. I have run into this problem before when using Matplotlib to generate plots for publications. When you get to the plot in a PDF, loading goes really slow. Thanks for pointing out the solution.

  2. William C Grisaitis March 28, 2012 at 10:01 pm

    Wish I’d seen this earlier, before matplotlib saved my plots of 10^6~7 data points to a 500mb PDF file. Ha!

  3. Thanks much!!!!!!!!!!!!! This is just great!!!!!!

  4. Thanks, that was very useful!
