Django Python Plotting: Speed Optimization

Posted: Fri Dec 12, 2014 7:32 pm
by snoopy
Hey, I'm working on a project to make a web-based aquarium controller. Right now, I'm working on an app that drives scheduled outputs - think dimming control on multiple light channels, speed control on water circulation pumps.

The design of the models is like this:

The hardware is associated with channels. Each channel represents control of a particular hardware device (say, a particular LED light)

The channels can be grouped into sources. A source is a convenience for people - think of a source as something like the sun, or the moon - each of which is associated with a group of channels. The channels have a many to many relationship to sources.

The scheduling is driven by profiles. Profiles describe different shapes over time - right now I have a square pulse, a sine wave, a "^" shape, a positive linear slope, and a negative linear slope defined.

Channels, sources, and profiles are all glued together with a ChanProfSrc object that has a many-to-one relationship with each of the three, plus a scaling factor for the association.

Here's the models.py:

Code: Select all

from django.db import models
from django.utils import timezone
from math import sin,pi,ceil
import Adafruit_BBIO.PWM as PWMctl
from time import sleep
from datetime import timedelta

# Set up the options for some fields
hwChoices = (
    (0, 'GPIO Out'),
    (1, 'OneWire In'),
    (2, 'PWM Out'),
)

shapeChoices = (
    (0, 'Constant'),
    (1, 'Positive Linear'),
    (2, 'Negative Linear'),
    (3, 'Sine'),
    (4, 'Square'),
    (5, '^ Shape'),
)


# Create your models here.

class Source(models.Model):
    name = models.CharField(max_length=20)

    def __unicode__(self):
        return self.name

    def __str__(self):
        return self.name

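    def calc(self, calctime=None, prof=0):
        # A default of [timezone.now()] would be evaluated only once, at import.
        if calctime is None:
            calctime = [timezone.now()]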
        # Pre-fetch all of the data by channel.
        data = []
        name = []
        color = []

        now = timezone.now()

        for c in self.channel_set.all():
            name.append(c.name)
            color.append(c.traceColor)

            #Pre-seed a list for the data.
            if not prof:
                objset = c.chanprofsrc_set.filter(source__id=self.pk)

            else:
                objset = c.chanprofsrc_set.filter(source__id=self.pk,
                                                  profile__id=prof)

            predata = []
            for cps in objset:
                predata.append(
                    [i*cps.scale for i in cps.profile.intensity(calctime)]
                )

            #Now, multiply all of the contributions and make the tuple.
            for i, v in enumerate(calctime):
                t = (v - now).total_seconds() / 3600
                rundata = 1

                #If predata is empty (there are no profiles), 0 rundata.
                if not predata:
                    rundata = 0

                for r in predata:
                    rundata *= r[i]

                #If this is not the first time, add another element.
                if i:
                    tupdata += ((t, rundata), )

                #If the first time, create the tuple.
                else:
                    tupdata = ((t, rundata), )

            #Add this channel's data to the data list.
            data.append(tupdata)


        #If this didn't generate data, return blank data.
        if not data:
            return {'name':['Blank'], 'data':[((0, 0), )], 'color':['ffffff']}

        #Return the generated data.
        return {'name':name, 'data':data, 'color':color}


class Profile(models.Model):
    name = models.CharField(max_length=20)
    start = models.DateTimeField()
    stop = models.DateTimeField()
    refresh = models.FloatField(default=24)
    #Note: refresh is the amount of time to add in hours.
    shape = models.IntegerField(default=0, choices=shapeChoices)

    def __unicode__(self):
        return self.name

    def __str__(self):
        return self.name

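    def intensity(self, calctime=None):
        # A default of [timezone.now()] would be evaluated only once, at import.
        if calctime is None:
            calctime = [timezone.now()]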
        start = self.start
        stop = self.stop
        shape = self.shape
        r = ()

        # Shape = 5 is a "^" shape (an inverted V)
        if shape == 5:
            slope = 2/((stop-start).total_seconds())
            shift = 1/slope

            for c in calctime:
                r += ((1 - abs(slope * ((c - start).total_seconds() - shift)))
                    * int(c > start and c < stop), )

        # Shape = 4 is a square wave
        elif shape == 4:
            for c in calctime:
                r += (int(c > start and c < stop), )

        # Shape = 3 is a sine curve
        elif shape == 3:
            # Note: Do the max so we avoid returning negative values
            for c in calctime:
                r += (max(sin((c - start).total_seconds() *
                         pi / (stop - start).total_seconds()),
                    0)*int(c > start and c < stop), )

        # Shape = 2 is a negative linear slope
        elif shape == 2:
            # Calculate the slope & intercept
            slope = -1/((stop-start).total_seconds())
            intercept = 1  # x = 0 is at self.start

            # Zero only if after the stop; if before make one.
            for c in calctime:
                if c < start:
                    r += (1, )

                else:
                    r += ((slope * (c - start).total_seconds() + intercept)
                        * int(c < stop), )

        # Shape = 1 is a positive linear slope
        elif shape == 1:
            # Calculate the slope & intercept
            slope = 1/((stop-start).total_seconds())
            intercept = 0  # x = 0 is at self.start

            # Zero only if before the start
            for c in calctime:
                if c < start:
                    r += (0, )

                elif c > stop:
                    r += (1, )

                else:
                    r += ((slope * (c - start).total_seconds() + intercept), )

        # If all else fails, treat it like a constant, ignoring time
        else:
            for c in calctime:
                r += (1, )

        return r


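    def calc(self, calctime=None, p=0):
        # A default of [timezone.now()] would be evaluated only once, at import.
        if calctime is None:
            calctime = [timezone.now()]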
        # Pre-fetch all of the data by channel.
        name = [self.name]
        color = ['ffffff']

        now = timezone.now()

        #We basically just have to format the data.
        predata = self.intensity(calctime)

        for i, v in enumerate(calctime):
            t = (v - now).total_seconds() / 3600

            if i:
                data += ((t, predata[i]), )

            else:
                data = ((t, predata[i]), )

        data = [data]

        #Return the generated data.
        return {'name':name, 'data':data, 'color':color}


    def cleanup(self):
        now = timezone.now()

        #If the schedule is still running, do nothing.
        if now < self.stop:
            return

        #If refresh is set to 0, delete the profile.
        if not self.refresh:
            for cpr in self.chanprofsrc_set.all():
                cpr.delete()

            self.delete()

            return

        #If the schedule has ended, add the refresh time until it's active again.
        addAmount = timedelta(hours=self.refresh*ceil(
            (now - self.stop).total_seconds()/(3600*self.refresh)))

        self.start = self.start + addAmount
        self.stop = self.stop + addAmount
        self.save()

        return



class Channel(models.Model):
    name = models.CharField(max_length=20)
    hwid = models.CharField(max_length=10)
    hwtype = models.IntegerField(default=2, choices=hwChoices)
    pwm = models.FloatField(default=500)
    source= models.ManyToManyField(Source)
    maxIntensity = models.FloatField(default=1)
    traceColor = models.CharField(default='ffffff', max_length=7)

    def __unicode__(self):
        return self.name

    def __str__(self):
        return self.name

    def start(self):
        if self.hwtype == 2:
            # Start the PWM
            PWMctl.start(self.hwid, 0, self.pwm)
            sleep(1)

        return

    def stop(self):
        if self.hwtype == 2:
            # Stop the PWM
            PWMctl.stop(self.hwid)
            sleep(1)

        return

    def set(self, calctime=0):

        if not calctime:
            calctime = [timezone.now()]

        else:
            calctime = [calctime]

        v = 0

        for s in self.source.all():
            predata = []
            #Pre-seed a list for the data.
            for cps in s.chanprofsrc_set.filter(channel__id=self.pk):
                predata.append(cps.scale * cps.profile.intensity(calctime)[0])

            #Now, multiply all of the contributions for this source.
            srcdata = 1

            #If predata is empty (there are no profiles), zero srcdata.
            if not predata:
                srcdata = 0

            for r in predata:
                srcdata *= r

            #Add this channel's data to the data.
            v += srcdata

        #Make sure v is between 0 and maxIntensity:
        v = max(0.0001, min(v, self.maxIntensity))

        #Return the value
        return self.manualset(v)

    def manualset(self, v):
        if self.hwtype == 2:
            PWMctl.set_duty_cycle(self.hwid, 100 * v)

        return


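    def calc(self, calctime=None, p=0):
        # A default of [timezone.now()] would be evaluated only once, at import.
        if calctime is None:
            calctime = [timezone.now()]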
        # Pre-fetch all of the data by channel.
        chandata = []
        name = [self.name]
        color = [self.traceColor]
        maxInt = self.maxIntensity

        now = timezone.now()

        for s in self.source.all():
            #Pre-seed a list for the data.
            predata = []
            for cps in s.chanprofsrc_set.filter(channel__id=self.pk):
                predata.append(
                    [i*cps.scale for i in cps.profile.intensity(calctime)]
                )

            #Now, multiply all of the contributions for this source.
            for i, v in enumerate(calctime):
                rundata = 1

                #If predata is empty (there are no profiles), 0 rundata.
                if not predata:
                    rundata = 0

                for r in predata:
                    rundata *= r[i]

                #If this is not the first time, add another element.
                if i:
                    srcdata.append(rundata)

                #If the first time, create a list
                else:
                    srcdata = [rundata]

            #Add this channel's data to the data list.
            chandata.append(srcdata)

        #Now, add up the data from all of the sources and tupilize it.
        for i, v in enumerate(calctime):
            t = (v - now).total_seconds() / 3600
            rundata = 0

            #Cycle through the data and add the source contributions.
            for c in chandata:
                rundata += c[i]

            #Make sure the data is between 0 and maxIntensity.
            rundata = max(0, min(rundata, maxInt))

            #If this is not the first time, add another element.
            if i:
                tupdata += ((t, rundata), )

            #If the first time, create the tuple.
            else:
                tupdata = ((t, rundata), )


        #If this didn't generate data, return blank data.
        if not tupdata:
            return {'name':['Blank'], 'data':[((0, 0), )], 'color':['ffffff']}

        #Return the generated data.
        return {'name':name, 'data':[tupdata], 'color':color}


class ChanProfSrc(models.Model):
    channel = models.ForeignKey(Channel)
    profile = models.ForeignKey(Profile)
    source = models.ForeignKey(Source)
    scale = models.FloatField(default=0)


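    def calc(self, calctime=None):
        # A default of [timezone.now()] would be evaluated only once, at import.
        if calctime is None:
            calctime = [timezone.now()]

        scale = self.scale

        # intensity() returns a flat tuple of values (one per time),
        # so scale each value directly rather than indexing into it.
        return tuple(i * scale for i in self.profile.intensity(calctime))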

    def __unicode__(self):
        return self.channel.name + self.profile.name + self.source.name

    def __str__(self):
        return self.channel.name + self.profile.name + self.source.name
Now, here's the real question:

As part of the page, I generate some plots using reportlab by calling the different calc() methods in the code above, passing each an iterable of the times for which I want data. The methods return the data as a list of specially formatted tuples (each one a tuple of 2-element tuples containing x, y data points) plus a little bit of formatting/naming info. If you look at the code, you will see that the methods end up iterating through a lot of loops, and it really isn't all that efficient. I'm looking for suggestions for how to make the algorithm run more efficiently.
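For readers following along, here is a minimal sketch of the return format described above, built in one pass with a generator expression instead of repeated tuple concatenation. The helper name to_series and the sample values are made up for illustration; they are not from the posted models.py, and the sketch is written for Python 3.

```python
from datetime import datetime, timedelta


def to_series(times, values, now):
    """Pair each time (as hours from now) with its value.

    Returns a tuple of (x, y) tuples, the shape the calc() methods produce.
    """
    return tuple(
        ((t - now).total_seconds() / 3600, v)
        for t, v in zip(times, values)
    )


# Hypothetical sample data: three points, one hour apart.
now = datetime(2014, 12, 12, 12, 0, 0)
times = [now + timedelta(hours=h) for h in range(3)]
series = to_series(times, [0.0, 0.5, 1.0], now)

# Same dict layout as the calc() methods return.
result = {'name': ['Example'], 'data': [series], 'color': ['ffffff']}
```

Building the tuple in one expression avoids copying the growing tuple on every iteration, which is what `tupdata += ((t, rundata), )` does.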

A couple notes:

If you call calc() on a profile, I want it to return one set of data with the shape of the profile.
If you call calc() on a channel, I want it to return what the channel will really do:
Within each source, multiply all of the contributing scaled profiles (note: all are <1, so the multiplying has a reducing effect). Then, add up the contributions from all of the sources.
If you call calc() on a source, I want it to return the contribution of each channel (i.e. the product of the contributing scaled profiles), as multiple sets of data, one for each applicable channel.
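The channel rule in the notes above can be sketched for a single instant in plain Python. This is only an illustration of the arithmetic, assuming the scaled profile values have already been looked up; the function name combine and the source names are hypothetical.

```python
def combine(contributions, max_intensity=1.0):
    """Combine scaled profile values for one channel at one instant.

    contributions maps a source name to the list of scaled profile values
    that source applies to the channel.
    """
    total = 0.0
    for values in contributions.values():
        if not values:
            continue  # a source with no profiles contributes nothing
        product = 1.0
        for v in values:
            product *= v  # multiply within a source
        total += product  # add across sources
    # Clamp to the channel's allowed range.
    return max(0.0, min(total, max_intensity))


# Hypothetical values: 'sun' has two profiles, 'moon' one, 'unused' none.
value = combine({'sun': [0.8, 0.5], 'moon': [0.25], 'unused': []})
```

Here the sun contributes 0.8 * 0.5 and the moon 0.25, and the sum is clamped to max_intensity.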

Right now everything seems to be working *right* just not as *fast* as I'd like.

Re: Django Python Plotting: Speed Optimization

Posted: Fri Dec 12, 2014 11:08 pm
by fliptw
IIRC there are JavaScript frameworks that draw graphs; it might be faster to send the data and let the client do the rendering.
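As a rough sketch of what sending the data could look like, the calc()-style dict can be serialized to JSON with the standard library. The payload layout (series, points) is an assumption made for this example and is not tied to any particular charting library.

```python
import json


def to_chart_payload(calc_result):
    """Turn a calc()-style dict into a JSON string for a client-side chart.

    The 'series'/'points' key names are hypothetical.
    """
    series = [
        {
            'name': name,
            'color': '#' + color.lstrip('#'),
            'points': [[x, y] for x, y in points],
        }
        for name, color, points in zip(
            calc_result['name'], calc_result['color'], calc_result['data'])
    ]
    return json.dumps({'series': series})


# Hypothetical single-channel result in the format calc() returns.
payload = to_chart_payload({
    'name': ['Blue LED'],
    'color': ['0000ff'],
    'data': [((0.0, 0.0), (1.0, 0.5))],
})
```

Note that this only moves the drawing to the browser; the time spent computing the points on the server stays the same.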

You could also render the plots to images in a separate thread and cache the results. I'm fairly sure you don't need 60fps rendering.
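The caching idea can be sketched with a small time-based cache. This is a stand-in written for illustration, not code from the project; in a Django app the built-in cache framework (django.core.cache) would be the more usual choice. The class name TTLCache and the 60-second lifetime are assumptions.

```python
import time


class TTLCache:
    """Keep computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested
        self._store = {}

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now, value)
        return value


# Hypothetical usage: re-render a plot at most once a minute.
plot_cache = TTLCache(ttl_seconds=60)
image = plot_cache.get_or_compute('channel-1', lambda: 'rendered plot bytes')
```

Since the schedules change slowly, a plot that is up to a minute old is probably indistinguishable from a fresh one.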

Re: Django Python Plotting: Speed Optimization

Posted: Sat Dec 13, 2014 10:19 pm
by Jeff250
Performing queries in loops is generally a bad idea for performance. Ideally, you want a constant number of queries per page, not a number that can grow based on the number of rows of something. See here for debugging the number of queries per page:

https://djangosnippets.org/snippets/93/
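To make the query-count point concrete without a Django project, here is a sketch using the standard-library sqlite3 module. The table and column names are invented for the example. It counts SELECT statements for a query-per-row loop versus a single joined query. (In Django itself, django.db.connection.queries records the executed queries when DEBUG is on.)

```python
import sqlite3

conn = sqlite3.connect(':memory:')
selects = []


def record_select(sql):
    # Only count SELECTs; ignore schema setup and transaction statements.
    if sql.lstrip().upper().startswith('SELECT'):
        selects.append(sql)


conn.set_trace_callback(record_select)

# Hypothetical schema: three channels, one link row each.
conn.executescript("""
    CREATE TABLE channel (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE link (id INTEGER PRIMARY KEY, channel_id INTEGER, scale REAL);
    INSERT INTO channel VALUES (1, 'white'), (2, 'blue'), (3, 'red');
    INSERT INTO link VALUES (1, 1, 0.5), (2, 2, 0.8), (3, 3, 0.2);
""")

# Pattern 1: one query for the channels, then one more per channel.
selects.clear()
for (channel_id,) in conn.execute('SELECT id FROM channel').fetchall():
    conn.execute(
        'SELECT scale FROM link WHERE channel_id = ?', (channel_id,)
    ).fetchall()
looped = len(selects)

# Pattern 2: a single joined query, regardless of the number of channels.
selects.clear()
conn.execute(
    'SELECT channel.name, link.scale FROM link '
    'JOIN channel ON channel.id = link.channel_id'
).fetchall()
joined = len(selects)
```

The looped count grows with the number of channels, while the joined count stays constant.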

Trying to reduce the number of queries per page can be an art, but it often involves accessing things from the opposite direction. For instance, something of the form...

Code: Select all

for c in self.channel_set.all():
    objset = c.chanprofsrc_set.filter(source=self)
    for cps in objset:
        ...
...can be changed to...

Code: Select all

import itertools

# order_by('channel') keeps each channel's rows adjacent, which groupby requires.
cpss = ChanProfSrc.objects.filter(source=self).select_related('channel', 'profile').order_by('channel')
for channel, cpss_for_channel in itertools.groupby(cpss, lambda cps: cps.channel):
    for cps in cpss_for_channel:
        ...
This is untested code of course, so it might not work immediately out of the box. But instead of doing a query per loop, you're now just doing one query outside all of the loops. Of course, while the upside is speed, the downside is that the code is a bit uglier now.
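One detail worth spelling out: itertools.groupby only groups adjacent items, which is why the rows have to come back ordered by channel. A small stand-alone sketch with made-up rows (the Row type and its values are placeholders, not the real models):

```python
import itertools
from collections import namedtuple

# Stand-in for ChanProfSrc rows fetched in a single query.
Row = namedtuple('Row', ['channel', 'profile', 'scale'])
rows = [
    Row('blue', 'sine', 0.5),
    Row('white', 'square', 1.0),
    Row('blue', 'ramp', 0.8),
]

# Sort first so that rows for the same channel sit next to each other;
# in the query above, order_by('channel') plays this role.
rows.sort(key=lambda r: r.channel)

by_channel = {
    channel: [r.scale for r in group]
    for channel, group in itertools.groupby(rows, key=lambda r: r.channel)
}
```

Without the sort, 'blue' would show up as two separate groups and the second would overwrite the first in the dict.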

edit:

For extra credit, or if this query is still too slow, you can create an index to optimize it by adding

Code: Select all

class ChanProfSrc(models.Model):
    class Meta:
        index_together = [('source', 'channel')]
Creating this index probably isn't necessary unless you're dealing with a really large number of rows though.

Re: Django Python Plotting: Speed Optimization

Posted: Wed Dec 17, 2014 4:24 pm
by snoopy
fliptw wrote: IIRC there are JavaScript frameworks that draw graphs; it might be faster to send the data and let the client do the rendering.

You could also render the plots to images in a separate thread and cache the results. I'm fairly sure you don't need 60fps rendering.

I'll take a look at a JavaScript method. I originally tried one and gave up on it because I couldn't get it to work. Since then, I found that I had a problem with the way my static folder was set up, so I may be able to come back to a JavaScript solution now and have it work.

The rendering is threaded, so the page shows up first and the plots take a second to appear - even that brief wait is annoying.

Along Jeff's lines of thought: I think it's more likely that it's actually the retrieval and formatting of the data that's taking the time, not the plotting itself, and if that's the case then making the client plot the data really won't help much. When I get back to wanting to optimize, I'll look into reducing the number of queries that I do per plot.
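One way to check that hunch before optimizing is to time the data step and the drawing step separately. The sketch below uses a small context manager around two placeholder workloads; the labels and the dummy work are assumptions, and in the real app the two blocks would wrap the calc() call and the reportlab drawing call.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Accumulate wall-clock time spent inside the with-block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


# Placeholder for fetching and formatting the plot data.
with timed('data'):
    points = tuple((x / 10.0, (x % 7) / 7.0) for x in range(1000))

# Placeholder for drawing the plot.
with timed('render'):
    rendered = ', '.join('%.2f' % y for _, y in points)
```

Comparing timings['data'] with timings['render'] shows which side deserves the attention first.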