Tuesday, June 26, 2012

Recipe #1: An indexable file class

Python Recipes are simple bits of my code you can use freely in your own programs. First in a series. Yes, I'm gonna have a lot of different series of posts on this blog.

Here's a file-like class that allows access to lines in the file using Python indexing or slicing notation without reading the whole file into memory. It's sort of a list/file hybrid since it does have methods for incrementally reading from the file, too.

class linefile(object):
   
    def __init__(self, path):
        self.file    = open(path)
        self.offsets = []

    def __enter__(self):
        return self

    def __exit__(self, *e):
        self.file.close()

    def readline(self):
        self.offsets.append(int(self.file.tell()))
        text = self.file.readline()
        if not text:
            self.offsets.pop()
        return text

    def readlines(self):
        for text in self:
            pass

        return self

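    def next(self):
        text = self.readline()
        if not text:          # end of file: signal it the way iterators do
            raise StopIteration
        return text

    __next__ = next           # Python 3 spelling of next()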

    def __iter__(self):
        text = self.readline()
        while text:
            yield text
            text = self.readline()

    def __getitem__(self, index):
        offset = self.offsets[index]   # an int, or a list of ints for a slice
        tell   = self.file.tell()      # remember where we were reading
        if isinstance(index, slice):
            # step is 1 (or omitted): lines are contiguous, this is fastest
            if offset and index.step in (None, 1):
                self.file.seek(offset[0])
                text = [self.file.readline() for line in offset]
            else:
                text = []
                for o in offset:
                    self.file.seek(o)
                    text.append(self.file.readline())
        else:
            self.file.seek(offset)
            text = self.file.readline()
        self.file.seek(tell)           # restore the read position
        return text

    def __len__(self):
        return len(self.offsets)


Unlike a regular file, a linefile is read-only and text-only. Internally, the instance stores the offset of each line that has been read. The lines themselves are retrieved from the file on demand; the instance does not store any lines, and indexing or slicing leaves the current read position unchanged. Only lines that have already been read can be indexed; asking for a single line beyond that raises IndexError. Positive indexes count from the beginning of the file and negative indexes count back from the last line read, which may not be the end of the file. (Index -1, then, is the line most recently read, the one just before the current position.) You can use the readlines() method to read all the remaining lines and thereby note the offset of each line of the file; this allows the entire file to be accessed by index or slice as though it were a list. Note that, unlike a regular file's readlines(), this one returns the linefile itself rather than a list of lines. The linefile object is iterable and is also a context manager (i.e., it supports the with statement).
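If the offset bookkeeping is hard to picture, here is a minimal sketch of the same idea using a plain file object and a list, outside the class. The helper names note_line and line_at are made up for this illustration (they are not part of linefile), and the demo file is a throwaway temp file.

```python
import os
import tempfile

def note_line(f, offsets):
    """Read one line from f, recording where it started. Returns '' at EOF."""
    pos = f.tell()
    text = f.readline()
    if text:
        offsets.append(pos)
    return text

def line_at(f, offsets, i):
    """Fetch line i by seeking to its recorded offset, then restore the position."""
    here = f.tell()
    f.seek(offsets[i])      # IndexError if line i has not been read yet
    text = f.readline()
    f.seek(here)
    return text

# Demo on a throwaway three-line file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("alpha\nbeta\ngamma\n")

with open(path) as f:
    offsets = []
    note_line(f, offsets)                 # reads the first line
    note_line(f, offsets)                 # reads the second line
    last_read = line_at(f, offsets, -1)   # last line read, not last line in the file
    first = line_at(f, offsets, 0)
    upcoming = f.readline()               # continues where reading left off
os.remove(path)
```

Only two of the three lines have been noted at the point of the lookups, so index -1 refers to the second line, and the final readline() picks up the third because line_at put the file position back.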
