PyDTA

PyDTA is a Python library for interacting with Stata .dta files using native Python types. It represents each observation as a Python list of variable values, and each variable as an object carrying its type and label.

Possible uses:

Features

Release Notes

Discarded Fields

The current version discards value labels and Stata expansion fields. Stata's own format documentation treats expansion fields as optional:

"Expansion fields are used to record information that is unique to Stata and has no equivalent in other data management packages. Expansion fields are always optional when writing data and, generally, programs reading Stata datasets will want to ignore the expansion fields." [3]

These choices were made to improve efficiency and could be reconsidered in a later version. The vast majority of users will not be affected.
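To make "discarding" concrete, here is a rough sketch of how a reader might step over expansion fields. It assumes the layout documented for .dta format versions 113/114: a 1-byte type code, a 4-byte length, then the contents, with a zero type and zero length marking the end. The function name `skip_expansion_fields` and the `byteorder` parameter are hypothetical and are not taken from PyDTA's source.

```python
import io
import struct

def skip_expansion_fields(stream, byteorder="<"):
    """Advance `stream` past all expansion fields; return how many were skipped.

    Assumes each field is (1-byte type, 4-byte length, contents) and that
    the list ends with a field whose type and length are both zero.
    """
    skipped = 0
    while True:
        header = stream.read(5)
        if len(header) < 5:
            raise ValueError("truncated expansion field header")
        data_type, length = struct.unpack(byteorder + "bi", header)
        if data_type == 0 and length == 0:
            return skipped            # terminator reached
        stream.seek(length, io.SEEK_CUR)  # ignore the field contents
        skipped += 1

# Synthetic bytes: one 3-byte field, the terminator, then the data section.
raw = struct.pack("<bi", 1, 3) + b"abc" + struct.pack("<bi", 0, 0) + b"DATA"
stream = io.BytesIO(raw)
count = skip_expansion_fields(stream)
remaining = stream.read()
```

Skipping by length means the reader never has to interpret the field contents, which is consistent with the efficiency motivation given above.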

Missing Values

PyDTA converts observations that contain missing values and includes them in all dataset accessors. By default, missing values are returned as Python's None. In most scenarios, users should take care to skip these observations.
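One way to skip such observations is a small filter over the rows. The sketch below uses hand-written sample rows rather than a real .dta file, and the helper name `complete_observations` is hypothetical.

```python
def complete_observations(observations):
    """Yield only the observations that contain no missing (None) values."""
    for observation in observations:
        if all(value is not None for value in observation):
            yield observation

# Sample rows standing in for the lists PyDTA would produce.
sample = [[1, 2.5, "a"], [2, None, "b"], [3, 4.0, None]]
kept = list(complete_observations(sample))
```

With a real file, the same generator could presumably be wrapped around `dta.dataset()` as used in the examples on this page.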

Examples

Note: these examples lack important attributes of well-designed software (e.g. error checking) and are presented only to demonstrate likely PyDTA usage syntax.

Export to CSV

csv_export.py:

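#!/usr/bin/env python
# simple CSV exporter: writes every observation to stdout as one CSV row
import csv
import sys
from PyDTA import Reader

# .dta files are binary, so open the input in binary mode
dta = Reader(open(sys.argv[1], 'rb'))
# csv.writer quotes values containing commas and writes None as an empty field
writer = csv.writer(sys.stdout)
for observation in dta.dataset():
    writer.writerow(observation)

$ ./csv_export.py my_large_dataset.dta > my_large_dataset.csv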

Export to MySQL

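import sys
import MySQLdb
from PyDTA import Reader

# .dta files are binary, so open the input in binary mode
dta = Reader(open('input.dta', 'rb'))
# one %s placeholder per variable (column)
fields = ','.join(['%s'] * len(dta.variables()))
conn = MySQLdb.connect('localhost', db='test')
cursor = conn.cursor()
for observation in dta.dataset():
    try:
        # pass the values as parameters so that None is stored as NULL
        cursor.execute('INSERT INTO test VALUES (%s)' % fields,
                       tuple(observation))
    except MySQLdb.Error as e:
        sys.stderr.write('skipped observation: %s\n' % e)
conn.commit()  # without a commit, transactional tables lose the inserts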
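The placeholder pattern can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption for illustration, not something PyDTA requires), with hand-written sample rows in place of `dta.dataset()`. Note that sqlite3 uses `?` placeholders where MySQLdb uses `%s`. The helper name `insert_observations` is hypothetical.

```python
import sqlite3

def insert_observations(conn, table, n_columns, observations):
    """Insert each observation as one row, using one placeholder per column."""
    placeholders = ",".join(["?"] * n_columns)
    sql = "INSERT INTO %s VALUES (%s)" % (table, placeholders)
    # Values are passed as parameters, so None is stored as SQL NULL.
    conn.executemany(sql, [tuple(obs) for obs in observations])
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, wage REAL, name TEXT)")
sample = [[1, 12.5, "a"], [2, None, "b"]]  # second row has a missing value
insert_observations(conn, "test", 3, sample)
rows = conn.execute("SELECT id, wage, name FROM test ORDER BY id").fetchall()
```

Passing the raw values rather than their string forms keeps missing values as NULL instead of the literal text "None".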

Source

Subversion repository:

$ svn checkout svn://presbrey.mit.edu/pylib/PyDTA

About

I wrote this in the summer of 2007 for two friends at the NBER, Henry Swift and Eric Zwick.

A commercial product, Stat/Transfer, exists for users who prefer a GUI. Stat/Transfer also includes a batch processor, but it is not extensible in the way Python is.

This software is free and licensed under the GNU GPL.

Retrieved from "http://presbrey.mit.edu/PyDTA"

This page was last modified 23:20, 14 September 2007.