A Python library for interacting with Stata .dta files using native Python types. PyDTA represents each observation as a Python list of its variables' values, and each variable as an object carrying its type and label.
This software is free and available under the MIT license.
- manipulate Stata datasets in your own Python programs
- convert datasets to other file formats or relational databases that Python can write to (e.g. MySQL)
- perform calculations on the dataset using Python or SciPy
- multiple dataset accessors:
  - implements a Python generator ('for x in DTA.dataset(): print x')
  - implements Python __getitem__ for dataset slicing ('print DTA')
- versions of Stata supported:
  - Stata 10 (format-114 datasets)
  - Stata 9 (format-113 datasets)
- supports all Stata string and numeric types:
  - str, byte, int, long, float, and double are converted to native Python base types
- supports other fields:
  - dataset label
  - date/time the dataset was written (as recorded by Stata, not the operating system's file timestamp)
  - variables' names, sort order, formats, labels, and value formats
- supports missing values
- supports large datasets (direct I/O, no dataset pre-parser)
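Both accessor styles follow standard Python protocols. The toy class below is a hypothetical in-memory stand-in, not PyDTA's implementation (which reads observations directly from the file), and only shows how a generator method and __getitem__ indexing and slicing look from the caller's side.

```python
# Hypothetical toy stand-in for a PyDTA-style reader; NOT PyDTA's code.
# Observations are held in memory here purely for illustration.
class ToyDataset(object):
    def __init__(self, rows):
        self._rows = rows

    def dataset(self):
        # generator accessor: yields one observation (a list) at a time
        for row in self._rows:
            yield row

    def __getitem__(self, key):
        # indexing accessor: an int returns one observation,
        # a slice returns a list of observations
        return self._rows[key]


dta = ToyDataset([[1, 'a', 2.5], [2, 'b', None], [3, 'c', 4.0]])
for observation in dta.dataset():
    print(observation)
print(dta[1])    # [2, 'b', None]
print(dta[0:2])  # [[1, 'a', 2.5], [2, 'b', None]]
```

The generator form suits streaming through large files; indexing suits picking out specific observations.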
The current version discards value labels and Stata expansion fields. Stata's own format documentation describes expansion fields as optional:
"Expansion fields are used to record information that is unique to Stata and has no equivalent in other data management packages. Expansion fields are always optional when writing data and, generally, programs reading Stata datasets will want to ignore the expansion fields." 
These choices were made to improve efficiency and could be reconsidered in a later version. Most users will not be affected: all data values are still read, although variables with value labels are returned as their numeric codes.
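If you need the text behind labeled codes, you can reattach it yourself. A minimal sketch, assuming you build the code-to-label mapping by hand (for example from Stata's `label list` output); `SEX_LABELS` and `apply_value_labels` are hypothetical names, not part of PyDTA:

```python
# Hypothetical value-label mapping transcribed by hand; PyDTA discards
# value labels, so this information must come from elsewhere.
SEX_LABELS = {1: 'male', 2: 'female'}


def apply_value_labels(observations, column, labels):
    """Yield copies of observations with one coded column replaced by its label.

    Codes absent from the mapping, and missing values (None), are left as-is.
    """
    for observation in observations:
        row = list(observation)
        row[column] = labels.get(row[column], row[column])
        yield row


rows = [[101, 1], [102, 2], [103, None]]
print(list(apply_value_labels(rows, 1, SEX_LABELS)))
# [[101, 'male'], [102, 'female'], [103, None]]
```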
PyDTA includes observations with missing values in all dataset accessors. By default, missing values are returned as Python's None, so most calculations should explicitly skip or otherwise handle these observations.
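In format-113 and format-114 files, Stata stores numeric missing values in-band, as values above each type's largest non-missing value. The sketch below shows how such codes map to None and how to keep only complete observations before a calculation. The thresholds follow Stata's published .dta format description; the function names and type-name strings are hypothetical and are not PyDTA's API.

```python
# Largest non-missing value of each Stata integer type (formats 113/114).
# Anything above it encodes "." or an extended missing value ".a" to ".z".
MAX_NONMISSING = {
    'byte': 100,
    'int': 32740,
    'long': 2147483620,
}
# For float and double, "." is stored as exactly 2**127 and 2**1023;
# extended missing values are larger still.
MISSING_FLOAT = 2.0 ** 127
MISSING_DOUBLE = 2.0 ** 1023


def decode(value, stata_type):
    """Return None if value encodes a Stata missing value, else value."""
    if stata_type in MAX_NONMISSING:
        return None if value > MAX_NONMISSING[stata_type] else value
    if stata_type == 'float':
        return None if value >= MISSING_FLOAT else value
    if stata_type == 'double':
        return None if value >= MISSING_DOUBLE else value
    return value  # string variables have no numeric missing codes


def complete_cases(observations):
    """Yield only observations that contain no missing (None) values."""
    for observation in observations:
        if None not in observation:
            yield observation
```

For example, `sum(row[0] for row in complete_cases(data))` totals the first variable over complete observations only.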
Note: these examples omit important features of well-designed software (e.g. error checking) and are presented only to demonstrate likely PyDTA usage syntax.
export to CSV
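```python
#!/usr/bin/env python
# simple CSV exporter: reads the .dta file named on the command line
# and prints one comma-separated line per observation to stdout
import sys
from PyDTA import Reader

dta = Reader(open(sys.argv[1], 'rb'))
for observation in dta.dataset():
    print ",".join(map(str, observation))
```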
$ ./csv_export.py my_large_dataset.dta > my_large_dataset.csv
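Joining values with commas breaks when a string variable itself contains a comma or quote, and writes missing values as the text "None". A sketch of a more robust variant, assuming Python 3 and the standard csv module; `write_csv` is a hypothetical helper that accepts any iterable of observations, such as the output of dta.dataset():

```python
import csv
import io


def write_csv(observations, out, header=None):
    """Write observations as properly quoted CSV rows.

    Missing values (None) are written as empty fields.
    """
    writer = csv.writer(out)
    if header is not None:
        writer.writerow(header)
    for observation in observations:
        writer.writerow(['' if value is None else value for value in observation])


# usage with in-memory sample data; pass sys.stdout to print instead
buffer = io.StringIO()
write_csv([[1, 'Smith, J.', 2.5], [2, 'Lee', None]], buffer,
          header=['id', 'name', 'score'])
print(buffer.getvalue())
```

The csv writer quotes only fields that need it, so "Smith, J." stays a single column.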
export to MySQL
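```python
# simple MySQL exporter: copies every observation into the existing table `test`
import MySQLdb
from PyDTA import Reader

dta = Reader(open('input.dta', 'rb'))
# one %s placeholder per variable, e.g. "%s,%s,%s"
fields = ','.join(['%s'] * len(dta.variables()))
connection = MySQLdb.connect('localhost', db='test')
cursor = connection.cursor()
for observation in dta.dataset():
    try:
        # MySQLdb quotes each value; missing values (None) become NULL
        cursor.execute('INSERT INTO test VALUES (%s)' % fields, tuple(observation))
    except MySQLdb.Error:
        pass  # skip observations MySQL rejects
connection.commit()  # MySQLdb does not autocommit by default
connection.close()
```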
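For a database that needs no server, the same idea can be sketched with Python's built-in sqlite3 module, swapped in here for MySQL. Passing values through parameter placeholders with executemany lets the driver quote them and store missing values (None) as NULL. The helper `export_to_sqlite`, the table, and its columns are hypothetical; in practice the columns would match the dataset's variables, and the rows would come from dta.dataset().

```python
import sqlite3


def export_to_sqlite(observations, connection, table, n_columns):
    """Insert every observation into an existing table; return the row count."""
    placeholders = ','.join(['?'] * n_columns)  # e.g. "?,?,?"
    # the table name is interpolated directly, so it must come from trusted code
    sql = 'INSERT INTO %s VALUES (%s)' % (table, placeholders)
    with connection:  # commits on success, rolls back on error
        cursor = connection.executemany(sql, (tuple(obs) for obs in observations))
    return cursor.rowcount


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE test (id INTEGER, name TEXT, score REAL)')
rows = [[1, 'Smith', 2.5], [2, 'Lee', None]]
print(export_to_sqlite(rows, conn, 'test', 3))  # 2
print(conn.execute('SELECT COUNT(*) FROM test WHERE score IS NULL').fetchone()[0])  # 1
```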
$ svn checkout svn://presbrey.mit.edu/pylib/PyDTA
In late 2009, statsmodels (a statistical modelling package) added support for Stata files based on PyDTA.
(Stat/Transfer, a commercial product, is available for users who prefer a GUI. It also includes a batch processor but is not extensible the way Python is.)