Balkan Blogs Scripts

The scripts and raw log data in OpenOffice format

Before running these, you will need the following. I did all my work on Mac OS X, so I can't vouch for other platforms.

prep/prep.sh
This script generates the articles.xml and all-articles.xml files from the OpenOffice spreadsheet containing the data. All other scripts read their data from these two XML files. The difference between them is that articles.xml contains only the data I analyzed; all-articles.xml contains all data, including broken links, excluded samples, etc. The prep directory contains other scripts used by prep.sh, but I won't go into detail on them. They will need to be changed if you modify the format of the spreadsheet (e.g. adding or removing columns).
source.xsc
This is the input log file, an OpenOffice spreadsheet.
lib/article.py
The article class used by some of the scripts.
lib/mappings.py
A file describing mappings between recorded measures (e.g. frames B1, M1, M2) and analyzed frames (e.g. M1 and M2 collapse to M). It is used by prep.sh, so if you change the mappings, be sure to re-run prep.sh.
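As a rough illustration of what such a mapping does, here is a minimal sketch. The table META, its entries, and the function apply_mapping are hypothetical and are not taken from lib/mappings.py; in particular, B1 mapping to B is assumed purely for the example.

```python
# Hypothetical mapping table: recorded frame code -> analyzed frame code.
# The real tables live in lib/mappings.py; these entries are illustrative only.
META = {"B1": "B", "M1": "M", "M2": "M"}

def apply_mapping(values, mapping):
    """Collapse recorded codes to analyzed codes, keeping order and dropping repeats."""
    collapsed = []
    for value in values:
        mapped = mapping.get(value, value)  # codes without an entry pass through unchanged
        if mapped not in collapsed:
            collapsed.append(mapped)
    return collapsed

print(apply_mapping(["B1", "M1", "M2"], META))  # ['B', 'M']
```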
work
All scripts in the work directory take either an XML file (in the format of articles.xml) or a matrix file (a tab-separated grid of values) as input. Unless noted otherwise, assume it's the former. The file can either be provided on stdin (e.g. through a UNIX pipe) or on the command line. Run a script with no arguments to see how to run it. Scripts expecting a field either take the field name (e.g. "frames") or the field name with a mapping (e.g. "frames@meta", which is the standard mapping I used for analysis).
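The two input conventions above (a field with an optional mapping, and a tab-separated matrix) can be sketched roughly as follows. The function names parse_field_spec and read_matrix are made up for this example and may not match how the scripts actually parse their input.

```python
import io

def parse_field_spec(spec):
    """Split 'frames@meta' into ('frames', 'meta'); a bare 'frames' gives ('frames', None)."""
    name, sep, mapping = spec.partition("@")
    return name, (mapping if sep else None)

def read_matrix(stream):
    """Read a tab-separated grid from any file-like object, skipping blank lines."""
    return [line.rstrip("\n").split("\t") for line in stream if line.strip()]

print(parse_field_spec("frames@meta"))  # ('frames', 'meta')
print(parse_field_spec("frames"))       # ('frames', None)

# A StringIO stands in for stdin or an opened file here.
sample = io.StringIO("\tG\tM\nblog\t3\t1\n")
print(read_matrix(sample))  # [['', 'G', 'M'], ['blog', '3', '1']]
```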
work/summary.py
This produces a summary of field values. It takes one (optional) argument, which is the field to summarize. There are several types of summary: count, multi, occur, and calc, which perform different calculations or summaries. The most useful are occur, which counts occurrences of a particular value, and multi, which counts occurrences for fields with multiple values (such as frames).
work/occur.py
This creates a correspondence matrix of the two specified fields.
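One plausible reading of "correspondence matrix" is a count of how often each value of the first field appears in the same record as each value of the second. The sketch below assumes records are plain dicts whose fields hold lists of values; that representation and the name cooccurrence are assumptions for illustration, not the format occur.py works with internally.

```python
from collections import Counter
from itertools import product

def cooccurrence(records, field_a, field_b):
    """Count how often each (value of field_a, value of field_b) pair shares a record."""
    counts = Counter()
    for record in records:
        pairs = product(record.get(field_a, []), record.get(field_b, []))
        for a, b in pairs:
            counts[(a, b)] += 1
    return counts

# Hypothetical records; the field names and values are illustrative only.
records = [
    {"type": ["blog"], "frames": ["G", "M"]},
    {"type": ["media"], "frames": ["G"]},
    {"type": ["blog"], "frames": ["G"]},
]
counts = cooccurrence(records, "type", "frames")
print(counts[("blog", "G")])   # 2
print(counts[("media", "M")])  # 0
```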
work/filter.py
This filters the input XML according to certain criteria. Only records matching at least one criterion are passed through. Typical criteria are "link-type=sample" (include only primary samples), "frames@meta~G" (include only articles with a GOVERNMENT frame), and "type!=media" (exclude media articles). The filter can be inverted to exclude matching articles by using the -n switch.
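The criterion syntax can be sketched roughly as below. This is not the code from filter.py: the dict-of-lists record format, the function names, and the exact meaning given to each operator ("=" as an exact match, "~" as "contains", "!=" as the negation of "=") are assumptions inferred from the examples above.

```python
import re

# Field name, then one of the operators !=, = or ~, then the value.
CRITERION = re.compile(r"^(.+?)(!=|=|~)(.*)$")

def parse_criterion(text):
    """Split 'type!=media' into ('type', '!=', 'media')."""
    match = CRITERION.match(text)
    if match is None:
        raise ValueError("unrecognized criterion: %r" % text)
    return match.groups()

def matches(record, criterion):
    """Test one record (a dict of field -> list of values) against one criterion."""
    field, op, value = parse_criterion(criterion)
    values = record.get(field, [])
    if op == "~":
        return value in values      # assumed: "contains"
    if op == "=":
        return values == [value]    # assumed: exact match
    return values != [value]        # "!=": assumed negation of "="

def filter_records(records, criteria, invert=False):
    """Keep records matching at least one criterion; invert mimics the -n switch."""
    return [r for r in records if any(matches(r, c) for c in criteria) != invert]

# Hypothetical records for illustration.
records = [
    {"type": ["blog"], "frames@meta": ["G", "M"]},
    {"type": ["media"], "frames@meta": ["M"]},
]
print(len(filter_records(records, ["frames@meta~G"])))               # 1
print(len(filter_records(records, ["frames@meta~G"], invert=True)))  # 1
```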
work/merge.py
This merges all of the secondary samples for each primary sample to produce a "merged" record. This makes it easy to perform comparisons between a primary article and all its links, for example. The types of secondary sample can be specified with the -l, -c, and -t flags.
work/dyads.py
This is like occur.py, except instead of mapping correspondences within records, it maps them between a primary sample and specified secondary sample types. Some of the calculation types haven't been implemented (mainly because they don't seem to make sense). I recommend using this only with merged data and the -m switch. The result is a tab-separated matrix.
work/graph.py
This can take the result of summary.py (using occur or multi) and generate an HTML bar graph. Great for presentations.
work/overlap.py
This is used to determine extra frames. It accepts a field, then generates new fields based on that field by comparing the values in the primary sample record with the associated secondary sample records. For example, given frames@meta, it will generate frames-extra@meta, frames-overlap@meta, etc. These can then be analyzed using the other scripts. If you use it for something other than frames, you will have to modify lib/mappings.py in order to tell the scripts what type the new field is.
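The comparison can be pictured as set arithmetic. The sketch below is only a guess at the semantics: it assumes "overlap" means values found in both the primary record and its secondary records, and "extra" means values found only in the secondary records. The function name compare_values is hypothetical, and overlap.py may define these fields differently.

```python
def compare_values(primary, secondaries):
    """Compare a primary record's values with those of its secondary records.

    primary: list of values from the primary sample.
    secondaries: list of value lists, one per secondary sample.
    """
    primary_set = set(primary)
    linked = set()
    for values in secondaries:
        linked.update(values)
    return {
        "overlap": sorted(primary_set & linked),  # assumed meaning: present in both
        "extra": sorted(linked - primary_set),    # assumed meaning: only in secondaries
    }

result = compare_values(["G", "M"], [["M"], ["E", "M"]])
print(result)  # {'overlap': ['M'], 'extra': ['E']}
```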
freq.py
This turns a matrix of occurrences into a matrix of percentages. It can also calculate totals etc.
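As an illustration of the conversion, here is a minimal sketch that expresses each cell as a percentage of its row total. Whether freq.py normalizes by row, by column, or by the grand total is not stated here, so the row-wise choice and the name row_percentages are assumptions.

```python
def row_percentages(matrix):
    """Convert each row of counts into percentages of that row's total."""
    result = []
    for row in matrix:
        total = sum(row)
        # An all-zero row stays all zero rather than dividing by zero.
        result.append([100.0 * value / total if total else 0.0 for value in row])
    return result

print(row_percentages([[1, 3], [0, 0]]))  # [[25.0, 75.0], [0.0, 0.0]]
```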
hmtl.py
This converts an input matrix into a more attractive HTML page for viewing.
articles.xsl
This converts an articles XML file into an HTML page for viewing.

Some of the scripts are obsolete and I've forgotten what they do.