Aquileo | Recent changes to home

Aquileo | Stingray -- Schema-Based File Reader modified by Steven F. Lott

Thu, 17 Apr 2014 12:43:51 -0000

--- v6
+++ v7
@@ -21,12 +21,15 @@

 Additionally, Stingray provides some guidance on how to structure file-processing applications so that they are testable and composable.

-Stingray 4.1 requires Python 3.3.
+Stingray 4.3 requires Python 3.3.

-It depends on three other projects:
+It depends on one other project to read legacy `.xls` files.

 *   xlrd.  http://www.lexicon.net/sjmachin/xlrd.htm
-*   PyLit.  http://pylit.berlios.de/
+
+Because Stingray is a literate programming example, a complete build from scratch needs these two tools:
+
+*   PyLit3.  https://github.com/slott56/PyLit-3
 *   Sphinx.  http://sphinx.pocoo.org/

 Since Stingray is a *Literate Programming* project, the documentation is also the source.  And vice-versa.

Aquileo | Stingray -- Schema-Based File Reader modified by Steven F. Lott

Sun, 30 Mar 2014 13:28:44 -0000

--- v5
+++ v6
@@ -6,20 +6,22 @@

 -----------------

-Spreadsheet format files are the *lingua franca* of data processing. CSV, Tab, XLS, XLSX and ODS files are used widely.  Python's ``csv`` module and the XLRD project (http://www.lexicon.net/sjmachin/xlrd.htm) help us handle spreadsheet files.
+Spreadsheet format files are the *lingua franca* of data processing. CSV, Tab, XLS, XLSX and ODS files are used widely.

-By themselves, however, they aren't a very complete solution.
+Python's ``csv`` module and the XLRD project (http://www.lexicon.net/sjmachin/xlrd.htm) help us handle spreadsheet files. The ZipFile and XML modules help us parse almost everything else. By themselves, however, these modules aren't a very complete solution.
+
+In particular, there's a lot of fumbling around trying to handle the schema for a spreadsheet.

 The Stingray Schema-Based File Reader offers several features to help process files in spreadsheet formats.

-1.  It wraps ``csv``, ``xlrd``, plus several XML parsers into a single, unified "workbook" structure to make applications that work with any of the common physical formats.
+1.  It wraps ``csv``, ``xlrd``, plus several other parsers into a single, unified "workbook" structure. Applications can work with any of the common physical formats in a completely uniform way.
 +   It extends the workbook to include fixed format files (with no delimiters) and even COBOL files in EBCDIC.
-+   It provides a uniform way to load and use schema information.  This can be header rows in the individual sheets of a workbook, or it can be separate schema information.
++   It provides a uniform way to load and use schema information.  This can be header rows in the individual sheets of a workbook, or it can be separate schema information. It can also involve complex header parsing for those spreadsheets where someone had to create fancy column titles that include merged cells and other complications.
 +   It provides a suite of data conversions that cover the most common cases.

-Additionally, stingray provides some guidance on how to structure file-processing applications so that they are testable and composable.
+Additionally, Stingray provides some guidance on how to structure file-processing applications so that they are testable and composable.

-Stingray requires Python 2.7.  
+Stingray 4.1 requires Python 3.3.

 It depends on three other projects:
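
The fixed-format and EBCDIC feature described in this revision can be illustrated with a minimal sketch that uses only the Python standard library. It is not Stingray's API: the `Field` tuple, the `SCHEMA` list, the `read_fixed` function, and the 14-byte record layout are all hypothetical, and the sketch assumes text-only fields encoded with the `cp037` EBCDIC codec. Real COBOL files often contain packed-decimal fields, which this sketch does not handle.

```python
from typing import NamedTuple


class Field(NamedTuple):
    """One column of a hypothetical fixed-format schema."""
    name: str
    offset: int  # zero-based byte offset within the record
    size: int    # field width in bytes


# Hypothetical schema; in practice it might be derived from a COBOL copybook.
SCHEMA = [
    Field("name", 0, 10),
    Field("qty", 10, 4),
]
RECORD_LENGTH = 14


def read_fixed(data, schema=SCHEMA, length=RECORD_LENGTH, encoding="cp037"):
    """Yield one dict per fixed-length record, decoding EBCDIC text fields."""
    for start in range(0, len(data), length):
        record = data[start:start + length]
        yield {
            f.name: record[f.offset:f.offset + f.size].decode(encoding).strip()
            for f in schema
        }


# Two 14-byte records with no delimiters, encoded as EBCDIC (cp037).
sample = "WIDGET    0012GADGET    0003".encode("cp037")
rows = list(read_fixed(sample))
```

Because the schema lives outside the file, the same loop can serve any record layout; only the list of fields changes.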

Aquileo | WikiPage Stingray -- Schema-Based File Reader modified by Steven F. Lott

Wed, 28 Sep 2011 13:39:16 -0000

--- v4 
+++ v5 
@@ -6,37 +6,25 @@
 
 -----------------
 
-Stingray tackles three fundamental issues in 
-processing a file: 
-
--   How are the bytes organized?  
-
--   What do the bytes *mean*?
-
--   How can we assure ourselves that applications will work with this file?
-
-The problem we have is that the schema is not always bound
-to a given file nor is the schema clearly bound to an application program.  
-
-One goal of good software is to cope reasonably well with variability
-of user-supplied inputs.  Providing data by spreadsheet is 
-often the most desirable choice for users.  In some cases, it's the
-only acceptable choice.  Since spreadsheets are tweaked manually, they
-may not have a simple, fixed logical layout. 
-
-A workbook (the container of individual "spread sheets")
-can be encoded in any of a number of physical
-formats: XLS, CSV, XLSX, ODS to name a few.  We would like our applications
-to be independent of these physical formats.
-
-Data supplied in the form of a workbook can suffer from numerous data quality issues.  We need to be assured that a file actually conforms to a given
-schema.
-    
-What has been done about it?
-
-What can we do in Python?
-
-How can we handle various kinds of spreadsheets transparently?
-
-Can we handle fixed-format files (those without delimiters)?  If we can do that, 
-can we handle legacy COBOL files?  Can we handle EBCDIC?
+Spreadsheet format files are the *lingua franca* of data processing. CSV, Tab, XLS, XLSX and ODS files are used widely.  Python's ``csv`` module and the XLRD project (http://www.lexicon.net/sjmachin/xlrd.htm) help us handle spreadsheet files.
+
+By themselves, however, they aren't a very complete solution.
+
+The Stingray Schema-Based File Reader offers several features to help process files in spreadsheet formats.
+
+1.  It wraps ``csv``, ``xlrd``, plus several XML parsers into a single, unified "workbook" structure to make applications that work with any of the common physical formats.
++   It extends the workbook to include fixed format files (with no delimiters) and even COBOL files in EBCDIC.
++   It provides a uniform way to load and use schema information.  This can be header rows in the individual sheets of a workbook, or it can be separate schema information.
++   It provides a suite of data conversions that cover the most common cases.
+
+Additionally, stingray provides some guidance on how to structure file-processing applications so that they are testable and composable.
+
+Stingray requires Python 2.7.  
+
+It depends on three other projects:
+
+*   xlrd.  http://www.lexicon.net/sjmachin/xlrd.htm
+*   PyLit.  http://pylit.berlios.de/
+*   Sphinx.  http://sphinx.pocoo.org/
+
+Since Stingray is a *Literate Programming* project, the documentation is also the source.  And vice-versa.
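
The idea of loading schema information from a header row and then applying data conversions can be sketched with the standard `csv` module alone. This is an illustration under stated assumptions, not Stingray's interface: the `CONVERSIONS` mapping, the `rows_with_schema` function, and the sample columns are hypothetical names chosen for the example.

```python
import csv
import io
from decimal import Decimal

# Hypothetical external schema: column name -> conversion function.
CONVERSIONS = {"item": str, "qty": int, "price": Decimal}


def rows_with_schema(source, conversions=CONVERSIONS):
    """Use the first row as the schema, then convert each cell by column name."""
    reader = csv.reader(source)
    header = next(reader)  # embedded schema: the header row of the sheet
    missing = set(conversions) - set(header)
    if missing:
        # The file does not conform to the schema the application expects.
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        raw = dict(zip(header, row))
        yield {name: convert(raw[name]) for name, convert in conversions.items()}


sample = io.StringIO("item,qty,price\nwidget,12,1.50\ngadget,3,20.00\n")
result = list(rows_with_schema(sample))
```

Checking the header against the expected columns before reading any data is one simple way to confirm that a file conforms to a given schema.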

Aquileo | WikiPage Stingray -- Schema-Based File Reader modified by Steven F. Lott

Tue, 27 Sep 2011 23:11:29 -0000

--- v3 
+++ v4 
@@ -1,4 +1,4 @@
-Documentation: 
+HTML Documentation: 
 
 Admins: [[project_admins]]

Aquileo | WikiPage Stingray -- Schema-Based File Reader modified by Steven F. Lott

Tue, 27 Sep 2011 20:45:11 -0000

--- v2 
+++ v3 
@@ -1,39 +1,42 @@
+Documentation: 
+
+Admins: [[project_admins]]
+
+Download:  [[download_button]]
+
+-----------------
+
 Stingray tackles three fundamental issues in 
 processing a file: 
 
 -   How are the bytes organized?  
 
 -   What do the bytes *mean*?
 
 -   How can we assure ourselves that applications will work with this file?
 
 The problem we have is that the schema is not always bound
 to a given file nor is the schema clearly bound to an application program.  
 
 One goal of good software is to cope reasonably well with variability
 of user-supplied inputs.  Providing data by spreadsheet is 
 often the most desirable choice for users.  In some cases, it's the
 only acceptable choice.  Since spreadsheets are tweaked manually, they
 may not have a simple, fixed logical layout. 
 
 A workbook (the container of individual "spread sheets")
 can be encoded in any of a number of physical
 formats: XLS, CSV, XLSX, ODS to name a few.  We would like our applications
 to be independent of these physical formats.
 
 Data supplied in the form of a workbook can suffer from numerous data quality issues.  We need to be assured that a file actually conforms to a given
 schema.
     
 What has been done about it?
 
 What can we do in Python?
 
 How can we handle various kinds of spreadsheets transparently?
 
 Can we handle fixed-format files (those without delimiters)?  If we can do that, 
 can we handle legacy COBOL files?  Can we handle EBCDIC?
-
-----------
-
-[[project_admins]]
-[[download_button]]

Aquileo | WikiPage Stingray -- Schema-Based File Reader modified by Steven F. Lott

Fri, 23 Sep 2011 20:35:02 -0000

--- v1 
+++ v2 
@@ -1,5 +1,39 @@
-Welcome to your wiki!
-
-This is the default page, edit it as you see fit. To add a page simply reference it within brackets, e.g.: [SamplePage].
-
-The wiki uses [Markdown](/p/stingrayreader/home/markdown_syntax/) syntax.
+Stingray tackles three fundamental issues in 
+processing a file: 
+
+-   How are the bytes organized?  
+
+-   What do the bytes *mean*?
+
+-   How can we assure ourselves that applications will work with this file?
+
+The problem we have is that the schema is not always bound
+to a given file nor is the schema clearly bound to an application program.  
+
+One goal of good software is to cope reasonably well with variability
+of user-supplied inputs.  Providing data by spreadsheet is 
+often the most desirable choice for users.  In some cases, it's the
+only acceptable choice.  Since spreadsheets are tweaked manually, they
+may not have a simple, fixed logical layout. 
+
+A workbook (the container of individual "spread sheets")
+can be encoded in any of a number of physical
+formats: XLS, CSV, XLSX, ODS to name a few.  We would like our applications
+to be independent of these physical formats.
+
+Data supplied in the form of a workbook can suffer from numerous data quality issues.  We need to be assured that a file actually conforms to a given
+schema.
+    
+What has been done about it?
+
+What can we do in Python?
+
+How can we handle various kinds of spreadsheets transparently?
+
+Can we handle fixed-format files (those without delimiters)?  If we can do that, 
+can we handle legacy COBOL files?  Can we handle EBCDIC?
+
+----------
+
+[[project_admins]]
+[[download_button]]

Aquileo | WikiPage Home modified by Steven F. Lott

Fri, 23 Sep 2011 20:19:18 -0000

Welcome to your wiki! This is the default page, edit it as you see fit. To add a page simply reference it within brackets, e.g.: [SamplePage]. The wiki uses [Markdown](/p/stingrayreader/home/markdown_syntax/) syntax.