Aquileo | Recent changes to homehttps://sourceforge.net/p/stingrayreader/home/Recent changes to homeenThu, 17 Apr 2014 12:43:51 -0000Aquileo | Stingray -- Schema-Based File Reader modified by Steven F. Lotthttps://sourceforge.net/p/stingrayreader/home/Stingray%2520--%2520Schema-Based%2520File%2520Reader/<div class="markdown_content"><pre>--- v6 +++ v7 @@ -21,12 +21,15 @@ Additionally, Stringray provides some guidance on how to structure file-processing applications so that they are testable and composable. -Stingray 4.1 requires Python 3.3. +Stingray 4.3 requires Python 3.3. -It depends on three other projects: +It depends on one other projects to read legacy `.xls` files. * xlrd. http://www.lexicon.net/sjmachin/xlrd.htm -* PyLit. http://pylit.berlios.de/ + +In order to do a complete build from scratch, this is a literate programming example. You'll need these two tools + +* PyLit3. https://github.com/slott56/PyLit-3 * Sphinx. http://sphinx.pocoo.org/ Since Stingray is a *Literate Programming* project, the documentation is also the source. And vice-versa. </pre> </div>Steven F. LottThu, 17 Apr 2014 12:43:51 -0000https://sourceforge.net24e6a49bc44fd9867763fcbff48925bea76c18d4Aquileo | Stingray -- Schema-Based File Reader modified by Steven F. Lotthttps://sourceforge.net/p/stingrayreader/home/Stingray%2520--%2520Schema-Based%2520File%2520Reader/<div class="markdown_content"><pre>--- v5 +++ v6 @@ -6,20 +6,22 @@ ----------------- -Spreadsheet format files are the *lingua franca* of data processing. CSV, Tab, XLS, XSLX and ODS files are used widely. Python's ``csv`` module and the XLRD project (http://www.lexicon.net/sjmachin/xlrd.htm) help us handle spreadsheet files. +Spreadsheet format files are the *lingua franca* of data processing. CSV, Tab, XLS, XSLX and ODS files are used widely. -By themselves, however, they aren't a very complete solution. +Python's ``csv`` module and the XLRD project (http://www.lexicon.net/sjmachin/xlrd.htm) help us handle spreadsheet files. 
The ZipFile and XML modules help us parse almost everything else. By themselves, however, thes modules aren't a very complete solution. + +In particular, there's a lot of fumbling around trying to handle the schema for a spreadsheet. The Stingray Schema-Based File Reader offers several features to help process files in spreadsheet formats. -1. It wraps ``csv``, ``xlrd``, plus several XML parsers into a single, unified "workbook" structure to make applications that work with any of the common physical formats. +1. It wraps ``csv``, ``xlrd``, plus several other parsers into a single, unified "workbook" structure. Applications can work with any of the common physical formats in a completely uniform way. + It extends the workbook to include fixed format files (with no delimiters) and even COBOL files in EBCDIC. -+ It provides a uniform way to load and use schema information. This can be header rows in the individual sheets of a workbook, or it can be separate schema information. ++ It provides a uniform way to load and use schema information. This can be header rows in the individual sheets of a workbook, or it can be separate schema information. It can also involve complex header parsing for those spreadsheets where someone had to create fancy column titles that include merged cells and other complications. + It provides a suite of data conversions that cover the most common cases. -Additionally, stringray provides some guidance on how to structure file-processing applications so that they are testable and composable. +Additionally, Stringray provides some guidance on how to structure file-processing applications so that they are testable and composable. -Stingray requires Python 2.7. +Stingray 4.1 requires Python 3.3. It depends on three other projects: </pre> </div>Steven F. LottSun, 30 Mar 2014 13:28:44 -0000https://sourceforge.net16fc4afe00ea0fec07168cac5ac9a66129a3de19Aquileo | WikiPage Stingray -- Schema-Based File Reader modified by Steven F. 
Lotthttps://sourceforge.net/p/stingrayreader/home/Stingray%2520--%2520Schema-Based%2520File%2520Reader/<pre>--- v4 +++ v5 @@ -6,37 +6,25 @@ ----------------- -Stingray tackles three fundamental issues in -processing a file: - -- How are the bytes organized? - -- What do the bytes *mean*? - -- How can we assure ourselves that applications will work with this file? - -The problem we have is that the schema is not always bound -to a given file nor is the schema clearly bound to an application program. - -One goal of good software is to cope reasonably well with variability -of user-supplied inputs. Providing data by spreadsheet is -often the most desirable choice for users. In some cases, it's the -only acceptable choice. Since spreadsheets are tweaked manually, they -may not have a simple, fixed logical layout. - -A workbook (the container of individual "spread sheets") -can be encoded in any of a number of physical -formats: XLS, CSV, XLSX, ODS to name a few. We would like our applications -to be independent of these physical formats. - -Data supplied in the form of a workbook can suffer from numerous data quality issues. We need to be assured that a file actually conforms to a given -schema. - -What has been done about it? - -What can we do in Python? - -How can we handle various kinds of spreadsheets transparently? - -Can we handle fixed-format files (those without delimiters)? If we can do that, -can we handle legacy COBOL files? Can we handle EBCDIC? +Spreadsheet format files are the *lingua franca* of data processing. CSV, Tab, XLS, XSLX and ODS files are used widely. Python's ``csv`` module and the XLRD project (http://www.lexicon.net/sjmachin/xlrd.htm) help us handle spreadsheet files. + +By themselves, however, they aren't a very complete solution. + +The Stingray Schema-Based File Reader offers several features to help process files in spreadsheet formats. + +1. 
It wraps ``csv``, ``xlrd``, plus several XML parsers into a single, unified "workbook" structure to make applications that work with any of the common physical formats. ++ It extends the workbook to include fixed format files (with no delimiters) and even COBOL files in EBCDIC. ++ It provides a uniform way to load and use schema information. This can be header rows in the individual sheets of a workbook, or it can be separate schema information. ++ It provides a suite of data conversions that cover the most common cases. + +Additionally, stringray provides some guidance on how to structure file-processing applications so that they are testable and composable. + +Stingray requires Python 2.7. + +It depends on three other projects: + +* xlrd. http://www.lexicon.net/sjmachin/xlrd.htm +* PyLit. http://pylit.berlios.de/ +* Sphinx. http://sphinx.pocoo.org/ + +Since Stingray is a *Literate Programming* project, the documentation is also the source. And vice-versa. </pre>Steven F. LottWed, 28 Sep 2011 13:39:16 -0000https://sourceforge.net0fb55291c2779265b5372c8c978f4d0d462e67d4Aquileo | WikiPage Stingray -- Schema-Based File Reader modified by Steven F. Lotthttps://sourceforge.net/p/stingrayreader/home/Stingray%2520--%2520Schema-Based%2520File%2520Reader/<pre>--- v3 +++ v4 @@ -1,4 +1,4 @@ -Documentation: <http://stingrayreader.sourceforge.net/index.html> +HTML Documentation: <http://stingrayreader.sourceforge.net/index.html> Admins: [[project_admins]] </pre>Steven F. LottTue, 27 Sep 2011 23:11:29 -0000https://sourceforge.net82f728cb2e82d5c017ead76cf4837f93e9332c69Aquileo | WikiPage Stingray -- Schema-Based File Reader modified by Steven F. 
Lotthttps://sourceforge.net/p/stingrayreader/home/Stingray%2520--%2520Schema-Based%2520File%2520Reader/<pre>--- v2 +++ v3 @@ -1,39 +1,42 @@ +Documentation: <http://stingrayreader.sourceforge.net/index.html> + +Admins: [[project_admins]] + +Download: [[download_button]] + +----------------- + Stingray tackles three fundamental issues in processing a file: - How are the bytes organized? - What do the bytes *mean*? - How can we assure ourselves that applications will work with this file? The problem we have is that the schema is not always bound to a given file nor is the schema clearly bound to an application program. One goal of good software is to cope reasonably well with variability of user-supplied inputs. Providing data by spreadsheet is often the most desirable choice for users. In some cases, it's the only acceptable choice. Since spreadsheets are tweaked manually, they may not have a simple, fixed logical layout. A workbook (the container of individual "spread sheets") can be encoded in any of a number of physical formats: XLS, CSV, XLSX, ODS to name a few. We would like our applications to be independent of these physical formats. Data supplied in the form of a workbook can suffer from numerous data quality issues. We need to be assured that a file actually conforms to a given schema. What has been done about it? What can we do in Python? How can we handle various kinds of spreadsheets transparently? Can we handle fixed-format files (those without delimiters)? If we can do that, can we handle legacy COBOL files? Can we handle EBCDIC? - ----------- - -[[project_admins]] -[[download_button]] </pre>Steven F. LottTue, 27 Sep 2011 20:45:11 -0000https://sourceforge.net0fa35972d85c444777398bf94c698470fdb1c3a6Aquileo | WikiPage Stingray -- Schema-Based File Reader modified by Steven F. Lotthttps://sourceforge.net/p/stingrayreader/home/Stingray%2520--%2520Schema-Based%2520File%2520Reader/<pre>--- v1 +++ v2 @@ -1,5 +1,39 @@ -Welcome to your wiki! 
- -This is the default page, edit it as you see fit. To add a page simply reference it within brackets, e.g.: [SamplePage]. - -The wiki uses [Markdown](/p/stingrayreader/home/markdown_syntax/) syntax. +Stingray tackles three fundamental issues in +processing a file: + +- How are the bytes organized? + +- What do the bytes *mean*? + +- How can we assure ourselves that applications will work with this file? + +The problem we have is that the schema is not always bound +to a given file nor is the schema clearly bound to an application program. + +One goal of good software is to cope reasonably well with variability +of user-supplied inputs. Providing data by spreadsheet is +often the most desirable choice for users. In some cases, it's the +only acceptable choice. Since spreadsheets are tweaked manually, they +may not have a simple, fixed logical layout. + +A workbook (the container of individual "spread sheets") +can be encoded in any of a number of physical +formats: XLS, CSV, XLSX, ODS to name a few. We would like our applications +to be independent of these physical formats. + +Data supplied in the form of a workbook can suffer from numerous data quality issues. We need to be assured that a file actually conforms to a given +schema. + +What has been done about it? + +What can we do in Python? + +How can we handle various kinds of spreadsheets transparently? + +Can we handle fixed-format files (those without delimiters)? If we can do that, +can we handle legacy COBOL files? Can we handle EBCDIC? + +---------- + +[[project_admins]] +[[download_button]] </pre>Steven F. LottFri, 23 Sep 2011 20:35:02 -0000https://sourceforge.net6475e1753e0a94c3ec1d1225529aaaafe4b7081fAquileo | WikiPage Home modified by Steven F. Lotthttps://sourceforge.net/p/stingrayreader/home/Home/Welcome to your wiki! This is the default page, edit it as you see fit. To add a page simply reference it within brackets, e.g.: [SamplePage]. 
The wiki uses [Markdown](/p/stingrayreader/home/markdown_syntax/) syntax. Steven F. LottFri, 23 Sep 2011 20:19:18 -0000https://sourceforge.net0eb96f1ddebb054fd4915199e906fa002d54d4ec
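
The revisions above say that schema information "can be header rows in the individual sheets of a workbook, or it can be separate schema information." As a rough illustration of that distinction only, here is a minimal sketch built on Python's standard ``csv`` module. It is not Stingray's API: the function names `rows_with_embedded_schema` and `rows_with_external_schema`, and the sample data, are hypothetical and chosen just for this example.

```python
import csv
import io


def rows_with_embedded_schema(source):
    """Treat the first row of the sheet as the schema (column names)."""
    reader = csv.reader(source)
    header = next(reader)  # the header row supplies the column names
    for row in reader:
        yield dict(zip(header, row))


def rows_with_external_schema(source, schema):
    """Apply a separately supplied list of column names to every row."""
    for row in csv.reader(source):
        yield dict(zip(schema, row))


# Hypothetical sample data: the same logical content, with and without a header row.
with_header = io.StringIO("name,qty\nwidget,3\n")
without_header = io.StringIO("widget,3\n")

embedded = list(rows_with_embedded_schema(with_header))
external = list(rows_with_external_schema(without_header, ["name", "qty"]))
```

In both cases the application sees rows keyed by column name, so the rest of the processing does not need to know where the schema came from. All values remain strings here; data conversions would be a separate step.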