Merged
40 changes: 23 additions & 17 deletions README.md
Expand Up @@ -19,7 +19,7 @@

### Wikidata and Wiktionary language data extraction

**Scribe-Data** is a convenient command-line interface (CLI) for extracting and formatting language data from [Wikidata](https://www.wikidata.org/). Functionality includes allowing users to list, download, and manage language data directly from the terminal.
**Scribe-Data** is a command-line interface (CLI) for extracting and formatting language data from [Wikidata](https://www.wikidata.org/) and other supported sources. It helps users list, download, manage, convert, and filter language data directly from the terminal.

> [!NOTE]\
> The [contributing](#contributing) section has information for those interested, with the articles and presentations in [featured by](#featured-by) also being good resources for learning more about Scribe.
Expand All @@ -28,7 +28,7 @@ Scribe applications are available on [iOS](https://github.com/scribe-org/Scribe-

Check out Scribe's [architecture diagrams](https://github.com/scribe-org/Organization/blob/main/ARCHITECTURE.md) for an overview of the organization including our applications, services and processes. It depicts the projects that [Scribe](https://github.com/scribe-org) is developing as well as the relationships between them and the external systems with which they interact. Also check out the [Wikidata and Scribe Guide](https://github.com/scribe-org/Organization/blob/main/WIKIDATAGUIDE.md) for an overview of [Wikidata](https://www.wikidata.org/) and getting language data from it.

# Contents
## Contents

- [Process](#process)
- [Installation](#installation)
Expand All @@ -38,15 +38,15 @@ Check out Scribe's [architecture diagrams](https://github.com/scribe-org/Organiz
- [Environment Setup](#environment-setup)
- [Featured By](#featured-by)

# Process
## Process

The CLI commands defined within [scribe_data/cli](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/cli) and the notebooks within the various [scribe_data](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data) directories are used to update all data for [Scribe-iOS](https://github.com/scribe-org/Scribe-iOS), with this functionality later being expanded to update [Scribe-Android](https://github.com/scribe-org/Scribe-Android) and [Scribe-Desktop](https://github.com/scribe-org/Scribe-Desktop) once they're active.

The main data update process triggers [language-based SPARQL queries](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikidata/queries_all_data) to query language data from [Wikidata](https://www.wikidata.org/) using [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper) as a URI. Emojis are further sourced from [Unicode CLDR](https://github.com/unicode-org/cldr), with this process being run via the `scribe-data get -lang LANGUAGE -dt emoji-keywords` command.

<sub><a href="#top">Back to top.</a></sub>

# Installation
## Installation

Scribe-Data is available for installation via [uv](https://docs.astral.sh/uv/) (recommended) or [pip](https://pypi.org/project/scribe-data/).

Expand Down Expand Up @@ -80,7 +80,7 @@ pip install -e .

<sub><a href="#top">Back to top.</a></sub>

# CLI Usage
## CLI Usage

Scribe-Data provides a command-line interface (CLI) for efficient interaction with its language data functionality. Please see the [usage guide](https://github.com/scribe-org/Scribe-Data/blob/main/USAGE.md) or the [official documentation](https://scribe-data.readthedocs.io/) for detailed instructions.

Expand All @@ -95,10 +95,15 @@ scribe-data [command] [arguments]

### Available Commands

- `list` (`l`): Enumerate available languages, data types and their combinations.
- `get` (`g`): Retrieve data from Wikidata for specified languages and data types.
- `total` (`t`): Display the total available data for given languages and data types.
- `convert` (`c`): Transform data returned by Scribe-Data into different file formats.
- `list` (`l`): List languages, data types and combinations of each that Scribe-Data can be used for.
- `get` (`g`): Get data from Wikidata and other sources for the given languages and data types.
- `total` (`t`): Check Wikidata for the total available data for the given languages and data types.
- `convert` (`c`): Convert data returned by Scribe-Data to different file types.
- `download` (`d`): Download Wikidata lexeme or Wiktionary dumps.
- `interactive` (`i`): Run in interactive mode.
- `export_contracts` (`ec`): Export Scribe-Data contracts to a local directory.
- `check_contracts` (`cc`): Check the data in a Scribe-Data export directory to see that all needed language data is included.
- `filter_data` (`fd`): Filter exported Scribe-Data data based on provided data contract values.
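
As a quick reference, the command names and short aliases listed above can be captured in a small lookup table. The Python sketch below is illustrative only: `COMMAND_ALIASES` and `resolve_command` are hypothetical names that are not part of Scribe-Data, and the mapping simply mirrors the list in this README.

```python
# Hypothetical helper: mirrors the command list above; not part of Scribe-Data itself.
COMMAND_ALIASES = {
    "l": "list",
    "g": "get",
    "t": "total",
    "c": "convert",
    "d": "download",
    "i": "interactive",
    "ec": "export_contracts",
    "cc": "check_contracts",
    "fd": "filter_data",
}


def resolve_command(name: str) -> str:
    """Return the full command name for either a full name or a short alias."""
    if name in COMMAND_ALIASES.values():
        return name
    try:
        return COMMAND_ALIASES[name]
    except KeyError:
        raise ValueError(f"Unknown command or alias: {name!r}") from None


print(resolve_command("fd"))   # filter_data
print(resolve_command("get"))  # get
```

A table like this can be handy in wrapper scripts that accept either form, though the CLI itself already resolves the aliases.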

### Command Examples

Expand All @@ -108,9 +113,10 @@ scribe-data [command] [arguments]

```bash
# Commands used in the above GIF:
scribe-data list
scribe-data list --language
scribe-data list --data-type
scribe-data get --language English --data-type verbs -od ./scribe-data
scribe-data get --language English --data-type verbs --output-dir ./scribe-data
scribe-data total --language English
```

Expand All @@ -120,13 +126,13 @@ scribe-data total --language English

```bash
# Commands used in the above GIF:
scribe-data get -i
scribe-data total -i
scribe-data get --interactive
scribe-data total --interactive
```

<sub><a href="#top">Back to top.</a></sub>

# Data Contracts
## Data Contracts

[Wikidata](https://www.wikidata.org/) has lots of [language data](https://www.wikidata.org/wiki/Wikidata:Lexicographical_data) available, but not all of it is useful for all applications. In order to make the functionality of the Scribe-Data `get` requests as simple as possible, we made the decision to always return all data for the given languages and data types. Adding the ability to pass desired forms to the commands seemed cumbersome, and larger Scribe-Data requests should be parsing [Wikidata lexeme dumps](https://dumps.wikimedia.org/wikidatawiki/entities/) as the data source.

Expand Down Expand Up @@ -160,7 +166,7 @@ Updating contracts shouldn't be something that Scribe-Data users should have to

<sub><a href="#top">Back to top.</a></sub>

# Contributing
## Contributing

<a href="https://matrix.to/#/#scribe_community:matrix.org">
<img src="https://raw.githubusercontent.com/scribe-org/Organization/main/resources/images/logos/MatrixLogoGrey.png" width="175" alt="Public Matrix Chat" align="right">
Expand Down Expand Up @@ -200,7 +206,7 @@ Scribe does not accept direct edits to the grammar JSON files as they are source

<sub><a href="#top">Back to top.</a></sub>

# Environment Setup
## Environment Setup

> [!IMPORTANT]
>
Expand Down Expand Up @@ -288,7 +294,7 @@ See the [contribution guidelines](https://github.com/scribe-org/Scribe-Data/blob

<sub><a href="#top">Back to top.</a></sub>

# Featured By
## Featured By

Please see the [blog posts page on our website](https://scri.be/docs/about/blog-posts) for a list of articles on Scribe, and feel free to open a pull request to add one that you've written at [scribe-org/scri.be](https://github.com/scribe-org/scri.be)!

Expand Down Expand Up @@ -316,7 +322,7 @@ The following organizations have supported the development of Scribe projects th

<sub><a href="#top">Back to top.</a></sub>

# Powered By
## Powered By

### Contributors

184 changes: 94 additions & 90 deletions USAGE.md
@@ -1,33 +1,75 @@
<a id="top"></a>

# Scribe-Data CLI Usage

Scribe-Data provides a command-line interface (CLI) for efficient interaction with its language data functionality.
Scribe-Data provides a command-line interface (CLI) for extracting language data from Wikidata and other sources.

## Basic Usage
## Contents

- [Installation](#installation)
- [Development Build](#development-build)
- [Basic Usage](#basic-usage)
- [Command Examples](#command-examples)
- [Additional Help](#additional-help)

## Installation

### Using uv (recommended)

```bash
uv pip install scribe-data
```

To utilize the Scribe-Data CLI, you can execute the following command in your terminal:
### Using pip

```bash
pip install scribe-data
```

# For a development build:
git clone https://github.com/scribe-org/Scribe-Data.git # or ideally your fork
<sub><a href="#top">Back to top.</a></sub>

## Development Build

```bash
git clone https://github.com/scribe-org/Scribe-Data.git # or your fork
cd Scribe-Data

# With uv (recommended)
uv sync --all-groups
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows

# Or with pip
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
pip install -e .
```

<sub><a href="#top">Back to top.</a></sub>

## Basic Usage

scribe-data -h # view the cli options
```bash
scribe-data -h
scribe-data [command] [arguments]
```

## Available Commands
### Available Commands

- `list` (`l`): Enumerate available languages, data types and their combinations.
- `get` (`g`): Retrieve data from Wikidata for specified languages and data types.
- `total` (`t`): Display the total available data for given languages and data types.
- `convert` (`c`): Transform data returned by Scribe-Data into different file formats.
- `list` (`l`): List the languages, data types, and combinations available in Scribe-Data.
- `get` (`g`): Get data from Wikidata and other sources for the selected languages and data types.
- `total` (`t`): Show the total available data for selected languages and data types.
- `convert` (`c`): Convert Scribe-Data output into different file types.
- `download` (`d`): Download Wikidata lexeme or Wiktionary dumps.
- `interactive` (`i`): Run Scribe-Data in interactive mode.
- `export_contracts` (`ec`): Export Scribe-Data contracts to a local directory.
- `check_contracts` (`cc`): Check that an export directory contains the language data needed by the contracts.
- `filter_data` (`fd`): Filter exported Scribe-Data data based on contract values.

## Available Arguments
### Available Arguments

The following arguments can be passed to the Scribe-Data commands whenever sensible:
The following arguments can be passed to commands where applicable:

- `--language` (`-lang`): The language to run the command for.
- `--data-type` (`-dt`): The data type to run the command for.
Expand All @@ -36,107 +78,69 @@ The following arguments can be passed to the Scribe-Data commands whenever sensi
- `--output-type` (`-ot`): The file type that the command should output.
- `--outputs-per-entry` (`-ope`): How many outputs should be generated per data entry.
- `--all` (`-a`): Get all results from the command.
- `--interactive` (`-i`): Run in interactive mode where supported.

## Command Examples

### List Command

1. Display all available options:

```bash
scribe-data list # -a --all
```

2. Display available languages:
<sub><a href="#top">Back to top.</a></sub>

```bash
scribe-data list -lang # --language
```

3. Display available data types:

```bash
scribe-data list -dt # --data-type
```

### Total Command

1. Display total available data for a specific data type (e.g. nouns):

```bash
scribe-data total -dt nouns
```

2. Display total available data for a specific language (e.g. English):

```bash
scribe-data total -lang English
```

3. Display total available data for both language and data type (e.g. English nouns):

```bash
scribe-data total -lang English -dt nouns
```

### Get Command

1. Get all available languages and data types:
## Command Examples

```bash
scribe-data get -a # --all
```
### List

2. Get specific language and data type (e.g. German nouns):
```bash
scribe-data list
scribe-data list --language
scribe-data list --data-type
```

```bash
scribe-data get -lang German -dt nouns
```
### Total

### Convert Command
```bash
scribe-data total --data-type nouns
scribe-data total --language English
scribe-data total --language English --data-type nouns
```

1. Retrieve data for both language and data type (e.g. English nouns) in CSV format:
### Get

```bash
scribe-data get -lang english -dt verbs -od ./output_data -ot csv
```
```bash
scribe-data get --all
scribe-data get --language German --data-type nouns
```

2. Retrieve data for both language and data type (e.g. English nouns) in TSV format:
### Convert

```bash
scribe-data get -lang english -dt verbs -od ./output_data -ot tsv
```
```bash
scribe-data get --language English --data-type verbs --output-dir ./output_data --output-type csv

### Interactive Get Mode
scribe-data get --language English --data-type verbs --output-dir ./output_data --output-type tsv
```
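
The two invocations above differ only in the value passed to `--output-type`. For scripted use, a minimal Python sketch might assemble the same argument lists programmatically. The function name `build_get_command` is hypothetical, and the sketch only prints the commands rather than running them, so it does not assume Scribe-Data is installed.

```python
import shlex


def build_get_command(language, data_type, output_dir, output_type):
    """Assemble the argument list for a `scribe-data get` call (hypothetical helper)."""
    return [
        "scribe-data", "get",
        "--language", language,
        "--data-type", data_type,
        "--output-dir", output_dir,
        "--output-type", output_type,
    ]


for file_type in ("csv", "tsv"):
    command = build_get_command("English", "verbs", "./output_data", file_type)
    # shlex.join quotes the arguments safely for display; to execute the command,
    # pass the list itself to subprocess.run(command, check=True).
    print(shlex.join(command))
```

Keeping the arguments as a list rather than a single string avoids shell-quoting problems if a language name or path ever contains spaces.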

The CLI also offers an interactive get mode, which can be initiated with the following command:
### Interactive Mode

```bash
scribe-data get -i # --interactive
scribe-data interactive
scribe-data get --interactive
scribe-data total --interactive
```

This mode guides users through the data retrieval process with a series of prompts:

1. Language selection: Users can choose from a list of available languages or select all.
2. Data type selection: Users can specify which types of data to get.
3. Output configuration: Users can set the file format, export directory, and overwrite preferences.
<sub><a href="#top">Back to top.</a></sub>

The interactive mode is particularly useful for users who prefer a guided approach or are exploring the available data options.
## Additional Help

## Additional Assistance

For more detailed information on each command and its options, append the `--help` flag:
For detailed information on any command, use:

```bash
scribe-data -h # --help
scribe-data -h
scribe-data [command] -h
```

The CLI also has functions to check the version and upgrade the package if necessary.
Version and upgrade commands are also available:

```bash
scribe-data -v # --version
scribe-data -u # --upgrade
scribe-data -v
scribe-data -u
```

For comprehensive usage instructions and examples, please refer to the [official documentation](https://scribe-data.readthedocs.io/).
For more information, see the [official documentation](https://scribe-data.readthedocs.io/).

<sub><a href="#top">Back to top.</a></sub>