read_api
==========

Read content from a file or directory to create a ``Dataset``. Several functions are currently provided that cover common scenarios.
If you have a new scenario that would be useful to most users, you can post it in the official GitHub issues with a detailed description.
It is also easy to implement your own read-related functions.

Important Functions
-------------------

read_jsonl
^^^^^^^^^^^^^^^^

Read content from a JSONL file, which may be located on the local machine or on remote S3. To learn more about JSONL, see :doc:`../../additional_notes/glossary`.

.. code:: python

    # read jsonl from the local machine (no reader is needed, so pass None)
    datasets = read_jsonl("tt.jsonl", None)

    # read jsonl from remote s3 (s3_reader is an S3 reader object created beforehand)
    datasets = read_jsonl("s3://bucket_1/tt.jsonl", s3_reader)
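
For readers unfamiliar with the format, JSONL stores one JSON object per line. The sketch below uses only the Python standard library to illustrate what reading such a file amounts to. The helper names ``iter_jsonl_records`` and ``demo_jsonl_roundtrip`` and the sample field names are hypothetical; they are not part of the read API described here, and ``read_jsonl`` may work differently internally.

```python
import json
import os
import tempfile


def iter_jsonl_records(path):
    """Yield one parsed JSON object per non-empty line (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def demo_jsonl_roundtrip():
    """Write a small JSONL file to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tt.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            # The field names below are made up for illustration only.
            f.write('{"id": 1, "path": "a.pdf"}\n')
            f.write("\n")
            f.write('{"id": 2, "path": "b.pdf"}\n')
        return list(iter_jsonl_records(path))
```

Calling ``demo_jsonl_roundtrip()`` returns a list with one dictionary per record, which mirrors the idea of turning each line of the file into one item.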

read_local_pdfs
^^^^^^^^^^^^^^^^

Read PDFs from a file path or from a directory.

.. code:: python

    # read a single pdf from its path
    datasets = read_local_pdfs("tt.pdf")

    # read all pdfs under a directory
    datasets = read_local_pdfs("pdfs/")

read_local_images
^^^^^^^^^^^^^^^^^^^

Read images from a file path or from a directory.

.. code:: python

    # read a single image from its path
    datasets = read_local_images("tt.png")

    # read files in a directory whose names end with one of the given suffixes
    datasets = read_local_images("images/", suffixes=["png", "jpg"])
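
The ``suffixes`` argument selects files by extension. As a rough illustration of that idea, and not the library's actual implementation, the sketch below lists the files in a directory whose extension matches one of the given suffixes. The helper names ``list_files_with_suffixes`` and ``demo_suffix_filter`` are hypothetical, and the choices made here (case-insensitive matching, no recursion into subdirectories) are assumptions.

```python
import os
import tempfile
from pathlib import Path


def list_files_with_suffixes(path, suffixes):
    """Return sorted file paths whose extension is in `suffixes`.

    Hypothetical helper: a single file path is returned as-is, and
    suffixes are given without the leading dot, e.g. ["png", "jpg"].
    """
    root = Path(path)
    if root.is_file():
        return [str(root)]
    wanted = {s.lower().lstrip(".") for s in suffixes}
    return sorted(
        str(p)
        for p in root.iterdir()  # direct children only, no recursion
        if p.is_file() and p.suffix.lower().lstrip(".") in wanted
    )


def demo_suffix_filter():
    """Create a few empty files and return the names that pass the filter."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.png", "b.jpg", "c.txt"):
            Path(tmp, name).touch()
        matches = list_files_with_suffixes(tmp, ["png", "jpg"])
        return [os.path.basename(m) for m in matches]
```

In this sketch, a directory containing ``a.png``, ``b.jpg`` and ``c.txt`` yields only the two image files when the suffixes are ``["png", "jpg"]``.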

Check :doc:`../../api/read_api` for more details.